%matplotlib inline
%reload_ext autoreload
%autoreload 2
Predict on rollout grids
import os
import sys
"../../../")
sys.path.append(
import getpass
import pickle
from pathlib import Path
import contextily as cx
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from povertymapping import nightlights, settings
from povertymapping.dhs import generate_dhs_cluster_level_data
from povertymapping.feature_engineering import (
categorize_wealth_index,
generate_features,
)from povertymapping.iso3 import get_region_name
from povertymapping.rollout_grids import get_region_filtered_bingtile_grids
Model Prediction on Rollout Grids: Timor Leste
This notebook runs the final model to create relative wealth estimations over populated areas within the given country. The model predictions will have a spatial resolution of 2.4km.
The model generates the predicted relative wealth
variable, which informs us estimates the wealth of a specific area. A value of 0 signifies that the area’s wealth is similar to the national average. In contrast, a positive or negative value suggests above or below-average national wealth, respectively.
The predicted relative wealth value is later binned into 5 wealth categories A-E by dividing the distribution into quintiles (every 20th percentile).
Set up Data Access
The following cell will prompt you to enter your EOG username and password. See this page to learn how to set-up your EOG account.
# Log-in using EOG credentials
= os.environ.get("EOG_USER", None)
username = username if username is not None else input("Username?")
username = os.environ.get("EOG_PASSWORD", None)
password = password if password is not None else getpass.getpass("Password?")
password
# set save_token to True so that access token gets stored in ~/.eog_creds/eog_access_token
= nightlights.get_eog_access_token(username, password, save_token=True) access_token
2023-04-14 15:40:26.084 | INFO | povertymapping.nightlights:get_eog_access_token:43 - Loaded access_token from /home/alron/.eog_creds/eog_access_token.txt
Set country-specific parameters
= "tl"
COUNTRY_CODE = "east-timor"
COUNTRY_OSM = 2019
OOKLA_YEAR = 2016
NIGHTLIGHTS_YEAR
= "-".join(os.getcwd().split("/")[-2].split("-")[:3])
rollout_date = Path(f"./{rollout_date}-{COUNTRY_CODE}-rollout-grids.geojson")
rollout_grids_path rollout_grids_path
Path('2023-02-21-tl-rollout-grids.geojson')
Set Model Parameters
# Model to use for prediction
= Path(f"./{rollout_date}-{COUNTRY_CODE}-single-country-model.pkl") MODEL_SAVE_PATH
Load Country Rollout AOI
The rollout area of interest is split into 2.4km grid tiles (zoom level 14), matching the areas used during model training. The grids are also filtered to only include populated areas based on Meta’s High Resolution Settlement Layer (HRSL) data.
Refer to the previous notebook 2_tl_generate_grids.ipynb
for documentation on generating this grid.
= gpd.read_file(rollout_grids_path)
aoi # aoi.explore() # Uncomment to view data in a map
Generate Features For Rollout AOI
%%time
= aoi.copy()
rollout_aoi
# Create features dataframe using generate_features module
= generate_features(
features
rollout_aoi,=COUNTRY_OSM,
country_osm=OOKLA_YEAR,
ookla_year=NIGHTLIGHTS_YEAR,
nightlights_year=False,
scale=True,
features_only )
2023-04-14 15:40:29.089 | INFO | povertymapping.osm:download_osm_country_data:199 - OSM Data: Cached data available for timor-leste at /home/alron/.geowrangler/osm/timor-leste? True
2023-04-14 15:40:29.091 | DEBUG | povertymapping.osm:load_pois:161 - OSM POIs for timor-leste being loaded from /home/alron/.geowrangler/osm/timor-leste/gis_osm_pois_free_1.shp
2023-04-14 15:40:30.405 | INFO | povertymapping.osm:download_osm_country_data:199 - OSM Data: Cached data available for timor-leste at /home/alron/.geowrangler/osm/timor-leste? True
2023-04-14 15:40:30.406 | DEBUG | povertymapping.osm:load_roads:180 - OSM Roads for timor-leste being loaded from /home/alron/.geowrangler/osm/timor-leste/gis_osm_roads_free_1.shp
2023-04-14 15:40:31.060 | DEBUG | povertymapping.ookla:load_type_year_data:79 - Contents of data cache: []
2023-04-14 15:40:31.061 | INFO | povertymapping.ookla:load_type_year_data:94 - Cached data available at /home/alron/.geowrangler/ookla/processed/520c5a19341a5d2e4490d24a947ca93f.csv? False
2023-04-14 15:40:31.062 | DEBUG | povertymapping.ookla:load_type_year_data:111 - No cached data found. Processing Ookla data from scratch.
2023-04-14 15:40:31.897 | INFO | povertymapping.ookla:download_ookla_year_data:224 - Ookla Data: Number of available files for fixed and 2019: 4
2023-04-14 15:40:31.909 | INFO | povertymapping.ookla:download_ookla_year_data:237 - Ookla Data: Cached data available for fixed and 2019 at /home/alron/.geowrangler/ookla/fixed/2019? True
2023-04-14 15:40:31.910 | DEBUG | povertymapping.ookla:load_type_year_data:146 - Generating quadkeys based on input aoi geometry to pull intersecting Ookla data.
2023-04-14 15:40:56.567 | DEBUG | povertymapping.ookla:load_type_year_data:163 - Ookla data for aoi, fixed 2019 1 being loaded from /home/alron/.geowrangler/ookla/fixed/2019/2019-01-01_performance_fixed_tiles.parquet
2023-04-14 15:41:01.122 | DEBUG | povertymapping.ookla:load_type_year_data:163 - Ookla data for aoi, fixed 2019 2 being loaded from /home/alron/.geowrangler/ookla/fixed/2019/2019-04-01_performance_fixed_tiles.parquet
2023-04-14 15:41:05.558 | DEBUG | povertymapping.ookla:load_type_year_data:163 - Ookla data for aoi, fixed 2019 3 being loaded from /home/alron/.geowrangler/ookla/fixed/2019/2019-07-01_performance_fixed_tiles.parquet
2023-04-14 15:41:10.454 | DEBUG | povertymapping.ookla:load_type_year_data:163 - Ookla data for aoi, fixed 2019 4 being loaded from /home/alron/.geowrangler/ookla/fixed/2019/2019-10-01_performance_fixed_tiles.parquet
2023-04-14 15:41:16.055 | DEBUG | povertymapping.ookla:load_type_year_data:184 - Concatenating quarterly Ookla data for fixed and 2019 into one dataframe
2023-04-14 15:41:16.226 | DEBUG | povertymapping.ookla:load_type_year_data:79 - Contents of data cache: ['520c5a19341a5d2e4490d24a947ca93f']
2023-04-14 15:41:16.227 | INFO | povertymapping.ookla:load_type_year_data:94 - Cached data available at /home/alron/.geowrangler/ookla/processed/5c8e9525a68935322736644fe7561f6a.csv? False
2023-04-14 15:41:16.227 | DEBUG | povertymapping.ookla:load_type_year_data:111 - No cached data found. Processing Ookla data from scratch.
2023-04-14 15:41:16.228 | INFO | povertymapping.ookla:download_ookla_year_data:224 - Ookla Data: Number of available files for mobile and 2019: 4
2023-04-14 15:41:16.241 | INFO | povertymapping.ookla:download_ookla_year_data:237 - Ookla Data: Cached data available for mobile and 2019 at /home/alron/.geowrangler/ookla/mobile/2019? True
2023-04-14 15:41:16.242 | DEBUG | povertymapping.ookla:load_type_year_data:146 - Generating quadkeys based on input aoi geometry to pull intersecting Ookla data.
2023-04-14 15:41:43.767 | DEBUG | povertymapping.ookla:load_type_year_data:163 - Ookla data for aoi, mobile 2019 1 being loaded from /home/alron/.geowrangler/ookla/mobile/2019/2019-01-01_performance_mobile_tiles.parquet
2023-04-14 15:41:46.515 | DEBUG | povertymapping.ookla:load_type_year_data:163 - Ookla data for aoi, mobile 2019 2 being loaded from /home/alron/.geowrangler/ookla/mobile/2019/2019-04-01_performance_mobile_tiles.parquet
2023-04-14 15:41:49.229 | DEBUG | povertymapping.ookla:load_type_year_data:163 - Ookla data for aoi, mobile 2019 3 being loaded from /home/alron/.geowrangler/ookla/mobile/2019/2019-07-01_performance_mobile_tiles.parquet
2023-04-14 15:41:52.655 | DEBUG | povertymapping.ookla:load_type_year_data:163 - Ookla data for aoi, mobile 2019 4 being loaded from /home/alron/.geowrangler/ookla/mobile/2019/2019-10-01_performance_mobile_tiles.parquet
2023-04-14 15:41:55.537 | DEBUG | povertymapping.ookla:load_type_year_data:184 - Concatenating quarterly Ookla data for mobile and 2019 into one dataframe
2023-04-14 15:41:55.749 | INFO | povertymapping.nightlights:get_clipped_raster:463 - Retrieving clipped raster file /home/alron/.geowrangler/nightlights/clip/5a2ab755246adb45217c58a60b82a309.tif
CPU times: user 1min 29s, sys: 8.25 s, total: 1min 37s
Wall time: 1min 34s
Inspect the generated features
features.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 2024 entries, 0 to 2023
Data columns (total 61 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 poi_count 2024 non-null float64
1 atm_count 2024 non-null float64
2 atm_nearest 2024 non-null float64
3 bank_count 2024 non-null float64
4 bank_nearest 2024 non-null float64
5 bus_station_count 2024 non-null float64
6 bus_station_nearest 2024 non-null float64
7 cafe_count 2024 non-null float64
8 cafe_nearest 2024 non-null float64
9 charging_station_count 2024 non-null float64
10 charging_station_nearest 2024 non-null float64
11 courthouse_count 2024 non-null float64
12 courthouse_nearest 2024 non-null float64
13 dentist_count 2024 non-null float64
14 dentist_nearest 2024 non-null float64
15 fast_food_count 2024 non-null float64
16 fast_food_nearest 2024 non-null float64
17 fire_station_count 2024 non-null float64
18 fire_station_nearest 2024 non-null float64
19 food_court_count 2024 non-null float64
20 food_court_nearest 2024 non-null float64
21 fuel_count 2024 non-null float64
22 fuel_nearest 2024 non-null float64
23 hospital_count 2024 non-null float64
24 hospital_nearest 2024 non-null float64
25 library_count 2024 non-null float64
26 library_nearest 2024 non-null float64
27 marketplace_count 2024 non-null float64
28 marketplace_nearest 2024 non-null float64
29 pharmacy_count 2024 non-null float64
30 pharmacy_nearest 2024 non-null float64
31 police_count 2024 non-null float64
32 police_nearest 2024 non-null float64
33 post_box_count 2024 non-null float64
34 post_box_nearest 2024 non-null float64
35 post_office_count 2024 non-null float64
36 post_office_nearest 2024 non-null float64
37 restaurant_count 2024 non-null float64
38 restaurant_nearest 2024 non-null float64
39 social_facility_count 2024 non-null float64
40 social_facility_nearest 2024 non-null float64
41 supermarket_count 2024 non-null float64
42 supermarket_nearest 2024 non-null float64
43 townhall_count 2024 non-null float64
44 townhall_nearest 2024 non-null float64
45 road_count 2024 non-null float64
46 fixed_2019_mean_avg_d_kbps_mean 2024 non-null float64
47 fixed_2019_mean_avg_u_kbps_mean 2024 non-null float64
48 fixed_2019_mean_avg_lat_ms_mean 2024 non-null float64
49 fixed_2019_mean_num_tests_mean 2024 non-null float64
50 fixed_2019_mean_num_devices_mean 2024 non-null float64
51 mobile_2019_mean_avg_d_kbps_mean 2024 non-null float64
52 mobile_2019_mean_avg_u_kbps_mean 2024 non-null float64
53 mobile_2019_mean_avg_lat_ms_mean 2024 non-null float64
54 mobile_2019_mean_num_tests_mean 2024 non-null float64
55 mobile_2019_mean_num_devices_mean 2024 non-null float64
56 avg_rad_min 2024 non-null float64
57 avg_rad_max 2024 non-null float64
58 avg_rad_mean 2024 non-null float64
59 avg_rad_std 2024 non-null float64
60 avg_rad_median 2024 non-null float64
dtypes: float64(61)
memory usage: 1.0 MB
Run Model on AOI
Load Model
with open(MODEL_SAVE_PATH, "rb") as f:
= pickle.load(f) model
Make Predictions
"Predicted Relative Wealth Index"] = model.predict(features.values) rollout_aoi[
Binning predictions into wealth categories
Afterwards, we label the predicted relative wealth by binning them into 5 categories: A
, B
, C
, D
, and E
where A
is the highest and E
is the lowest.
We can create these wealth categories by splitting the output Predicted Relative Wealth Index
distribution into 5 equally sized quintiles, i.e. every 20th percentile.
This categorization may be modified to suit the context of the target country.
# Simple quintile approach
"Predicted Wealth Category (quintile)"] = categorize_wealth_index(
rollout_aoi["Predicted Relative Wealth Index"], split_quantile=False
rollout_aoi[str) ).astype(
Save Output
%%time
rollout_aoi.to_file(f"{rollout_date}-{COUNTRY_CODE}-rollout-output.geojson",
="GeoJSON",
driver=False,
index )
CPU times: user 525 ms, sys: 10.5 ms, total: 536 ms
Wall time: 535 ms
# Join back raw features and save
= rollout_aoi.join(features)
rollout_output_with_features
rollout_output_with_features.to_file(f"{rollout_date}-{COUNTRY_CODE}-rollout-output-with-features.geojson",
="GeoJSON",
driver=False,
index )
Visualizations
Inspect predicted wealth index and output dataframe
"Predicted Relative Wealth Index"]].hist() rollout_aoi[[
array([[<AxesSubplot: title={'center': 'Predicted Relative Wealth Index'}>]],
dtype=object)
rollout_aoi.head()
quadkey | shapeName | shapeISO | shapeID | shapeGroup | shapeType | pop_count | geometry | Predicted Relative Wealth Index | Predicted Wealth Category (quintile) | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 31011220203121 | Nitibe | None | TLS-ADM2-3_0_0-B58 | TLS | ADM2 | 102.251936 | POLYGON ((124.03564 -9.34067, 124.03564 -9.318... | 0.219882 | D |
1 | 31011220203123 | Nitibe | None | TLS-ADM2-3_0_0-B58 | TLS | ADM2 | 992.492772 | POLYGON ((124.03564 -9.36235, 124.03564 -9.340... | 0.209892 | E |
2 | 31011220203130 | Nitibe | None | TLS-ADM2-3_0_0-B58 | TLS | ADM2 | 118.897600 | POLYGON ((124.05762 -9.34067, 124.05762 -9.318... | 0.247152 | C |
3 | 31011220203132 | Nitibe | None | TLS-ADM2-3_0_0-B58 | TLS | ADM2 | 513.637632 | POLYGON ((124.05762 -9.36235, 124.05762 -9.340... | 0.209138 | E |
4 | 31011220203310 | Nitibe | None | TLS-ADM2-3_0_0-B58 | TLS | ADM2 | 319.140940 | POLYGON ((124.05762 -9.38403, 124.05762 -9.362... | 0.242258 | D |
Create Static Maps
Plot Predicted Relative Wealth Index
plt.cla()
plt.clf()= rollout_aoi.to_crs("EPSG:3857")
rollout_aoi_plot = rollout_aoi_plot.plot(
ax "Predicted Relative Wealth Index",
=(20, 8),
figsize="viridis",
cmap=True,
legend={"shrink": 0.8},
legend_kwds
)=cx.providers.OpenStreetMap.Mapnik)
cx.add_basemap(ax, source
ax.set_axis_off()"Predicted Relative Wealth Index")
plt.title(
plt.tight_layout()f"{rollout_date}-{COUNTRY_CODE}-predicted-wealth-index.png")
plt.savefig( plt.show()
<Figure size 640x480 with 0 Axes>
plt.cla()
plt.clf()= rollout_aoi.to_crs("EPSG:3857")
rollout_aoi_plot = rollout_aoi_plot.plot(
ax "Predicted Wealth Category (quintile)",
=(20, 8),
figsize="viridis_r",
cmap=True,
legend
)=cx.providers.OpenStreetMap.Mapnik)
cx.add_basemap(ax, source
ax.set_axis_off()"Predicted Wealth Category")
plt.title(
plt.tight_layout()f"{rollout_date}-{COUNTRY_CODE}-predicted-wealth-bin.png")
plt.savefig( plt.show()
<Figure size 640x480 with 0 Axes>
Create an Interactive Map
= [
cols_of_interest "quadkey",
"shapeName",
"shapeGroup",
"pop_count",
"avg_rad_mean",
"mobile_2019_mean_avg_d_kbps_mean",
"fixed_2019_mean_avg_d_kbps_mean",
"poi_count",
"road_count",
"Predicted Relative Wealth Index",
"Predicted Wealth Category (quintile)",
]
# Warning: This can be a bit laggy due to the large amount of tiles being visualized
# Uncomment the ff if you want to viz the raw wealth predictions
# rollout_aoi.explore(column='Predicted Relative Wealth Index', tooltip=cols_of_interest, cmap="viridis")
# Uncomment the ff if you want to view the quintiles
# rollout_aoi.explore(column='Predicted Wealth Category (quintile)', tooltip=cols_of_interest, cmap="viridis_r")
Alternatively, you may also try to visualize this interactively in Kepler by uploading the rollout output geojson file.