%matplotlib inline
%reload_ext autoreload
%autoreload 2
Predict on rollout grids
import os
import sys

sys.path.append("../../../")

import getpass
import pickle
from pathlib import Path
import contextily as cx
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from povertymapping import nightlights, settings
from povertymapping.dhs import generate_dhs_cluster_level_data
from povertymapping.feature_engineering import (
    categorize_wealth_index,
    generate_features,
)
from povertymapping.iso3 import get_region_name
from povertymapping.rollout_grids import get_region_filtered_bingtile_grids
Model Prediction on Rollout Grids: Philippines
This notebook is the final step in the rollout. It runs the final model to produce relative wealth estimates over populated areas within the given country. The model predictions have a spatial resolution of 2.4 km.
The predicted relative wealth value gives the wealth level of an area relative to the rest of the country, on a fixed range from 0 (lowest wealth) to 1 (highest wealth). Every area in between these extremes is scaled to a value between 0 and 1.
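The notebook does not show how the 0 to 1 range is produced. As an illustration only, assuming a simple min-max rescaling across all areas of the country, the idea looks like this (`min_max_scale` is a hypothetical helper, not part of `povertymapping`):

```python
def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Illustrative only: this is an assumed scaling scheme, not necessarily how
    the model's relative wealth target was constructed.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All areas are identical, so there is no relative spread to express
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


raw_scores = [2.0, 4.0, 6.0]  # hypothetical raw wealth scores for three areas
print(min_max_scale(raw_scores))  # [0.0, 0.5, 1.0]
```

Because the scaling is relative, the same raw score can land on a different 0 to 1 value in another country.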
The predicted relative wealth value is later binned into 5 wealth categories A-E by dividing the distribution into quintiles (every 20th percentile).
Set up Data Access
The following cell reads your EOG username and password from the EOG_USER and EOG_PASSWORD environment variables, and prompts you for them if those variables are not set. See this page to learn how to set up your EOG account.
# Log in using EOG credentials
username = os.environ.get("EOG_USER", None)
username = username if username is not None else input("Username?")
password = os.environ.get("EOG_PASSWORD", None)
password = password if password is not None else getpass.getpass("Password?")

# set save_token to True so that the access token gets stored in ~/.eog_creds/eog_access_token.txt
access_token = nightlights.get_eog_access_token(username, password, save_token=True)
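The cell above repeats the same "environment variable first, interactive prompt second" pattern for both credentials. A minimal sketch of that pattern as one reusable function; `get_credential` is a hypothetical helper, and the `ask` parameter lets you pass `input` or `getpass.getpass`:

```python
import os
from typing import Callable


def get_credential(env_var: str, prompt: str, ask: Callable[[str], str]) -> str:
    """Return the value of env_var if it is set, otherwise ask the user interactively."""
    value = os.environ.get(env_var)
    return value if value is not None else ask(prompt)


# In the notebook this could be called as, for example:
#   username = get_credential("EOG_USER", "Username?", input)
#   password = get_credential("EOG_PASSWORD", "Password?", getpass.getpass)
```

Setting EOG_USER and EOG_PASSWORD before launching Jupyter lets the notebook run without prompts.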
2023-04-14 15:10:22.477 | INFO | povertymapping.nightlights:get_eog_access_token:43 - Loaded access_token from /home/alron/.eog_creds/eog_access_token.txt
Set country-specific parameters
COUNTRY_CODE = "ph"
COUNTRY_OSM = get_region_name(COUNTRY_CODE, code="alpha-2").lower()
OOKLA_YEAR = 2019
NIGHTLIGHTS_YEAR = 2016

rollout_date = "-".join(os.getcwd().split("/")[-2].split("-")[:3])
rollout_grids_path = Path(f"./{rollout_date}-{COUNTRY_CODE}-rollout-grids.geojson")
rollout_grids_path
Path('2023-02-21-ph-rollout-grids.geojson')
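`rollout_date` is built from the first three dash-separated parts of the parent directory's name, so the notebook assumes it sits inside a date-prefixed folder. A small sketch of that logic with a made-up path (`derive_rollout_date` and the directory names are hypothetical):

```python
def derive_rollout_date(cwd: str) -> str:
    """Mirror the notebook's logic: take the parent folder name and keep 'YYYY-MM-DD'."""
    parent_dir = cwd.split("/")[-2]  # second-to-last path component
    return "-".join(parent_dir.split("-")[:3])


print(derive_rollout_date("/repo/notebooks/2023-02-21-single-country-rollout/ph"))
# 2023-02-21
```

Because it splits on "/", this assumes POSIX-style paths; `Path.cwd().parent.name` is a portable way to get the same folder name.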
Set Model Parameters
# Model to use for prediction
MODEL_SAVE_PATH = Path(f"./{rollout_date}-{COUNTRY_CODE}-single-country-model.pkl")
Load Country Rollout AOI
The rollout area of interest is split into 2.4km grid tiles (zoom level 14), matching the areas used during model training. The grids are also filtered to only include populated areas based on Meta’s High Resolution Settlement Layer (HRSL) data.
Refer to the previous notebook 2_ph_generate_grids.ipynb for documentation on generating this grid.
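Each grid cell is identified by a Bing Maps quadkey; at zoom level 14 the quadkey has 14 digits, as in the `quadkey` column shown later. The sketch below uses the standard Bing / Web Mercator tile math (written here for illustration, not a `povertymapping` function) to decode a quadkey into tile coordinates, compute its bounding box, and estimate its ground width. This is where the roughly 2.4 km figure comes from: about 2,446 m at the equator, shrinking with the cosine of latitude.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis used by Web Mercator


def quadkey_to_tile(quadkey: str) -> tuple[int, int, int]:
    """Decode a Bing Maps quadkey into (tile_x, tile_y, zoom)."""
    x = y = 0
    zoom = len(quadkey)
    for i, digit in enumerate(quadkey):
        if digit not in "0123":
            raise ValueError(f"Invalid quadkey digit: {digit!r}")
        mask = 1 << (zoom - i - 1)
        d = int(digit)
        if d & 1:  # low bit of each digit encodes x
            x |= mask
        if d & 2:  # high bit of each digit encodes y
            y |= mask
    return x, y, zoom


def tile_bounds(x: int, y: int, zoom: int) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) in degrees for a Web Mercator tile."""
    n = 2 ** zoom

    def lon(tx: int) -> float:
        return tx / n * 360.0 - 180.0

    def lat(ty: int) -> float:
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * ty / n))))

    # Tile y grows southward, so the tile's north edge is at y and south edge at y + 1
    return lon(x), lat(y + 1), lon(x + 1), lat(y)


def tile_width_m(lat_deg: float, zoom: int) -> float:
    """Approximate east-west ground width of a tile at the given latitude."""
    return 2 * math.pi * EARTH_RADIUS_M * math.cos(math.radians(lat_deg)) / 2 ** zoom


x, y, z = quadkey_to_tile("13232122010020")  # first quadkey in rollout_aoi.head()
print((x, y, z), tile_bounds(x, y, z))
print(round(tile_width_m(0.0, 14)))  # 2446
```

The decoded bounds line up with the first `geometry` entry shown further down (west edge at 118.47656, latitudes 6.94824 to 6.97005).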
aoi = gpd.read_file(rollout_grids_path)
# aoi.explore()  # Uncomment to view data in a map
Generate Features For Rollout AOI
%%time
rollout_aoi = aoi.copy()

# Create features dataframe using generate_features module
features = generate_features(
    rollout_aoi,
    country_osm=COUNTRY_OSM,
    ookla_year=OOKLA_YEAR,
    nightlights_year=NIGHTLIGHTS_YEAR,
    scale=False,
    features_only=True,
)
2023-04-14 15:10:26.504 | INFO | povertymapping.osm:download_osm_country_data:199 - OSM Data: Cached data available for philippines at /home/alron/.geowrangler/osm/philippines? True
2023-04-14 15:10:26.505 | DEBUG | povertymapping.osm:load_pois:161 - OSM POIs for philippines being loaded from /home/alron/.geowrangler/osm/philippines/gis_osm_pois_free_1.shp
2023-04-14 15:10:39.945 | INFO | povertymapping.osm:download_osm_country_data:199 - OSM Data: Cached data available for philippines at /home/alron/.geowrangler/osm/philippines? True
2023-04-14 15:10:39.946 | DEBUG | povertymapping.osm:load_roads:180 - OSM Roads for philippines being loaded from /home/alron/.geowrangler/osm/philippines/gis_osm_roads_free_1.shp
2023-04-14 15:11:31.153 | DEBUG | povertymapping.ookla:load_type_year_data:79 - Contents of data cache: []
2023-04-14 15:11:31.154 | INFO | povertymapping.ookla:load_type_year_data:94 - Cached data available at /home/alron/.geowrangler/ookla/processed/c2f7493d8f417358a243d5a5d6534e91.csv? True
2023-04-14 15:11:31.155 | DEBUG | povertymapping.ookla:load_type_year_data:99 - Processed Ookla data for aoi, fixed 2019 (key: c2f7493d8f417358a243d5a5d6534e91) found in filesystem. Loading in cache.
2023-04-14 15:11:36.496 | DEBUG | povertymapping.ookla:load_type_year_data:79 - Contents of data cache: ['c2f7493d8f417358a243d5a5d6534e91']
2023-04-14 15:11:36.497 | INFO | povertymapping.ookla:load_type_year_data:94 - Cached data available at /home/alron/.geowrangler/ookla/processed/2fb42a1814adb4d0b74fd86a06791aab.csv? True
2023-04-14 15:11:36.498 | DEBUG | povertymapping.ookla:load_type_year_data:99 - Processed Ookla data for aoi, mobile 2019 (key: 2fb42a1814adb4d0b74fd86a06791aab) found in filesystem. Loading in cache.
2023-04-14 15:11:39.928 | INFO | povertymapping.nightlights:get_clipped_raster:463 - Retrieving clipped raster file /home/alron/.geowrangler/nightlights/clip/8a78adbc62c18180bdcb716a2ebfc3a3.tif
CPU times: user 3min 16s, sys: 7 s, total: 3min 23s
Wall time: 3min 23s
Inspect the generated features
features.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 46483 entries, 0 to 46482
Data columns (total 61 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 poi_count 46483 non-null float64
1 atm_count 46483 non-null float64
2 atm_nearest 46483 non-null float64
3 bank_count 46483 non-null float64
4 bank_nearest 46483 non-null float64
5 bus_station_count 46483 non-null float64
6 bus_station_nearest 46483 non-null float64
7 cafe_count 46483 non-null float64
8 cafe_nearest 46483 non-null float64
9 charging_station_count 46483 non-null float64
10 charging_station_nearest 46483 non-null float64
11 courthouse_count 46483 non-null float64
12 courthouse_nearest 46483 non-null float64
13 dentist_count 46483 non-null float64
14 dentist_nearest 46483 non-null float64
15 fast_food_count 46483 non-null float64
16 fast_food_nearest 46483 non-null float64
17 fire_station_count 46483 non-null float64
18 fire_station_nearest 46483 non-null float64
19 food_court_count 46483 non-null float64
20 food_court_nearest 46483 non-null float64
21 fuel_count 46483 non-null float64
22 fuel_nearest 46483 non-null float64
23 hospital_count 46483 non-null float64
24 hospital_nearest 46483 non-null float64
25 library_count 46483 non-null float64
26 library_nearest 46483 non-null float64
27 marketplace_count 46483 non-null float64
28 marketplace_nearest 46483 non-null float64
29 pharmacy_count 46483 non-null float64
30 pharmacy_nearest 46483 non-null float64
31 police_count 46483 non-null float64
32 police_nearest 46483 non-null float64
33 post_box_count 46483 non-null float64
34 post_box_nearest 46483 non-null float64
35 post_office_count 46483 non-null float64
36 post_office_nearest 46483 non-null float64
37 restaurant_count 46483 non-null float64
38 restaurant_nearest 46483 non-null float64
39 social_facility_count 46483 non-null float64
40 social_facility_nearest 46483 non-null float64
41 supermarket_count 46483 non-null float64
42 supermarket_nearest 46483 non-null float64
43 townhall_count 46483 non-null float64
44 townhall_nearest 46483 non-null float64
45 road_count 46483 non-null float64
46 fixed_2019_mean_avg_d_kbps_mean 46483 non-null float64
47 fixed_2019_mean_avg_u_kbps_mean 46483 non-null float64
48 fixed_2019_mean_avg_lat_ms_mean 46483 non-null float64
49 fixed_2019_mean_num_tests_mean 46483 non-null float64
50 fixed_2019_mean_num_devices_mean 46483 non-null float64
51 mobile_2019_mean_avg_d_kbps_mean 46483 non-null float64
52 mobile_2019_mean_avg_u_kbps_mean 46483 non-null float64
53 mobile_2019_mean_avg_lat_ms_mean 46483 non-null float64
54 mobile_2019_mean_num_tests_mean 46483 non-null float64
55 mobile_2019_mean_num_devices_mean 46483 non-null float64
56 avg_rad_min 46483 non-null float64
57 avg_rad_max 46483 non-null float64
58 avg_rad_mean 46483 non-null float64
59 avg_rad_std 46483 non-null float64
60 avg_rad_median 46483 non-null float64
dtypes: float64(61)
memory usage: 23.0 MB
Run Model on AOI
Load Model
with open(MODEL_SAVE_PATH, "rb") as f:
    model = pickle.load(f)
Make Predictions
rollout_aoi["Predicted Relative Wealth Index"] = model.predict(features.values)
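Because `model.predict` receives `features.values`, a plain NumPy array, the model sees the columns by position only; if the feature order ever differs from training, predictions silently go wrong. A defensive sketch, assuming you have the training column order available (`align_features` and `expected_columns` are hypothetical):

```python
import numpy as np
import pandas as pd


def align_features(features: pd.DataFrame, expected_columns: list[str]) -> np.ndarray:
    """Return feature values ordered exactly as during training (hypothetical helper)."""
    missing = [c for c in expected_columns if c not in features.columns]
    if missing:
        raise KeyError(f"Missing feature columns: {missing}")
    # Select in training order; any extra columns (e.g. geometry) are dropped
    return features[list(expected_columns)].to_numpy()


# Toy example: columns arrive in a different order than at training time
df = pd.DataFrame({"b": [3.0, 4.0], "a": [1.0, 2.0]})
X = align_features(df, ["a", "b"])
print(X)
```

If the pickled model is a scikit-learn estimator fitted on a DataFrame, its `feature_names_in_` attribute records the training column order and could serve as `expected_columns`.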
Binning predictions into wealth categories
Afterwards, we label the predictions by binning them into 5 categories: A, B, C, D, and E, where A is the highest and E is the lowest.
We can create these wealth categories by splitting the output Predicted Relative Wealth Index distribution into 5 equally sized quintiles, i.e. every 20th percentile.
This categorization may be modified to suit the context of the target country.
# Simple quintile approach
rollout_aoi["Predicted Wealth Category (quintile)"] = categorize_wealth_index(
    rollout_aoi["Predicted Relative Wealth Index"], split_quantile=False
).astype(str)
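To see the quintile idea in isolation, here is a sketch using `pandas.qcut` with made-up scores. It is not necessarily how `categorize_wealth_index` works internally; `quintile_categories` is a hypothetical helper.

```python
import pandas as pd


def quintile_categories(scores: pd.Series) -> pd.Series:
    """Bin scores into 5 equal-count groups labelled E (lowest) to A (highest)."""
    # pd.qcut assigns labels to the bins in ascending order of value
    return pd.qcut(scores, q=5, labels=["E", "D", "C", "B", "A"]).astype(str)


scores = pd.Series([0.05, 0.12, 0.18, 0.25, 0.31, 0.40, 0.52, 0.66, 0.79, 0.93])
print(quintile_categories(scores).tolist())
```

With 10 scores, each category receives exactly 2 areas; in a real rollout each category holds about 20% of the grid cells regardless of how the raw index values are spread.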
Format final Dataframe: Join features and predictions
Save Output
%%time
rollout_aoi.to_file(
    f"{rollout_date}-{COUNTRY_CODE}-rollout-output.geojson",
    driver="GeoJSON",
    index=False,
)
CPU times: user 10.5 s, sys: 210 ms, total: 10.7 s
Wall time: 10.7 s
# Join back raw features and save
rollout_output_with_features = rollout_aoi.join(features)
rollout_output_with_features.to_file(
    f"{rollout_date}-{COUNTRY_CODE}-rollout-output-with-features.geojson",
    driver="GeoJSON",
    index=False,
)
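`rollout_aoi.join(features)` aligns rows by index label (a left join by default), not by position. That is safe here because `features.info()` reports the same 0 to 46482 integer index as the grid table. A toy example with hypothetical frames shows the behavior:

```python
import pandas as pd

# Toy stand-ins for rollout_aoi and features (hypothetical values)
grids = pd.DataFrame({"quadkey": ["q0", "q1", "q2"]}, index=[0, 1, 2])
feats = pd.DataFrame({"poi_count": [5.0, 7.0]}, index=[2, 0])

# DataFrame.join is a left join on the index by default
joined = grids.join(feats)
print(joined)
```

Row 0 picks up 7.0 and row 2 picks up 5.0 because of their labels, while row 1 gets NaN. If both frames ever share a column name, `join` raises a `ValueError` unless `lsuffix`/`rsuffix` is given.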
Visualizations
Inspect predicted relative wealth index and output dataframe
rollout_aoi[["Predicted Relative Wealth Index"]].hist()
(Figure: histogram of Predicted Relative Wealth Index)
rollout_aoi.head()
| | quadkey | shapeName | shapeISO | shapeID | shapeGroup | shapeType | pop_count | geometry | Predicted Relative Wealth Index | Predicted Wealth Category (quintile) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13232122010020 | Tawi-Tawi | None | PHL-ADM2-3_0_0-B77 | PHL | ADM2 | 11.677980 | POLYGON ((118.47656 6.94824, 118.47656 6.97005... | 0.135210 | E |
| 1 | 13232120223323 | Tawi-Tawi | None | PHL-ADM2-3_0_0-B77 | PHL | ADM2 | 317.069037 | POLYGON ((118.41064 7.01367, 118.41064 7.03548... | 0.157153 | E |
| 2 | 13232122001101 | Tawi-Tawi | None | PHL-ADM2-3_0_0-B77 | PHL | ADM2 | 253.081800 | POLYGON ((118.41064 6.99186, 118.41064 7.01367... | 0.143458 | E |
| 3 | 13232122001103 | Tawi-Tawi | None | PHL-ADM2-3_0_0-B77 | PHL | ADM2 | 27.250587 | POLYGON ((118.41064 6.97005, 118.41064 6.99186... | 0.145540 | E |
| 4 | 13232120223332 | Tawi-Tawi | None | PHL-ADM2-3_0_0-B77 | PHL | ADM2 | 763.870501 | POLYGON ((118.43262 7.01367, 118.43262 7.03548... | 0.159144 | E |
Create Static Maps
Plot Predicted Relative Wealth Index
plt.cla()
plt.clf()
rollout_aoi_plot = rollout_aoi.to_crs("EPSG:3857")
ax = rollout_aoi_plot.plot(
    "Predicted Relative Wealth Index",
    figsize=(20, 8),
    cmap="viridis",
    legend=True,
    legend_kwds={"shrink": 0.8},
)
cx.add_basemap(ax, source=cx.providers.OpenStreetMap.Mapnik)
ax.set_axis_off()
plt.title("Predicted Relative Wealth Index")
plt.tight_layout()
plt.savefig(f"{rollout_date}-{COUNTRY_CODE}-predicted-wealth-index.png")
plt.show()
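Both maps reproject the grids to EPSG:3857 (Web Mercator) before plotting, because `contextily.add_basemap` assumes that CRS by default when no `crs` argument is passed. A sketch of the spherical Web Mercator forward projection that `to_crs("EPSG:3857")` applies to longitude/latitude coordinates (`lonlat_to_web_mercator` is written here for illustration):

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Project WGS84 degrees to Web Mercator metres (spherical formula)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Corner of the first grid tile from rollout_aoi.head()
print(lonlat_to_web_mercator(118.47656, 6.94824))
```

At longitude 180 this gives x of about 20,037,508 m, the edge of the Web Mercator world.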
Plot Predicted Relative Wealth Index Category
plt.cla()
plt.clf()
rollout_aoi_plot = rollout_aoi.to_crs("EPSG:3857")
ax = rollout_aoi_plot.plot(
    "Predicted Wealth Category (quintile)",
    figsize=(20, 8),
    cmap="viridis_r",
    legend=True,
)
cx.add_basemap(ax, source=cx.providers.OpenStreetMap.Mapnik)
ax.set_axis_off()
plt.title("Predicted Relative Wealth Quintile")
plt.tight_layout()
plt.savefig(f"{rollout_date}-{COUNTRY_CODE}-predicted-wealth-bin.png")
plt.show()
Create an Interactive Map
cols_of_interest = [
    "quadkey",
    "shapeName",
    "shapeGroup",
    "pop_count",
    "avg_rad_mean",
    "mobile_2019_mean_avg_d_kbps_mean",
    "fixed_2019_mean_avg_d_kbps_mean",
    "poi_count",
    "road_count",
    "Predicted Relative Wealth Index",
    "Predicted Wealth Category (quintile)",
]
# Warning: this can be laggy due to the large number of tiles being visualized
# Uncomment the following to visualize the raw wealth predictions
# rollout_aoi.explore(column='Predicted Relative Wealth Index', tooltip=cols_of_interest, cmap="viridis")
# Uncomment the following to visualize the quintiles
# rollout_aoi.explore(column='Predicted Wealth Category (quintile)', tooltip=cols_of_interest, cmap="viridis_r")
Alternatively, you can visualize the results interactively in Kepler by uploading the rollout output GeoJSON file.