Generate rollout grids

%matplotlib inline
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append("../../../")
import os
from pathlib import Path

import geopandas as gpd
import numpy as np

from povertymapping.hdx import get_hdx_file
from povertymapping.iso3 import get_region_name
from povertymapping.rollout_grids import (
    compute_raster_stats,
    get_region_filtered_bingtile_grids,
)

/home/jace/workspace/unicef-ai4d-poverty-mapping/env/lib/python3.9/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.11.1-CAPI-1.17.1) is incompatible with the GEOS version PyGEOS was compiled with (3.10.1-CAPI-1.16.0). Conversions between both will be slow.
  warnings.warn(

Generate Roll-out Grids: Indonesia

This notebook is the second step in the rollout and generates the rollout grid tiles over the country. The output file is used as an input for Step 3, where we will run the trained model over the set of grids.

The generated grids are set at 2.4km (zoom level 14), matching the grids used during model training. The grids are also filtered to only include populated areas based on Meta’s High Resolution Settlement Layer (HRSL) data.

Notes on Indonesia

The Indonesia grids are processed slightly differently to the other countries as the normal gridding workflow runs into memory issues when processing Indonesia. The size and scale of the country makes it difficult to load and process all the data in memory. For Indonesia, we follow this modifed workflow:

We generate the grids over Indonesia first without filtering by population (filter_population=False)
We then group the generated grids by lower-zoom-level quadkeys so that we can have geographically close groupings
The population count for each grid is generated on a per-group basis to limit the amount of data being loaded as it is being processed
After getting the population count, we filter out the unpopulated grids manually and save to file

Set country-specific parameters

COUNTRY_CODE = "id"
REGION = get_region_name(COUNTRY_CODE, code="alpha-2").lower()
ADMIN_LVL = "ADM2"
ZOOM_LEVEL = 14
GROUP_ZOOM_LEVEL = 8

Generate Grids

admin_grids_gdf = get_region_filtered_bingtile_grids(
    REGION,
    admin_lvl=ADMIN_LVL,
    quadkey_lvl=ZOOM_LEVEL,
    use_cache=True,
    filter_population=False,  # We set this to False for Indonesia
)

2023-03-15 01:22:53.354 | INFO     | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:264 - No cached grids file found. Generating grids file :/home/jace/.cache/geowrangler/quadkey_grids/indonesia_14_ADM2_admin_grids.geojson
2023-03-15 01:22:53.362 | DEBUG    | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:281 - Loading boundaries for region indonesia and admin level ADM2
2023-03-15 01:22:53.369 | INFO     | povertymapping.geoboundaries:get_geoboundaries:41 - Downloading geoboundaries for IDN at admin level ADM2 at https://www.geoboundaries.org/gbRequest.html?ISO=IDN&ADM=ADM2
2023-03-15 01:22:53.362 | DEBUG    | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:281 - Loading boundaries for region indonesia and admin level ADM2
2023-03-15 01:22:53.369 | INFO     | povertymapping.geoboundaries:get_geoboundaries:41 - Downloading geoboundaries for IDN at admin level ADM2 at https://www.geoboundaries.org/gbRequest.html?ISO=IDN&ADM=ADM2
2023-03-15 01:22:55.255 | INFO     | povertymapping.geoboundaries:get_geoboundaries:50 - Download path for IDN at admin level ADM2 found at https://geoboundaries.org/data/geoBoundaries-3_0_0/IDN/ADM2/geoBoundaries-3_0_0-IDN-ADM2.geojson

2023-03-15 01:41:14.769 | INFO     | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:287 - Generating grids for region indonesia and admin level ADM2 at quadkey level 14

2023-03-15 01:46:44.171 | INFO     | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:293 - Generated 340122 grids for region indonesia and admin level ADM2 at quadkey level 14
2023-03-15 01:46:44.182 | INFO     | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:299 - Assigning grids to admin areas using metric crs epsg:3857
/home/jace/workspace/unicef-ai4d-poverty-mapping/env/lib/python3.9/site-packages/geopandas/geodataframe.py:2196: UserWarning: `keep_geom_type=True` in overlay resulted in 2 dropped geometries of different geometry types than df1 has. Set `keep_geom_type=False` to retain all geometries
  return geopandas.overlay(
/home/jace/workspace/unicef-ai4d-poverty-mapping/env/lib/python3.9/site-packages/geopandas/io/file.py:362: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  pd.Int64Index,

Explore Generated Grids

admin_grids_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 340122 entries, 0 to 340121
Data columns (total 7 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   geometry    340122 non-null  geometry
 1   quadkey     340122 non-null  object  
 2   shapeName   340122 non-null  object  
 3   shapeISO    340122 non-null  object  
 4   shapeID     340122 non-null  object  
 5   shapeGroup  340122 non-null  object  
 6   shapeType   340122 non-null  object  
dtypes: geometry(1), object(6)
memory usage: 20.8+ MB

admin_grids_gdf.head(2)

	geometry	quadkey	shapeName	shapeISO	shapeID	shapeGroup	shapeType
0	POLYGON ((98.28369 -0.52734, 98.28369 -0.50536...	31000101131223	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2
1	POLYGON ((98.28369 -0.54931, 98.28369 -0.52734...	31000101133001	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2

# Uncomment to view the generated grid on an interactive map
# Set a limit to plotted grids as loading all grids can become very slow
# n = 10000
# start_idx = 10000
# admin_grids_gdf.iloc[start_idx:start_idx+n].explore()

Compute Population Per Grid

For this calculation, we will calculate the population, batched based on groups calculated from the quadkey. By getting the first n digits of the quadkey, we are able to get the zoom level n quadkey to which that tile belongs to.

Ex. if we group based on zoom level 8 the quadkey 31000101131223 belongs to the grouping 31000101.

This grouping ensures that the tile groupings are geographically close to one another, which reduces the raster window size that we needed to calculate the population count.

For more information about quadkeys, see Bing Maps Tile System.

Get groupings based on quadkey

quadkey_group_col = f"quadkey_level{GROUP_ZOOM_LEVEL}"
admin_grids_gdf[quadkey_group_col] = admin_grids_gdf["quadkey"].str[:GROUP_ZOOM_LEVEL]
admin_grids_gdf.head(3)

	geometry	quadkey	shapeName	shapeISO	shapeID	shapeGroup	shapeType	quadkey_level8
0	POLYGON ((98.28369 -0.52734, 98.28369 -0.50536...	31000101131223	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2	31000101
1	POLYGON ((98.28369 -0.54931, 98.28369 -0.52734...	31000101133001	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2	31000101
2	POLYGON ((98.30566 -0.52734, 98.30566 -0.50536...	31000101131232	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2	31000101

# Demonstrate how the quadkey grouping gives us geographically close grids
quadkey_groups = list(admin_grids_gdf[quadkey_group_col].unique())
i = 78
test_group = quadkey_groups[i]
group_gdf = admin_grids_gdf[admin_grids_gdf[quadkey_group_col] == test_group]
group_gdf.explore()

Get population HDX file

This section will get the HDX filepath for the specifed REGION. The file will also be downloaded if needed.

hdx_pop_file = get_hdx_file(REGION)
hdx_pop_file

2023-03-15 01:55:58.179 | INFO     | povertymapping.hdx:get_hdx_file:200 - HDX Data: Unzipping the zip file /home/jace/.cache/geowrangler/hdx/idn_general_2020_geotiff.zip...
2023-03-15 01:59:09.933 | INFO     | povertymapping.hdx:get_hdx_file:210 - HDX Data: Successfully downloaded and cached for indonesia at /home/jace/.cache/geowrangler/hdx/idn_general_2020_geotiff.zip!

Path('/home/jace/.cache/geowrangler/hdx/idn_general_2020.tif')

Compute population count, batched by the quadkey group

# Set aggregation settings
aggregation = dict(column="population", output="pop_count", func="sum")
extra_args = dict(nodata=np.nan)

# Compute population totals per grid
admin_grids_pop_count = compute_raster_stats(
    admin_grids_gdf,
    hdx_pop_file,
    aggregation=aggregation,
    extra_args=extra_args,
    group_col=quadkey_group_col,
    max_batch_size=None,
    n_workers=None,
)

2023-03-15 01:59:10.349 | INFO     | povertymapping.rollout_grids:compute_raster_stats:71 - Creating raster zonal stats for 340122 grids for file size 87714.317324 Mb, batched in 238 unique group/s from quadkey_level8
2023-03-15 01:59:10.350 | WARNING  | povertymapping.rollout_grids:compute_raster_stats:74 - When batching by group, output gdf rows will be ordered based on the group.
100%|██████████| 238/238 [28:43<00:00,  7.24s/it]
2023-03-15 02:27:54.170 | INFO     | povertymapping.rollout_grids:compute_raster_stats:89 - Completed raster zonal stats for 238 groups
2023-03-15 02:27:54.296 | INFO     | povertymapping.rollout_grids:compute_raster_stats:91 - Concatenated raster zonal stats for 238 groups

admin_grids_pop_count.head()

	geometry	quadkey	shapeName	shapeISO	shapeID	shapeGroup	shapeType	quadkey_level8	pop_count
0	POLYGON ((98.28369 -0.52734, 98.28369 -0.50536...	31000101131223	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2	31000101	NaN
1	POLYGON ((98.28369 -0.54931, 98.28369 -0.52734...	31000101133001	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2	31000101	NaN
2	POLYGON ((98.30566 -0.52734, 98.30566 -0.50536...	31000101131232	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2	31000101	185.68042
3	POLYGON ((98.30566 -0.54931, 98.30566 -0.52734...	31000101133010	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2	31000101	NaN
4	POLYGON ((98.32764 -0.50536, 98.32764 -0.48339...	31000101131231	Nias Selatan	None	IDN-ADM2-3_0_0-B371	IDN	ADM2	31000101	NaN

Keep grids with total population > 0

admin_grids_filtered = admin_grids_pop_count[admin_grids_pop_count["pop_count"] > 0]
admin_grids_filtered = admin_grids_filtered.reset_index(drop=True)
admin_grids_filtered.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 161230 entries, 0 to 161229
Data columns (total 9 columns):
 #   Column          Non-Null Count   Dtype   
---  ------          --------------   -----   
 0   geometry        161230 non-null  geometry
 1   quadkey         161230 non-null  object  
 2   shapeName       161230 non-null  object  
 3   shapeISO        161230 non-null  object  
 4   shapeID         161230 non-null  object  
 5   shapeGroup      161230 non-null  object  
 6   shapeType       161230 non-null  object  
 7   quadkey_level8  161230 non-null  object  
 8   pop_count       161230 non-null  float64 
dtypes: float64(1), geometry(1), object(7)
memory usage: 11.1+ MB

# # Uncomment to view interactive map
# # Set a limit to plotted grids as loading all grids can become very slow
# n = 10000
# start_idx = 100000
# admin_grids_filtered.iloc[start_idx:start_idx+n].explore()

Save to file

rollout_date = "-".join(os.getcwd().split("/")[-2].split("-")[:3])
grid_save_path = f"./{rollout_date}-{COUNTRY_CODE}-rollout-grids.geojson"
admin_grids_filtered.to_file(grid_save_path, driver="GeoJSON", index=False)