%matplotlib inline
%reload_ext autoreload
%autoreload 2
Generate rollout grids
import sys
"../../../")
sys.path.append(import os
from pathlib import Path
import geopandas as gpd
import numpy as np
from povertymapping.hdx import get_hdx_file
from povertymapping.iso3 import get_region_name
from povertymapping.rollout_grids import (
compute_raster_stats,
get_region_filtered_bingtile_grids, )
/home/jace/workspace/unicef-ai4d-poverty-mapping/env/lib/python3.9/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.11.1-CAPI-1.17.1) is incompatible with the GEOS version PyGEOS was compiled with (3.10.1-CAPI-1.16.0). Conversions between both will be slow.
warnings.warn(
Generate Roll-out Grids: Indonesia
This notebook is the second step in the rollout and generates the rollout grid tiles over the country. The output file is used as an input for Step 3, where we will run the trained model over the set of grids.
The generated grids are set at 2.4km (zoom level 14), matching the grids used during model training. The grids are also filtered to only include populated areas based on Meta’s High Resolution Settlement Layer (HRSL) data.
Notes on Indonesia
The Indonesia grids are processed slightly differently to the other countries as the normal gridding workflow runs into memory issues when processing Indonesia. The size and scale of the country makes it difficult to load and process all the data in memory. For Indonesia, we follow this modifed workflow:
- We generate the grids over Indonesia first without filtering by population (
filter_population=False
) - We then group the generated grids by lower-zoom-level quadkeys so that we can have geographically close groupings
- The population count for each grid is generated on a per-group basis to limit the amount of data being loaded as it is being processed
- After getting the population count, we filter out the unpopulated grids manually and save to file
Set country-specific parameters
= "id"
COUNTRY_CODE = get_region_name(COUNTRY_CODE, code="alpha-2").lower()
REGION = "ADM2"
ADMIN_LVL = 14
ZOOM_LEVEL = 8 GROUP_ZOOM_LEVEL
Generate Grids
= get_region_filtered_bingtile_grids(
admin_grids_gdf
REGION,=ADMIN_LVL,
admin_lvl=ZOOM_LEVEL,
quadkey_lvl=True,
use_cache=False, # We set this to False for Indonesia
filter_population )
2023-03-15 01:22:53.354 | INFO | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:264 - No cached grids file found. Generating grids file :/home/jace/.cache/geowrangler/quadkey_grids/indonesia_14_ADM2_admin_grids.geojson
2023-03-15 01:22:53.362 | DEBUG | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:281 - Loading boundaries for region indonesia and admin level ADM2
2023-03-15 01:22:53.369 | INFO | povertymapping.geoboundaries:get_geoboundaries:41 - Downloading geoboundaries for IDN at admin level ADM2 at https://www.geoboundaries.org/gbRequest.html?ISO=IDN&ADM=ADM2
2023-03-15 01:22:53.362 | DEBUG | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:281 - Loading boundaries for region indonesia and admin level ADM2
2023-03-15 01:22:53.369 | INFO | povertymapping.geoboundaries:get_geoboundaries:41 - Downloading geoboundaries for IDN at admin level ADM2 at https://www.geoboundaries.org/gbRequest.html?ISO=IDN&ADM=ADM2
2023-03-15 01:22:55.255 | INFO | povertymapping.geoboundaries:get_geoboundaries:50 - Download path for IDN at admin level ADM2 found at https://geoboundaries.org/data/geoBoundaries-3_0_0/IDN/ADM2/geoBoundaries-3_0_0-IDN-ADM2.geojson
2023-03-15 01:41:14.769 | INFO | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:287 - Generating grids for region indonesia and admin level ADM2 at quadkey level 14
2023-03-15 01:46:44.171 | INFO | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:293 - Generated 340122 grids for region indonesia and admin level ADM2 at quadkey level 14
2023-03-15 01:46:44.182 | INFO | povertymapping.rollout_grids:get_region_filtered_bingtile_grids:299 - Assigning grids to admin areas using metric crs epsg:3857
/home/jace/workspace/unicef-ai4d-poverty-mapping/env/lib/python3.9/site-packages/geopandas/geodataframe.py:2196: UserWarning: `keep_geom_type=True` in overlay resulted in 2 dropped geometries of different geometry types than df1 has. Set `keep_geom_type=False` to retain all geometries
return geopandas.overlay(
/home/jace/workspace/unicef-ai4d-poverty-mapping/env/lib/python3.9/site-packages/geopandas/io/file.py:362: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
pd.Int64Index,
Explore Generated Grids
admin_grids_gdf.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 340122 entries, 0 to 340121
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 geometry 340122 non-null geometry
1 quadkey 340122 non-null object
2 shapeName 340122 non-null object
3 shapeISO 340122 non-null object
4 shapeID 340122 non-null object
5 shapeGroup 340122 non-null object
6 shapeType 340122 non-null object
dtypes: geometry(1), object(6)
memory usage: 20.8+ MB
2) admin_grids_gdf.head(
geometry | quadkey | shapeName | shapeISO | shapeID | shapeGroup | shapeType | |
---|---|---|---|---|---|---|---|
0 | POLYGON ((98.28369 -0.52734, 98.28369 -0.50536... | 31000101131223 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 |
1 | POLYGON ((98.28369 -0.54931, 98.28369 -0.52734... | 31000101133001 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 |
# Uncomment to view the generated grid on an interactive map
# Set a limit to plotted grids as loading all grids can become very slow
# n = 10000
# start_idx = 10000
# admin_grids_gdf.iloc[start_idx:start_idx+n].explore()
Compute Population Per Grid
For this calculation, we will calculate the population, batched based on groups calculated from the quadkey. By getting the first n digits of the quadkey, we are able to get the zoom level n quadkey to which that tile belongs to.
Ex. if we group based on zoom level 8 the quadkey 31000101131223
belongs to the grouping 31000101
.
This grouping ensures that the tile groupings are geographically close to one another, which reduces the raster window size that we needed to calculate the population count.
For more information about quadkeys, see Bing Maps Tile System.
Get groupings based on quadkey
= f"quadkey_level{GROUP_ZOOM_LEVEL}"
quadkey_group_col = admin_grids_gdf["quadkey"].str[:GROUP_ZOOM_LEVEL]
admin_grids_gdf[quadkey_group_col] 3) admin_grids_gdf.head(
geometry | quadkey | shapeName | shapeISO | shapeID | shapeGroup | shapeType | quadkey_level8 | |
---|---|---|---|---|---|---|---|---|
0 | POLYGON ((98.28369 -0.52734, 98.28369 -0.50536... | 31000101131223 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 | 31000101 |
1 | POLYGON ((98.28369 -0.54931, 98.28369 -0.52734... | 31000101133001 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 | 31000101 |
2 | POLYGON ((98.30566 -0.52734, 98.30566 -0.50536... | 31000101131232 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 | 31000101 |
# Demonstrate how the quadkey grouping gives us geographically close grids
= list(admin_grids_gdf[quadkey_group_col].unique())
quadkey_groups = 78
i = quadkey_groups[i]
test_group = admin_grids_gdf[admin_grids_gdf[quadkey_group_col] == test_group]
group_gdf group_gdf.explore()
Get population HDX file
This section will get the HDX filepath for the specifed REGION
. The file will also be downloaded if needed.
= get_hdx_file(REGION)
hdx_pop_file hdx_pop_file
2023-03-15 01:55:58.179 | INFO | povertymapping.hdx:get_hdx_file:200 - HDX Data: Unzipping the zip file /home/jace/.cache/geowrangler/hdx/idn_general_2020_geotiff.zip...
2023-03-15 01:59:09.933 | INFO | povertymapping.hdx:get_hdx_file:210 - HDX Data: Successfully downloaded and cached for indonesia at /home/jace/.cache/geowrangler/hdx/idn_general_2020_geotiff.zip!
Path('/home/jace/.cache/geowrangler/hdx/idn_general_2020.tif')
Compute population count, batched by the quadkey group
# Set aggregation settings
= dict(column="population", output="pop_count", func="sum")
aggregation = dict(nodata=np.nan) extra_args
# Compute population totals per grid
= compute_raster_stats(
admin_grids_pop_count
admin_grids_gdf,
hdx_pop_file,=aggregation,
aggregation=extra_args,
extra_args=quadkey_group_col,
group_col=None,
max_batch_size=None,
n_workers )
2023-03-15 01:59:10.349 | INFO | povertymapping.rollout_grids:compute_raster_stats:71 - Creating raster zonal stats for 340122 grids for file size 87714.317324 Mb, batched in 238 unique group/s from quadkey_level8
2023-03-15 01:59:10.350 | WARNING | povertymapping.rollout_grids:compute_raster_stats:74 - When batching by group, output gdf rows will be ordered based on the group.
100%|██████████| 238/238 [28:43<00:00, 7.24s/it]
2023-03-15 02:27:54.170 | INFO | povertymapping.rollout_grids:compute_raster_stats:89 - Completed raster zonal stats for 238 groups
2023-03-15 02:27:54.296 | INFO | povertymapping.rollout_grids:compute_raster_stats:91 - Concatenated raster zonal stats for 238 groups
admin_grids_pop_count.head()
geometry | quadkey | shapeName | shapeISO | shapeID | shapeGroup | shapeType | quadkey_level8 | pop_count | |
---|---|---|---|---|---|---|---|---|---|
0 | POLYGON ((98.28369 -0.52734, 98.28369 -0.50536... | 31000101131223 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 | 31000101 | NaN |
1 | POLYGON ((98.28369 -0.54931, 98.28369 -0.52734... | 31000101133001 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 | 31000101 | NaN |
2 | POLYGON ((98.30566 -0.52734, 98.30566 -0.50536... | 31000101131232 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 | 31000101 | 185.68042 |
3 | POLYGON ((98.30566 -0.54931, 98.30566 -0.52734... | 31000101133010 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 | 31000101 | NaN |
4 | POLYGON ((98.32764 -0.50536, 98.32764 -0.48339... | 31000101131231 | Nias Selatan | None | IDN-ADM2-3_0_0-B371 | IDN | ADM2 | 31000101 | NaN |
Keep grids with total population > 0
= admin_grids_pop_count[admin_grids_pop_count["pop_count"] > 0]
admin_grids_filtered = admin_grids_filtered.reset_index(drop=True)
admin_grids_filtered admin_grids_filtered.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 161230 entries, 0 to 161229
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 geometry 161230 non-null geometry
1 quadkey 161230 non-null object
2 shapeName 161230 non-null object
3 shapeISO 161230 non-null object
4 shapeID 161230 non-null object
5 shapeGroup 161230 non-null object
6 shapeType 161230 non-null object
7 quadkey_level8 161230 non-null object
8 pop_count 161230 non-null float64
dtypes: float64(1), geometry(1), object(7)
memory usage: 11.1+ MB
# # Uncomment to view interactive map
# # Set a limit to plotted grids as loading all grids can become very slow
# n = 10000
# start_idx = 100000
# admin_grids_filtered.iloc[start_idx:start_idx+n].explore()
Save to file
= "-".join(os.getcwd().split("/")[-2].split("-")[:3])
rollout_date = f"./{rollout_date}-{COUNTRY_CODE}-rollout-grids.geojson"
grid_save_path ="GeoJSON", index=False) admin_grids_filtered.to_file(grid_save_path, driver