Model Rollouts Summary

At A Glance

This page summarizes our findings and learnings from developing a cross-country poverty estimation model and rolling it out for 9 countries in Southeast Asia.

Model Rollout

The following figure describes our model rollout process for the 9 countries.

We successfully trained models and used them to estimate wealth across the 9 countries using this approach. The output wealth estimates capture macro-level urbanization, predicting relatively higher wealth at major urban centers and lower wealth in remote and sparsely populated areas. These results show good qualitative agreement with Meta’s RWI, which suggests that good wealth estimates can be achieved using only openly available datasets and ground truth DHS data from 4 countries as training data.

Snapshots of estimated relative wealth indexes for various areas in SEA

Resulting wealth estimates for the 9 SEA countries can be downloaded here.

Validating Results With Reference Data

We validated our model rollout results against the best available reference wealth data for 3 countries. For each administrative boundary, we took the mean of the model predictions and the mean of the reference data, then ranked the areas from highest to lowest wealth under each. From this, we created comparison maps and calculated the Spearman rank correlation between the predicted and reference wealth ranks.

Country	Reference Comparison Data	Granularity	# of Evaluation Areas	Spearman Rank Correlation
Indonesia	2018-2019 SUSENAS-derived Relative Wealth Index	Admin Level 2 (City/Regency)	513	0.72
Laos	2017 UNICEF MICS-derived International Wealth Index	Admin Level 1 (Province)	17	0.84
Malaysia	2016 Household Expenditure Survey (Mean)	Admin Level 1 (State / Federal Territory)	16	0.76
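The validation step described above (mean wealth per administrative area, ranking, then Spearman rank correlation) can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the function names, the area names, and all the numbers are hypothetical, and a library routine such as scipy.stats.spearmanr would normally be used in place of the hand-written correlation.

```python
from statistics import mean


def mean_by_area(records):
    """Average wealth per area from (area_name, wealth) pairs."""
    grouped = {}
    for area, wealth in records:
        grouped.setdefault(area, []).append(wealth)
    return {area: mean(values) for area, values in grouped.items()}


def rank_descending(values):
    """Rank from highest (rank 1) to lowest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rank_correlation(predicted, reference):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(rank_descending(predicted), rank_descending(reference))


# Hypothetical tile-level predictions, each tagged with its admin area.
predicted_tiles = [
    ("A", 0.8), ("A", 0.6),
    ("B", 0.3), ("B", 0.5),
    ("C", 0.1),
    ("D", 0.45),
]
# Hypothetical reference wealth, already aggregated per admin area.
reference_by_area = {"A": 70.0, "B": 40.0, "C": 45.0, "D": 20.0}

predicted_by_area = mean_by_area(predicted_tiles)
areas = sorted(reference_by_area)
rho = spearman_rank_correlation(
    [predicted_by_area[a] for a in areas],
    [reference_by_area[a] for a in areas],
)
print(f"Spearman rank correlation over {len(areas)} areas: {rho:.2f}")
```

Because only the ranks enter the correlation, the predicted and reference wealth values do not need to share a unit or scale, which is what allows a relative wealth index to be compared against expenditure or IWI data.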

Indonesia Model Map Comparison with Susenas RWI (reference) (Rank Correlation: 0.72)

Laos Model Map Comparison with MICS IWI (Rank Correlation: 0.84)

Malaysia Model Map Comparison with Relative Wealth (reference) (Rank Correlation: 0.76)

Scaling

As previously discussed in our cross-country experiments summary, we found that scaling the input features and the DHS Wealth Index from absolute to relative values corrects for country-level variations. This approach led to the best cross-country results.

During training, we found that StandardScaler produced the best metrics and chose it as our scaler. StandardScaler works by centering the data on the mean and using the standard deviation as the “unit”. Our model interprets the data as relative, meaning that if an area has above-average internet speeds or night-time lights, it is expected to also have above-average wealth.

However, during our rollout, we noticed that for many countries the predicted values were all above zero, which indicated that the model believed that all areas within the country had above-average wealth. In other words, our initial approach overestimated relative wealth during rollout.

We found that the root cause of this issue was that the rollout data had an abundance of remote and sparsely populated areas, with many zero or low values, which pulled down the mean. This meant that the above/below interpretation was significantly different between the training and rollout data.

Comparison between StandardScaler and MinMaxScaler. We found MinMaxScaler to perform best when rolling out to countries without training data.

To address this issue, we decided to switch from StandardScaler to MinMaxScaler. MinMaxScaler fixes the issue by anchoring the scaling on the minimum and maximum values seen in the rollout data, rather than on the mean and standard deviation. The minimum and maximum do not shift when many remote areas are present, so a scaled value keeps the same meaning in the training and rollout data.
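The effect of the two scalers can be illustrated with a small sketch. It uses plain Python versions of the same formulas that scikit-learn's StandardScaler (population standard deviation) and MinMaxScaler (default 0 to 1 range) apply. The feature values below are hypothetical and chosen only to mimic a rollout grid with many empty tiles, and the sketch covers only the feature side of the problem, not the model's predictions.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Z-score scaling: center on the mean, use the population std as the unit."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical night-time light values for survey clusters used in training.
training = [0, 10, 20, 30, 40, 50, 60]
# Hypothetical rollout grid: the same range of values, plus many empty remote tiles.
rollout = [0] * 13 + training

target = 20  # an area that sits below the training mean of 30

z_train = standard_scale(training)[training.index(target)]
z_rollout = standard_scale(rollout)[rollout.index(target)]
mm_train = min_max_scale(training)[training.index(target)]
mm_rollout = min_max_scale(rollout)[rollout.index(target)]

print(f"StandardScaler-style: training {z_train:+.2f}, rollout {z_rollout:+.2f}")
print(f"MinMaxScaler-style:   training {mm_train:.2f}, rollout {mm_rollout:.2f}")
```

In this toy data, the same raw value is below average under standard scaling of the training set but above average once the extra zero-valued tiles pull the rollout mean down, while its min-max scaled value is identical in both sets.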

Optimizations for Processing Large Countries

Indonesia posed several challenges due to its scale, which meant that much bigger datasets were required to roll out the model. Indonesia is 3x the size of the second largest of our target countries (Myanmar), so the typical workflow used for the other 8 countries ran into memory issues. It was therefore necessary to optimize our existing workflows to process data for large countries, most notably by batching the data processing so that it fits into memory.

Quadkey tile indexes allow us to query neighboring tiles together and intersect them with Ookla internet speed data without expensive geospatial operations. Quadkey-based optimizations are used in the background when use_aoi_quadkey=True in relevant functions.
Indonesia OpenStreetMap data is processed in batches based on major islands: Java, Kalimantan, Maluku, Sulawesi, Sumatra, Papua, Nusa-Tenggara.
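To show why quadkeys make this cheap, the sketch below converts coordinates to quadkeys using the standard Bing Maps tile system and then joins points to speed records with a dictionary lookup, one island batch at a time. It is an illustration under stated assumptions, not the code behind use_aoi_quadkey: the function names, the zoom level of 16, the coordinates, and the speed record are all hypothetical.

```python
import math

MAX_LATITUDE = 85.05112878  # Web Mercator latitude limit used by the tile system


def latlon_to_tile(lat, lon, zoom):
    """Convert WGS84 coordinates to Web Mercator tile x/y at the given zoom."""
    n = 2 ** zoom
    lat = min(max(lat, -MAX_LATITUDE), MAX_LATITUDE)
    sin_lat = math.sin(math.radians(lat))
    x = (lon + 180.0) / 360.0 * n
    y = (0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n
    tile_x = min(max(int(math.floor(x)), 0), n - 1)
    tile_y = min(max(int(math.floor(y)), 0), n - 1)
    return tile_x, tile_y


def tile_to_quadkey(tile_x, tile_y, zoom):
    """Interleave the bits of tile x/y into a base-4 string, one digit per zoom level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def latlon_to_quadkey(lat, lon, zoom=16):
    return tile_to_quadkey(*latlon_to_tile(lat, lon, zoom), zoom)


# Hypothetical speed table keyed by quadkey (values are made up).
speeds_by_quadkey = {latlon_to_quadkey(-6.2, 106.8): {"avg_d_kbps": 25000}}

# Hypothetical areas of interest, grouped into per-island batches.
batches = {
    "Java": [("tile_a", -6.2, 106.8)],
    "Papua": [("tile_b", -2.5, 140.7)],
}

joined = {}
for island, points in batches.items():
    # Each batch is processed on its own, so only one island is in memory at a time.
    for name, lat, lon in points:
        # A string-keyed lookup replaces a polygon intersection.
        joined[name] = speeds_by_quadkey.get(latlon_to_quadkey(lat, lon))

print(joined)
```

A quadkey at a coarser zoom is a prefix of the quadkeys of all tiles it contains, so neighboring tiles can also be grouped by truncating the string rather than by comparing geometries.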

Links to Indonesia notebooks: