Cross-country Experiments Summary

At a Glance

This page summarizes our experiments to find the best approach for training cross-country poverty estimation models for the following DHS countries: Philippines (PH), Cambodia (KH), Timor-Leste (TL), and Myanmar (MM).

Our main goal for these experiments is to validate our approach to running models for countries without DHS ground truth data.

We find that training a model with both features and wealth index labels scaled per-country gives us the most promising results.

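Table 1.1: Model performance by country split for each experiment, measured using the coefficient of determination (\(r^2\)). Negative values indicate predictions that are worse than predicting the mean wealth index of the test country.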
| Train | Test | Test single-country model | Raw features and wealth index | Recomputed wealth index | Scaled wealth index | Scaled features and wealth index |
|---|---|---|---|---|---|---|
| KH, MM, TL | PH | 0.57 | -0.19 | -0.01 | 0.25 | 0.48 |
| PH, MM, TL | KH | 0.7 | 0.62 | -1.46 | 0.59 | 0.58 |
| KH, PH, MM | TL | 0.6 | 0.37 | -2.76 | 0.34 | 0.54 |
| KH, PH, TL | MM | 0.5 | 0.46 | 0.01 | 0.45 | 0.51 |
| Overall Average | | 0.59 | 0.32 | -1.06 | 0.41 | 0.52 |
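Several entries in Table 1.1 are negative, which a squared correlation cannot be. The sketch below assumes the reported \(r^2\) is the coefficient of determination, 1 - SS_res / SS_tot, and shows how such a score drops below zero when predictions are systematically offset from the test labels. The function name and the toy numbers are illustrative and are not taken from the notebooks.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy labels and predictions (made up for illustration).
y_true = [-1.0, 0.0, 1.0]

# Predictions close to the labels give a score near 1.
close = r2_score(y_true, [-0.9, 0.1, 0.8])

# Predictions with the right ordering but a constant offset score below 0,
# i.e. worse than always predicting the mean of y_true.
shifted = r2_score(y_true, [1.0, 2.0, 3.0])

print(round(close, 2), shifted)
```

Under this assumption, a strongly negative score can come from a cross-country shift in the label scale even when the predicted ordering of clusters is reasonable.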

Experiments

We trained three main types of models:

  1. Cross-country regressor with raw wealth index
  2. Cross-country regressor with wealth index recalculated using the pooled data for the four countries
  3. Cross-country regressor with both wealth index and input features scaled per-country

Table 1.1 summarizes the model performance for each experiment. Each experiment was evaluated using a Leave One Country Out strategy, wherein the model was trained on three countries and tested on the fourth. For example, we trained on KH, MM, and TL, and tested on PH. We trained a model for every possible split (4 in total) and calculated the mean \(r^2\) across all splits. This strategy emulates the situation wherein we train the poverty estimation model on countries with ground truth data and roll it out to countries without.
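The Leave One Country Out loop described above can be sketched as follows. This is a minimal illustration, not the notebook code: `train_and_score` is a hypothetical callable standing in for "fit on the training countries, return \(r^2\) on the held-out country", and the stand-in used here simply looks up the "raw features and wealth index" column of Table 1.1 instead of fitting a model.

```python
from statistics import mean

COUNTRIES = ["KH", "MM", "PH", "TL"]


def leave_one_country_out(countries):
    """Yield (train_countries, test_country) for every possible hold-out."""
    for held_out in countries:
        train = [c for c in countries if c != held_out]
        yield train, held_out


def evaluate(countries, train_and_score):
    """Score every split and return the per-country scores and their mean.

    train_and_score is a hypothetical callable:
    (train_countries, test_country) -> r^2 on the test country.
    """
    scores = {}
    for train, test in leave_one_country_out(countries):
        scores[test] = train_and_score(train, test)
    return scores, mean(list(scores.values()))


# Stand-in scorer: values copied from the "raw features and wealth index"
# column of Table 1.1, used only so the example runs without a model.
RAW_COLUMN = {"PH": -0.19, "KH": 0.62, "TL": 0.37, "MM": 0.46}

splits = list(leave_one_country_out(COUNTRIES))
scores, average = evaluate(COUNTRIES, lambda train, test: RAW_COLUMN[test])
print(splits[0], round(average, 3))
```

Keeping the split generation separate from the scoring callable means the same loop can be reused for each experiment by swapping in a different `train_and_score`.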

Cross-country regressor with raw wealth index

📒 Notebook link

Learnings

  • Training the model on raw features and wealth index produces inconsistent results that depend heavily on the left-out country, ranging from \(r^2\) = -0.19 when testing on PH to 0.62 when testing on KH.

Cross-country regressor with wealth index recalculated using the pooled data for the four countries

📒 Notebook link

Idea

Recomputing the wealth index from data pooled across all four countries will help the model predict better across countries.

Experiment Performed

Combine raw country data and recompute the wealth index using PCA on a base set of features: drinking water, number of rooms, electrification, mobile telephone, radio, television, cars/trucks, refrigeration, motorcycle, floors, toilet
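As a rough illustration of the recomputation step, the sketch below derives an index as the first principal component of standardized asset indicators from pooled households. It is written in plain Python with power iteration so that it has no dependencies; the notebook may well use a library PCA instead. The four columns (electrification, television, refrigeration, number of rooms) are a hypothetical subset of the base features, and the six household rows are made up.

```python
import math


def standardize(rows):
    """Z-score each column (population std); constant columns become zeros."""
    n, d = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [
        [(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(d)]
        for r in rows
    ]


def first_component(z, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign of an eigenvector is arbitrary; orient it so more assets
    # means a higher index.
    if sum(v) < 0:
        v = [-x for x in v]
    return v


def wealth_index(rows):
    """Project standardized households onto the first principal component."""
    z = standardize(rows)
    loadings = first_component(z)
    return [sum(a * b for a, b in zip(r, loadings)) for r in z]


# Made-up pooled households: [electricity, television, refrigerator, rooms]
pooled = [
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 2],
    [1, 1, 1, 3],
    [0, 0, 0, 2],
    [1, 1, 1, 4],
]
index = wealth_index(pooled)
print([round(x, 2) for x in index])
```

Because the loadings are estimated from the pooled data, an asset that is common in one country and rare in another is weighted by its pooled variance, not by its variance within each country.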

Learnings

  • Aligning features across different countries makes it very difficult to include all features
  • Recomputed index has poor correlation with original index
  • Index would likely improve if we recompute using a larger dataset

Cross-country regressor with both wealth index and input features scaled per-country

📒 Notebook link

Idea

Scaling the features and the DHS wealth index from absolute to relative values corrects for country-level variations. For example, a 10 Mbps download speed becomes "top 10% speed in that country".
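One way to express "absolute to relative" is a percentile rank computed within each country. The sketch below is an assumption about what such a transform could look like, not the scaling used in the notebook; the record layout, the `download_mbps` field name, and the speed values are made up.

```python
from collections import defaultdict


def percentile_rank_by_country(records, key):
    """Add '<key>_pct': the share of same-country records with value <= this one."""
    by_country = defaultdict(list)
    for rec in records:
        by_country[rec["country"]].append(rec[key])

    ranked = []
    for rec in records:
        values = by_country[rec["country"]]
        at_or_below = sum(1 for v in values if v <= rec[key])
        ranked.append({**rec, key + "_pct": at_or_below / len(values)})
    return ranked


# Made-up download speeds for two countries.
records = (
    [{"country": "PH", "download_mbps": s} for s in (5, 10, 20, 40)]
    + [{"country": "KH", "download_mbps": s} for s in (1, 2, 5, 10)]
)
ranked = percentile_rank_by_country(records, "download_mbps")

# The same absolute speed lands at different relative positions.
ph_10 = next(r for r in ranked if r["country"] == "PH" and r["download_mbps"] == 10)
kh_10 = next(r for r in ranked if r["country"] == "KH" and r["download_mbps"] == 10)
print(ph_10["download_mbps_pct"], kh_10["download_mbps_pct"])
```

In this toy data, 10 Mbps is a mid-range speed in one country and the fastest in the other, which is the kind of country-level difference the relative scale is meant to absorb.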

Experiment Performed

Trained a model using a) scaled features, b) scaled wealth index, and c) both, using different scaling methods
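The text does not name the scaling methods that were compared. As a sketch under that caveat, the example below applies two common candidates, z-score and min-max scaling, separately within each country. The helper names and the wealth index values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]  # assumes values are not all equal


def minmax(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi > lo


def scale_per_country(values_by_country, scaler):
    """Apply the scaler to each country's values independently."""
    return {country: scaler(vals) for country, vals in values_by_country.items()}


# Made-up wealth index values on very different scales per country.
wealth = {
    "PH": [-50000.0, 0.0, 50000.0],
    "KH": [-2.0, 0.0, 2.0],
}
z = scale_per_country(wealth, zscore)
mm = scale_per_country(wealth, minmax)
print(z["PH"], mm["KH"])
```

The same `scale_per_country` helper can be applied to a feature column or to the label, which covers the three variants (features only, wealth index only, both) by choosing which columns to pass through it.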

Learnings

  • Scaling gives us the best cross-country results and works best when both features and labels are scaled
  • The method is simple and scalable, and it has already been reported in other literature
  • The model gives only relative predictions (a position within a country), not absolute values