Global Decarbonization Mobility Analytics

Built and scaled a global mobility measurement pipeline that processes more than 120 TB of geospatial ping data to estimate daily travel distances across more than 150 countries.

Python, Pandas, Dask, GeoPandas, NumPy, AWS S3, PyArrow/Parquet, Large-Scale Geospatial Analytics

Problem

The project needed globally comparable, policy-relevant mobility metrics from highly noisy device-level pings, while preserving consistency across countries with very different data coverage and quality.

Approach

I developed an end-to-end workflow, run per country and day, that ingested raw mobility pings from S3, mapped devices to urban areas, computed segment-level distance and speed features, and produced device-level daily distance metrics before aggregating to urban areas. The pipeline applied explicit QA and harmonization rules, including daily coverage thresholds (e.g., 3-hour block filters), speed and distance outlier filtering, and handling of traversers and non-unique movements. I then aggregated to urban-area/date/country indicators (mean and median distance, plus device counts), ran cross-country diagnostics and device-level descriptives, and implemented a logistic-growth-based adjustment that uses observed coverage (5-minute segments) to correct for sparse sampling and improve comparability. I also integrated explanatory datasets into the pipeline, such as OSM street-network features, GHSL building surface/volume indicators, GDP, population, and land-use mix variables.
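To make the segment-to-daily step concrete, here is a minimal pandas sketch, not the production code. It assumes a ping table with hypothetical columns `device_id`, `timestamp`, `lat`, and `lon`, uses haversine distance between consecutive pings as a stand-in for the segment distance, and applies an illustrative speed cutoff and 3-hour-block coverage rule. The thresholds `MAX_SPEED_KMH` and `MIN_BLOCKS` are assumptions chosen for the example, not the values used in the project.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088
MAX_SPEED_KMH = 200.0   # assumed speed cutoff, not the project's value
MIN_BLOCKS = 4          # assumed: pings in at least 4 of the 8 three-hour blocks


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def daily_device_distance(pings: pd.DataFrame) -> pd.DataFrame:
    """Sum plausible segment distances per device and day, then filter on coverage."""
    df = pings.sort_values(["device_id", "timestamp"]).copy()
    df["date"] = df["timestamp"].dt.normalize()
    df["block"] = df["timestamp"].dt.hour // 3  # 3-hour block index, 0..7
    keys = ["device_id", "date"]

    # Previous ping within the same device/day; the first ping of a day has none.
    prev = df.groupby(keys)[["lat", "lon", "timestamp"]].shift(1)
    df["dist_km"] = haversine_km(prev["lat"], prev["lon"], df["lat"], df["lon"])
    hours = (df["timestamp"] - prev["timestamp"]).dt.total_seconds() / 3600
    df["speed_kmh"] = df["dist_km"] / hours

    # The comparison is False for NaN, so rows without a previous ping drop out too.
    valid = df[df["speed_kmh"] <= MAX_SPEED_KMH]
    distance = valid.groupby(keys)["dist_km"].sum().rename("distance_km")
    blocks = df.groupby(keys)["block"].nunique().rename("n_blocks")

    out = blocks.to_frame().join(distance)
    out["distance_km"] = out["distance_km"].fillna(0.0)
    return out[out["n_blocks"] >= MIN_BLOCKS].reset_index()
```

The result has one row per device and day that passes the coverage rule, which could then be grouped by urban area and date to obtain mean and median distances and device counts.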

Results

Delivered reproducible mobility indicators and comparison tables and maps used for decarbonization analysis, with documented sensitivity checks (coverage, traverser removal, and filter configurations) to support robust interpretation across countries. Outputs also included a dashboard used by country offices to support policy decisions, and a related paper that is forthcoming.