Jaeheon Jung, Korea University
Oh Seok Kim, Korea University
Stephen A. Matthews, Retired
Kee Whan Kim, Korea University
Mapping population distribution at the pixel level is essential for decision-making in fields such as public health, regional development, and climate change adaptation. This study developed an ensemble machine learning (ML)-based dasymetric mapping approach to disaggregate municipal-level population projections to 500 m pixels. We used population projection data from previous research together with the latest spatial data, including population density, terrain, land cover, and nighttime lights. Three ML algorithms, namely eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM), were employed, followed by permutation importance and SHapley Additive exPlanations (SHAP) analyses to evaluate variable importance and correlation. We divided the study area (i.e., South Korea) into urban and rural regions and fitted the ML models to each region separately. XGBoost performed best in urban areas and LightGBM performed best in rural areas, each achieving the highest R-squared and the lowest Root Mean Squared Error (RMSE) in its region. The ratio of residential areas was the most important variable in all models and showed a strong positive correlation with population density. A comparison of the 2022 and 2050 pixel-level results revealed that all major cities in South Korea except Sejong are projected to experience population decline.
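The abstract does not spell out how model predictions become pixel populations. A common dasymetric approach, assumed here and not confirmed as the authors' exact method, treats each pixel's ML-predicted value (for example, predicted density from an XGBoost or LightGBM model) as a weight and rescales the weights so that the pixels in each municipality sum to that municipality's projected total. The sketch below shows that proportional allocation, plus the standard RMSE and R-squared definitions used to compare models. All function names are hypothetical, and the ML training step itself is omitted.

```python
import math
from collections import defaultdict


def dasymetric_disaggregate(municipal_totals, pixel_municipality, pixel_weights):
    """Hypothetical helper: allocate each municipality's projected total to
    its pixels in proportion to non-negative, model-predicted weights.

    municipal_totals:   dict municipality -> projected population
    pixel_municipality: list giving the municipality of each pixel
    pixel_weights:      list of predicted weights, one per pixel
    """
    weight_sums = defaultdict(float)
    pixel_counts = defaultdict(int)
    for muni, w in zip(pixel_municipality, pixel_weights):
        if w < 0:
            raise ValueError("pixel weights must be non-negative")
        weight_sums[muni] += w
        pixel_counts[muni] += 1

    allocated = []
    for muni, w in zip(pixel_municipality, pixel_weights):
        total = municipal_totals[muni]
        if weight_sums[muni] > 0:
            # Proportional share: pixels in a municipality sum to its total.
            allocated.append(total * w / weight_sums[muni])
        else:
            # Assumption: with no predictive signal, split the total evenly.
            allocated.append(total / pixel_counts[muni])
    return allocated


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant y_true")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data: municipality A has three pixels, B has two with zero weight.
    totals = {"A": 1000.0, "B": 300.0}
    munis = ["A", "A", "A", "B", "B"]
    weights = [2.0, 1.0, 1.0, 0.0, 0.0]
    print(dasymetric_disaggregate(totals, munis, weights))
    # [500.0, 250.0, 250.0, 150.0, 150.0]
```

Rescaling within each municipality preserves the municipal projection totals exactly, so the ML models only need to capture the relative distribution of population across pixels, not absolute counts.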
Keywords: Spatial Demography, Population projections, forecasts, and estimations, Geographic Information Systems (GIS), Data and Methods