Michael Chong, University of Toronto
Diego Alburez-Gutierrez, Max Planck Institute for Demographic Research
Monica Alexander, University of Toronto
Emilio Zagheni, Max Planck Institute for Demographic Research (MPIDR)
In recent years, online communities have created genealogical records that span multiple continents over several centuries and contain demographic information for millions of people. However, these data are unrepresentative, and extracting accurate population information from them presents a major methodological challenge. We construct a Bayesian model that combines structured mortality estimation and smoothing techniques to correct mortality rates derived from FamiLinx, a large crowd-sourced genealogical dataset. Our model estimates and extrapolates a set of adjustment factors that capture the discrepancy between genealogy-derived rates and more reliable data from the Human Mortality Database. To demonstrate out-of-sample performance, we apply our method to estimate 19th-century mortality rates for countries and time periods that are not covered by the high-quality data. Our results illustrate a wide range of underreporting patterns across age, time, and country. In particular, we find that mortality is most severely underreported at young ages, where genealogy-derived rates range from one half to less than one tenth of the estimated true mortality rate. Understanding and accounting for these biases will be critical to future research using these data.
Keywords: Mortality and Longevity, Bayesian Methods, Data and Methods, Historical Demography