Subgroup Disparities in Automated Census Record Linkage

Hannah Postel, Sanford School of Public Policy, Duke University
Ran Abramitzky, Stanford University
Leah Boustan, Princeton University
Katherine Eriksson, University of California, Davis

The automated linking of historical census records allows researchers to answer important socioeconomic questions about intergenerational mobility, immigrant assimilation, and more. We typically assess automated linking methods by their match rates and accuracy across the full census population. These measures, however, vary substantially across population subgroups such as racial minorities and immigrant groups. This paper aims to understand why certain subgroups are more difficult to link than others and provides solutions to increase their match rates and accuracy. Improving record linkage for immigrants and racial minority groups will both reduce existing bias in overall linked samples and enable the study of group-specific experiences that were previously undetectable.
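To make the distinction between pooled and subgroup-level evaluation concrete, here is a minimal sketch in Python. It assumes that "match rate" is the share of a subgroup's records that the automated method links, and that "accuracy" is the share of those links confirmed correct against a hand-linked benchmark sample. The names `LinkResult` and `rates_by_subgroup` are hypothetical, and the toy records are illustrative only, not data or results from this paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkResult:
    # Hypothetical record-level outcome of an automated linking run.
    subgroup: str             # e.g. a race or birthplace category
    matched: bool             # did the automated method produce a link?
    correct: Optional[bool]   # verified against a hand-linked benchmark; None if unmatched/unverified


def rates_by_subgroup(results):
    """Return {subgroup: (match_rate, accuracy)} under the assumed definitions."""
    counts = defaultdict(lambda: {"n": 0, "matched": 0, "checked": 0, "correct": 0})
    for r in results:
        c = counts[r.subgroup]
        c["n"] += 1
        if r.matched:
            c["matched"] += 1
            # Accuracy is computed only over links that could be verified.
            if r.correct is not None:
                c["checked"] += 1
                c["correct"] += int(r.correct)
    out = {}
    for group, c in counts.items():
        match_rate = c["matched"] / c["n"]
        accuracy = c["correct"] / c["checked"] if c["checked"] else None
        out[group] = (match_rate, accuracy)
    return out


# Toy example: two hypothetical subgroups, "A" and "B".
results = [
    LinkResult("A", True, True), LinkResult("A", True, True),
    LinkResult("A", False, None), LinkResult("A", True, False),
    LinkResult("B", True, True), LinkResult("B", False, None),
    LinkResult("B", False, None), LinkResult("B", False, None),
]
rates = rates_by_subgroup(results)

# Pooling every record into one group yields a single summary figure
# that hides the gap between the two subgroups.
pooled = rates_by_subgroup(
    [LinkResult("all", r.matched, r.correct) for r in results]
)["all"]

for group, (match_rate, accuracy) in sorted(rates.items()):
    print(group, match_rate, accuracy)
print("pooled", pooled)
```

In this toy setup the pooled match rate of 0.5 sits between subgroup A (0.75) and subgroup B (0.25), which is the kind of disparity a full-population summary can obscure.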

Keywords: Linked data sets, Census data, Historical Demography, Migrant Populations and Refugees
