Pietro Violo, Université de Montréal
Nadine Ouellette, Département de Démographie, Université de Montréal
This study builds on our previous work showing that mortality data derived from obituaries accurately represent the general population, at least in Quebec, Canada (Violo and Ouellette, 2024). Using machine learning methods, we aim to extract demographic variables, including marital status and immediate social network size, from a large dataset of 207,572 obituaries written in French. Regular expressions alone captured gender and age at death in 71% of cases; machine learning models raised this coverage to 100% by identifying patterns that the regular expressions missed. Our preliminary findings reveal distinct differences in age at death across marital statuses: single and divorced individuals show a wider distribution, particularly among men. In contrast, cohabiting women tend to die younger, and married women are alone in showing a lifespan disadvantage relative to men, potentially because of health conditions arising at younger ages. Through DBSCAN clustering, we explore the potential influence of social network size on longevity. This approach promises new insights into the social determinants of mortality, including the impact of family structure, marital status, and social isolation.
Keywords: Mortality and Longevity, Social network methods, Computational social science methods, Digital and computational demography
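
As an illustration of the regular-expression step mentioned in the abstract, the following minimal Python sketch shows how gender and age at death might be pulled from a French obituary. It is not the authors' pipeline: the function name `extract_gender_age`, the two patterns, and the rule that infers gender from the agreement of "décédé" versus "décédée" are all assumptions made for this example. The third sample contains neither cue, which shows the kind of obituary that rule-based matching leaves unresolved.

```python
import re
import unicodedata
from typing import NamedTuple, Optional


class Extracted(NamedTuple):
    gender: Optional[str]  # "F", "M", or None when no cue is found
    age: Optional[int]     # age at death in years, or None


# Hypothetical patterns, chosen for illustration only.
# Age cue: "à l'âge de 87 ans" (straight or typographic apostrophe).
AGE_RE = re.compile(r"à\s+l['’]âge\s+de\s+(\d{1,3})\s+ans", re.IGNORECASE)
# Gender cue: past participle agreement, "décédé" (M) vs "décédée" (F).
DECEASED_RE = re.compile(r"\bdécédé(e)?\b", re.IGNORECASE)


def extract_gender_age(text: str) -> Extracted:
    """Return the gender and age cues found in one obituary, if any."""
    # Normalize so that accented letters are single code points.
    text = unicodedata.normalize("NFC", text)

    gender = None
    m = DECEASED_RE.search(text)
    if m:
        gender = "F" if m.group(1) else "M"

    age = None
    m = AGE_RE.search(text)
    if m:
        age = int(m.group(1))

    return Extracted(gender, age)


samples = [
    "Mme Tremblay est décédée à l'âge de 87 ans.",
    "Il est décédé à l’âge de 92 ans.",
    "Nous avons le regret d'annoncer le départ de Jean.",
]
for s in samples:
    print(extract_gender_age(s))
```

Obituaries for which such rules return `None`, like the last sample, are the cases that a learned model would need to resolve.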