Prediction Model for Stroke in Ageing Population in India

Pravin Sahadevan, PhD Research Scholar
Vineet Kumar Kamal, All India Institute of Medical Sciences

Background: Stroke is the fourth leading cause of death in India. This study aims to develop and validate a prediction model to identify individuals at increased risk of stroke in the Indian population aged 45 years and older. Methods: Data from Wave 1 (2017-2018) of the Longitudinal Ageing Study in India (LASI) were used. The outcome variable was stroke (Yes/No). Models were developed on 70% of the data (training set) and evaluated on the remaining 30% (test set). Logistic regression (LR) was applied for stroke prediction, with internal validation by the bootstrap technique and external validation in the test set. The study also compared the performance of three machine learning (ML) algorithms: LR, Random Forest (RF), and Naïve Bayes. Results: LR identified significant stroke predictors, including female gender, diabetes, chronic heart disease, high cholesterol, family history of stroke, smoking, physical inactivity, and alcohol use. In the test set, RF showed the highest accuracy and specificity, while LR had the highest sensitivity and area under the curve (AUC). Conclusion: LR demonstrated good discrimination but was not very well calibrated for stroke prediction. RF outperformed the other ML algorithms in terms of accuracy.
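The evaluation measures named in the abstract (accuracy, sensitivity, specificity, and area under the curve) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 classification threshold, the random seed, and the toy labels and scores are all assumptions made for illustration, and no LASI data are used.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and split them into a training set and a test set."""
    rng = random.Random(seed)  # the seed is an arbitrary, illustrative choice
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = stroke)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen case receives a higher
    score than a randomly chosen non-case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: hypothetical outcomes and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = confusion_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
train_rows, test_rows = train_test_split(range(10))  # 70/30 split of 10 rows
```

In this toy example accuracy is 0.75, sensitivity is 2/3, specificity is 0.8, and AUC is 0.8. Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarises discrimination across all thresholds, which is one reason different algorithms can rank differently on different measures.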

Keywords: Data and Methods, Population Ageing, Health and Morbidity

See extended abstract.