Machine Learning Tools to Predict Subclinical Small-vessel Brain Lesions Using AHA Life’s Essential 8: Evidence from the ELSA-Brasil Longitudinal Study
Marianna Leite1, Carine Savalli2, Arão B. Oliveira3, Carlos Leandro4, Paulo A. Lotufo5, Isabela Benseñor5, Claudia C Leite4, Maria CG Otaduy4, Itamar de Souza Santos5, Adriana B Conforto4, Alessandra C Goulart6, Alexandre Chiavegatto7
1School of Public Health, Department of Epidemiology, University of São Paulo, São Paulo, Brazil. Santa Marcelina School of Medicine, São Paulo, Brazil, 2Federal University of São Paulo, Department of Public Policies and Collective Health, São Paulo, Brazil, 3Center for Clinical and Epidemiological Research, Hospital Universitario, University of Sao Paulo, Brazil. School of Public Health, Department of Epidemiology, University of São Paulo, São Paulo, Brazil, 4LIM-44, Instituto e Departamento de Radiologia e Oncologia, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, SP, Brazil, 5Center for Clinical and Epidemiological Research, Hospital Universitario, University of Sao Paulo, Brazil. School of Medicine, Universidade de São Paulo, São Paulo, Brazil, 6School of Public Health, Department of Epidemiology, University of São Paulo, São Paulo, Brazil. Center for Clinical and Epidemiological Research, Hospital Universitario, University of Sao Paulo, Brazil, 7School of Public Health, Department of Epidemiology, University of São Paulo, São Paulo, Brazil
Objective:
To compare ML algorithms for predicting CSVD lesions on 3T MRI from LE8 adherence levels and temporal changes in the ELSA-Brasil study, and to quantify feature contributions via explainable AI.
Background:
Subclinical cerebral small-vessel disease (CSVD), comprising enlarged perivascular spaces (EPS), white-matter hyperintensities (WMH), lacunes (LAC), and microhemorrhages (MH) on brain MRI, is linked to cognitive decline, stroke, and mortality. These lesions may be anticipated from longitudinal cardiovascular health, as captured by the AHA Life’s Essential 8 (LE8), using modern machine learning (ML) methods.
Design/Methods:
ELSA-Brasil adult participants were followed over a 15-year period (n=233; mean age 66.1±9.2; 59% women). The outcomes were binary EPS, WMH, LAC, and MH. Predictors included categorical LE8 scores (1–3; higher=better), LE8 trajectory from Wave 1 (2008–2010) to Wave 3 (2017–2019), and baseline demographics (age, income, education, race, marital status). Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost, and TabPFN were trained using a 70/30 train–test split, with repeated 5-fold cross-validation for hyperparameter tuning. Test AUC-ROC quantified performance; SHAP values assessed feature importance.
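The evaluation protocol described above (a stratified 70/30 hold-out and test AUC-ROC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `stratified_split` and `auc_roc` are hypothetical, the labels and scores are synthetic, and whether the study stratified its split by outcome is an assumption. Model fitting and the repeated 5-fold cross-validation used for tuning are omitted.

```python
import random

def stratified_split(labels, test_frac=0.30, seed=0):
    """Split indices into train/test while keeping class proportions.

    Hypothetical helper: the abstract does not say whether the 70/30
    split was stratified; stratification is assumed here.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def auc_roc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic binary outcome: 30 positives among 100 cases (not study data).
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))

# Toy scores: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form of AUC is quadratic in the number of cases, which is adequate for a cohort of a few hundred participants; a rank-based computation would be preferable at larger scale.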
Results:
The prevalence of EPS was 64.8%, WMH 28.3%, LAC 11.6%, and MH 12.1%. For EPS, TabPFN performed best (AUC=0.764). Logistic Regression showed moderate discrimination for LAC and MH (AUC=0.749 and 0.739), while CatBoost was modest for WMH (AUC=0.692). SHAP consistently highlighted age, Wave-3 blood lipids, sleep improvement, and Wave-1 blood glucose among top contributors, alongside selected LE8 trajectory features.
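To show how a SHAP attribution is read, the sketch below computes exact SHAP values for a linear (logistic regression) model under the assumption of independent features, where the contribution of feature j on the log-odds scale is w_j * (x_j - mean(x_j)). The function name `linear_shap`, the coefficients, and the feature values are all hypothetical and are not taken from the study; tree-based models and TabPFN require other explainers.

```python
def linear_shap(weights, intercept, background, x):
    """SHAP values for a linear model on the log-odds scale,
    assuming independent features.

    Returns (base_value, contributions), where base_value is the model
    output at the background feature means and each contribution is
    w_j * (x_j - mean_j).
    """
    n = len(background)
    means = [sum(row[j] for row in background) / n for j in range(len(weights))]
    base = intercept + sum(w * m for w, m in zip(weights, means))
    phi = [w * (xj - m) for w, xj, m in zip(weights, x, means)]
    return base, phi

# Hypothetical two-feature model: age (years) and an LE8 sleep score (1-3).
weights = [0.05, -0.4]      # illustrative coefficients, not estimates
intercept = -3.0
background = [[60, 1], [70, 3]]   # toy reference sample
participant = [75, 1]

base, phi = linear_shap(weights, intercept, background, participant)
log_odds = intercept + sum(w * v for w, v in zip(weights, participant))

# Additivity: base value plus contributions recovers the prediction.
print(abs(base + sum(phi) - log_odds) < 1e-9)  # True
```

In this toy case both features push the prediction above the baseline: the participant is older than the reference mean and has a lower sleep score, and the negative sleep coefficient turns that deficit into a positive contribution.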
Conclusions:
ML models leveraging LE8-based cardiovascular health showed moderate discrimination for subclinical CSVD in ELSA-Brasil (best test AUC 0.69 to 0.76 across lesion types), strongest for EPS and weakest for WMH. TabPFN emerges as a promising transformer-based alternative to classical machine learning algorithms. Larger, richly phenotyped cohorts and external validation are warranted to refine predictive performance and support translation into clinical and public health settings.
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.