To predict the timing of cognitive decline in Alzheimer's disease (AD) using longitudinal electronic health record (EHR) data.
We linked EHR data (2011-2022) with an AD registry. We divided each patient's records into sequential 3-month intervals from baseline (i.e., first record) to the end of follow-up or death. For each interval, we extracted from the EHR counts of AD-related features (e.g., donepezil prescriptions) selected from multi-source knowledge graphs, along with comorbidities and healthcare utilization metrics. Cognitive status (normal vs cognitive impairment) was determined using registry-derived Clinical Dementia Rating (CDR) scores as well as Montreal Cognitive Assessment (MoCA) and Mini-Mental State Examination (MMSE) scores from the registry or EHR, mapped to the corresponding time intervals. We applied a novel semi-supervised machine learning approach (label-efficient incident phenotyping from longitudinal EHR [LATTE]) to predict cognitive status in all remaining intervals, leveraging sparse gold-standard labels and EHR-derived surrogates. We assessed model performance using the area under the receiver operating characteristic curve (AUROC). To assess clinical relevance, we estimated time to cognitive impairment (based on predicted cognitive status), stratified by APOE4 carrier status, using Cox proportional hazards models.
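The interval construction described above can be illustrated with a minimal sketch. It assumes calendar-month boundaries anchored at the day of the first record, which the text does not specify. The function names, record layout and feature names (`donepezil_rx`, `outpatient_visit`) are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter, defaultdict
from datetime import date


def interval_index(baseline: date, event: date, months_per_interval: int = 3) -> int:
    """Return the 0-based index of the interval that contains `event`."""
    if event < baseline:
        raise ValueError("event precedes baseline")
    # Whole calendar months elapsed since baseline.
    whole_months = (event.year - baseline.year) * 12 + (event.month - baseline.month)
    if event.day < baseline.day:
        whole_months -= 1  # the current month is not yet complete
    return whole_months // months_per_interval


def count_features_by_interval(records):
    """Count features per interval for one patient.

    `records` is an iterable of (date, feature_name) tuples (assumed layout).
    Baseline is taken as the date of the first record.
    """
    records = sorted(records)
    if not records:
        return {}
    baseline = records[0][0]
    counts = defaultdict(Counter)
    for when, feature in records:
        counts[interval_index(baseline, when)][feature] += 1
    return {idx: dict(c) for idx, c in counts.items()}


# Hypothetical records for a single patient.
records = [
    (date(2015, 1, 10), "donepezil_rx"),
    (date(2015, 2, 20), "outpatient_visit"),
    (date(2015, 4, 9), "donepezil_rx"),
    (date(2015, 4, 10), "donepezil_rx"),
    (date(2015, 11, 1), "outpatient_visit"),
]
interval_counts = count_features_by_interval(records)
```

In this sketch, intervals with no records are simply absent from the result; a real pipeline would likely fill them with zero counts up to the end of follow-up or death.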
Among 4199 AD patients (mean baseline age 72.5±10.1 years, 62% women, 90% non-Hispanic White), cognitive scores were available for CDR (n=1781), MoCA (n=2974) and MMSE (n=2333). LATTE achieved strong predictive performance across periods (AUROC: CDR=0.852; MoCA=0.922; MMSE=0.870). Of the 1717 patients with available APOE genotype data, 45% were APOE4 carriers. APOE4 carriers had a significantly higher risk of cognitive decline than non-carriers (HR [95% CI]: CDR=1.578 [1.398-1.781], MoCA=1.452 [1.242-1.697], MMSE=1.918 [1.638-2.247]; all p<.001).
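For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen impaired interval receives a higher predicted score than a randomly chosen normal one, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration of the metric only, not the study's evaluation code, and the labels and scores are made-up values.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 (cognitive impairment) or 0 (normal); `scores` holds
    the predicted scores. Ties between a positive and a negative count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted scores for six intervals.
labels = [0, 0, 1, 1, 1, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20]
example_auroc = auroc(labels, scores)  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of intervals; rank-based implementations give the same value more efficiently on large datasets.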