Identifying Clinical Subgroups/Clusters of Alzheimer’s Patients from Optum’s De-Identified Market Clarity Database Using Machine Learning Techniques
Objective:
To identify and categorize Alzheimer’s disease (AD) patients into clinically relevant groups based on demographics, symptoms, and comorbidities.
Background:
Around 7 million people in the US are living with Alzheimer’s disease, which carries a high cost burden. Using machine learning, this study categorizes Alzheimer’s patients into clinical subgroups to support personalized treatment and early intervention, with the aim of improving outcomes and reducing healthcare costs.
Design/Methods:
AD patients were identified using ICD-10 codes in the Optum® de-identified Market Clarity Dataset. Patients older than 60 years with continuous eligibility (3 years pre-index) and either at least two outpatient diagnoses (30 days apart) or one inpatient diagnosis recorded between 1/1/2019 and 12/31/2020 were included in the analysis. To identify subgroups, we used three ML techniques: K-Means with multiple correspondence analysis (MCA), agglomerative hierarchical clustering, and DBSCAN.
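The inclusion criteria can be sketched in code. The following is a minimal illustration only, not the study's actual implementation: the `Diagnosis` record, the setting labels, and the function names are hypothetical, the index date is assumed to be the first AD diagnosis in the identification window, "30 days apart" is read as "at least 30 days apart", and the 3-year lookback is approximated as 1,095 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Identification window stated in the abstract.
WINDOW_START = date(2019, 1, 1)
WINDOW_END = date(2020, 12, 31)
LOOKBACK = timedelta(days=3 * 365)  # assumption: 3 years approximated as 1,095 days


@dataclass
class Diagnosis:
    """Hypothetical record of one AD diagnosis (ICD-10 based) for a patient."""
    service_date: date
    setting: str  # "inpatient" or "outpatient" (hypothetical labels)


def index_date(diagnoses):
    """Return the first in-window diagnosis date if the patient qualifies, else None."""
    in_window = sorted(
        (d for d in diagnoses if WINDOW_START <= d.service_date <= WINDOW_END),
        key=lambda d: d.service_date,
    )
    if not in_window:
        return None
    has_inpatient = any(d.setting == "inpatient" for d in in_window)
    outpatient = [d.service_date for d in in_window if d.setting == "outpatient"]
    # Assumption: any two outpatient diagnoses at least 30 days apart qualify,
    # which holds exactly when the earliest and latest are 30+ days apart.
    two_outpatient = (
        len(outpatient) >= 2 and outpatient[-1] - outpatient[0] >= timedelta(days=30)
    )
    if has_inpatient or two_outpatient:
        return in_window[0].service_date  # assumption: index = first diagnosis
    return None


def is_eligible(diagnoses, age_at_index, enrolled_from):
    """Apply the diagnosis, age (> 60), and 3-year continuous eligibility rules."""
    idx = index_date(diagnoses)
    if idx is None:
        return False
    # enrolled_from: hypothetical start date of the patient's continuous enrollment.
    return age_at_index > 60 and enrolled_from <= idx - LOOKBACK
```

For example, a 75-year-old enrolled since 2015 with outpatient diagnoses on 2019-03-01 and 2019-04-15 would pass this check, while the same patient with only the first outpatient diagnosis would not.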
Results:
Among 108,714 patients, the mean age was 81 years and the population was predominantly female (61%). Twenty-one features were included in the clustering; K-Means, hierarchical, and DBSCAN yielded 5, 6, and 4 clusters, respectively. K-Means with MCA gave the most consistent subgroups based on cosine distance measures and eigenvalues.
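As a rough illustration of the K-Means with MCA approach, the sketch below reduces a binary patient-by-feature matrix with MCA and then clusters the resulting coordinates. It is written under several assumptions and is not the study's pipeline: all 21 features are treated as binary flags, MCA is computed directly with NumPy from the indicator matrix, K-Means uses Euclidean distance with a simple deterministic initialization, and the demo data are randomly generated stand-ins rather than patient data. The function names are hypothetical.

```python
import numpy as np


def mca_row_coordinates(X, n_components=2):
    """MCA of a binary patient-by-feature matrix via SVD of the indicator matrix."""
    X = np.asarray(X, dtype=float)
    # One column per category level: "present" and "absent" for each feature.
    Z = np.hstack([X, 1.0 - X])
    Z = Z[:, Z.sum(axis=0) > 0]           # drop levels that never occur
    P = Z / Z.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates and the eigenvalues (squared singular values).
    coords = (U[:, :n_components] * s[:n_components]) / np.sqrt(r)[:, None]
    return coords, s[:n_components] ** 2


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Distance from each point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Synthetic stand-in data: 200 hypothetical patients x 21 binary features.
rng = np.random.default_rng(0)
X_demo = (rng.random((200, 21)) < 0.4).astype(int)
coords, eigenvalues = mca_row_coordinates(X_demo, n_components=2)
labels, _ = kmeans(coords, k=5)
```

In practice the number of retained MCA components and the value of k would be chosen from the eigenvalues and cluster-quality measures; the values used here are placeholders.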
Hypertension was identified as a prominent risk factor and was present in all clusters. Cluster 1 (Mild) had 49.5k patients, with hypertension (~85%) and diabetes (~34%). Cluster 2 (Severe) had 31.8k patients, with hypertension (~97%), diabetes (~67%), heart failure (~63%), coronary artery disease (~78%), kidney disease (~69%), and atrial fibrillation (~53%). Cluster 3 (Moderate) had 5k patients, with hypertension (~91%), diabetes (~50%), and falls (~20%). Cluster 4 (Onset) had 6k patients with no significant comorbidities. Cluster 5 (Caution) had 16k patients, with hypertension (~92%), falls (~85%), confusion (~54%), depression (~45%), and memory loss (~41%).
Further, we leveraged clinical notes to examine the impact of APOE allele presence across the clusters.
Conclusions:
The clusters identified in this study can inform personalized and targeted healthcare interventions. The study can also be used to determine whether clusters share a genetic makeup that increases their risk of developing AD.
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.