Machine Learning-optimized Feature Selection for Enhanced Accuracy in Early Prediction of Alzheimer's Disease and Mild Cognitive Impairment Using Markerless Gait Analysis: A Pilot Study
Rhea Doshi1, Ashkan Novin2, Roshni Patel3
1Kingswood Oxford, 2Cerenova, 3Center of Excellence in Pain and Regenerative Medicine
Objective:
To optimize machine learning feature selection and refine the ensemble architecture to improve the accuracy of early prediction of Alzheimer's disease (AD) and mild cognitive impairment (MCI) using markerless gait analysis.
Background:
Early prediction of AD/MCI requires identifying the most informative gait biomarkers within high-dimensional data. Traditional approaches that use all available features introduce noise and reduce model accuracy, yet feature selection has not been systematically optimized to improve prediction accuracy for early-stage cognitive impairment.
Design/Methods:
We enrolled 25 older adults (8 with early-stage AD, 9 with MCI, and 8 cognitively normal; mean age 72.3±6.8 years; 56% female). Non-invasive markerless motion capture recorded gait during standardized walking without attaching sensors to participants. From 87 spatiotemporal and kinematic measurements, we applied recursive feature elimination with cross-validation to identify the feature subset that maximized predictive accuracy. We compared five ensemble architectures combining XGBoost, Random Forest, and Support Vector Machine classifiers, iteratively refining the architecture based on prediction performance. Each architecture was evaluated with 5-fold stratified cross-validation, reporting accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) with 95% confidence intervals.
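The selection-then-ensemble pipeline can be sketched with scikit-learn. This is a minimal, hypothetical sketch, not the authors' code: it uses random synthetic data (16 participants for one binary comparison, 87 features), substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost, assumes random-forest importances drive the recursive elimination, shows only one possible ensemble (a hard majority vote), and uses a Wilson score interval as one common 95% confidence interval method, since the abstract names neither the ranking estimator nor the interval method.

```python
import math

from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.feature_selection import RFECV
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: 16 participants (8 per class, e.g. AD vs. cognitively normal)
# x 87 gait features. Random numbers, not the study's measurements.
X, y = make_classification(n_samples=16, n_features=87, n_informative=12,
                           n_redundant=0, flip_y=0.0, random_state=0)

# Recursive feature elimination with internal cross-validation. Random-forest
# importances rank the features (an assumption; the estimator is not reported).
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    step=10,  # drop 10 features per round to keep the sketch fast
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="accuracy",
)

# One candidate ensemble: a hard majority vote over gradient boosting (standing in
# for XGBoost), a random forest, and an RBF-kernel SVM on standardized features.
ensemble = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ],
    voting="hard",
)

# Feature selection is refit inside every outer training fold.
model = Pipeline([("select", selector), ("ensemble", ensemble)])
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=outer_cv)

tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
accuracy = (tp + tn) / len(y)
sensitivity = tp / (tp + fn)  # positive-class participants correctly detected
specificity = tn / (tn + fp)  # controls correctly classified as negative


def wilson_ci(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (assumed method)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


sens_lo, sens_hi = wilson_ci(tp, tp + fn)
print(f"accuracy={accuracy:.3f}  sensitivity={sensitivity:.3f} "
      f"(95% CI {sens_lo:.3f}-{sens_hi:.3f})  specificity={specificity:.3f}")
```

Keeping `RFECV` inside the pipeline repeats feature selection within each outer training fold, so held-out participants never influence which features are kept; with a cohort of 25, this avoids optimistically biased accuracy estimates.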
Results:
Recursive feature elimination reduced the 87 features to 12 optimal biomarkers, improving model performance by 11.3% compared with the full feature set. The top predictive features were stride velocity coefficient of variation (importance 0.187, p<0.001) and stride length asymmetry (importance 0.154, p=0.002). The refined ensemble achieved 88.0% accuracy in detecting AD (AUC 0.93, sensitivity 87.5%, specificity 87.5%) and 84.0% accuracy in detecting MCI (AUC 0.89, sensitivity 88.9%, specificity 75.0%). In the feature-optimized model, stride velocity variability showed the strongest discriminative power (Cohen's d=1.82 for AD, d=1.34 for MCI).
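The top-ranked features and the reported effect sizes rest on standard definitions that the abstract does not spell out. A minimal sketch, assuming stride velocity variability is the sample SD divided by the mean (in percent), stride length asymmetry is |L − R| divided by the mean of the two sides (one of several published symmetry indices), and Cohen's d uses the pooled sample SD; all input values below are hypothetical, not study data.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Stride-to-stride variability as sample SD / mean, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def asymmetry_index(left, right):
    """Left-right asymmetry as |L - R| / mean(L, R), in percent.

    One common symmetry-index definition (assumed; the abstract gives no formula).
    """
    return abs(left - right) / (0.5 * (left + right)) * 100.0


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical stride velocities (m/s) from one participant's walk.
velocities = [1.10, 1.15, 1.05, 1.12, 1.08]
print(f"stride velocity CV = {coefficient_of_variation(velocities):.2f}%")

# Hypothetical left/right mean stride lengths (m).
print(f"stride length asymmetry = {asymmetry_index(1.22, 1.16):.2f}%")

# Hypothetical per-participant velocity CVs (%) for two groups of eight.
patient_cv = [6.1, 7.4, 5.8, 8.2, 6.9, 7.7, 6.5, 7.0]
control_cv = [3.2, 4.1, 3.8, 2.9, 4.4, 3.5, 3.9, 3.1]
print(f"Cohen's d = {cohens_d(patient_cv, control_cv):.2f}")
```

A positive d here means the first group has the larger mean variability; swapping the groups flips the sign but not the magnitude.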
Conclusions:
Systematic machine learning-driven feature selection significantly enhanced prediction accuracy for early AD/MCI detection. Reducing 87 gait features to 12 optimal biomarkers improved model performance, indicating that strategic feature selection can increase early-prediction accuracy and may help advance clinical screening tools.
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.