The aim is to analyze publicly available voice, Tappy keystroke, spiral drawing, and gait data from patients with Parkinson's disease (PD) and healthy controls using machine learning (ML) models, and to identify early characteristics of PD.
PD affects approximately 6 million people worldwide. Analyzing voice, Tappy keystroke, spiral drawing, or gait data with ML models may provide an inexpensive, non-invasive, and simple method for remote diagnosis of PD before motor signs manifest.
An ML model based on the Random Forest algorithm was developed to analyze existing clinical data from PD patients and healthy controls. The ML analysis was carried out on voice samples in PD and in REM sleep behavior disorder (RBD), and on Tappy keystroke, spiral drawing, and gait data sets from the Kaggle database.
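As an illustration of this kind of workflow, the following is a minimal sketch of training a Random Forest classifier and scoring it with accuracy, precision, recall, and F1. It is not the study's pipeline: the synthetic data from `make_classification`, the 75/25 stratified split, `n_estimators=200`, and the helper name `evaluate_random_forest` are all assumptions made for the example, and scikit-learn is assumed to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split


def evaluate_random_forest(X, y, seed=0):
    """Fit a Random Forest on a stratified split and return four test-set metrics."""
    # Hold out 25% of the samples; stratify so both classes appear in each split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Hyperparameters here are illustrative assumptions, not the study's settings.
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Label 1 is treated as the positive (PD) class.
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }


# Synthetic placeholder standing in for a feature table (e.g. voice features).
X, y = make_classification(
    n_samples=200, n_features=20, weights=[0.3, 0.7], random_state=0
)
scores = evaluate_random_forest(X, y)
for name, value in scores.items():
    print(f"{name}: {value:.2%}")
```

In practice the synthetic arrays would be replaced by the feature matrix and PD/control labels of one data set at a time, with one model fitted per modality.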
ML analysis of the voice data yielded an accuracy of 88.72%, precision of 90.86%, recall of 95.22%, and F1 score of 92.77%.
Tappy keystroke data yielded an accuracy of 72.79%, precision of 76.50%, recall of 93.08%, and F1 score of 83.97%.
Spiral drawing data yielded an accuracy of 70.97%, precision of 74.09%, recall of 82.89%, and F1 score of 77.90%.
Gait data yielded an accuracy of 63.83%, precision of 67.90%, recall of 74.81%, and F1 score of 70.01%.
Voice data in RBD yielded an accuracy of 70.00%, precision of 72.00%, recall of 70.00%, and F1 score of 69.00%.
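For readers less familiar with these four metrics, the sketch below shows their standard definitions in terms of confusion-matrix counts, with PD taken as the positive class. The function name `classification_metrics` and the counts used in the example are hypothetical and are not taken from the study. Note that when metrics are averaged across folds or classes, a reported F1 score need not equal the harmonic mean of the reported precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts.

    tp: PD cases predicted as PD        fp: controls predicted as PD
    fn: PD cases predicted as control   tn: controls predicted as control
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
acc, prec, rec, f1 = classification_metrics(tp=45, fp=15, fn=5, tn=35)
print(f"accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%} f1={f1:.2%}")
```

High recall with lower precision, as in this example, means that few PD cases are missed at the cost of more controls being flagged.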
The ML prediction model developed here may improve risk prediction in PD, supporting early intervention and resource prioritization. The model, based on the Random Forest algorithm, was used to analyze several PD characteristics that can be measured before a clinical diagnosis of PD. The current study suggests that voice analysis is the most robust test, followed by Tappy keystroke, spiral drawing, and gait analysis. Voice is affected even in RBD patients, suggesting that voice is a sensitive and early measure of prodromal PD. Combining all four modalities (voice, Tappy keystroke, spiral drawing, and gait analysis) may improve accuracy.