Establishing Reproducibility of Clinical Clusters Across Parkinson’s Disease Cohorts
Kristen Watkins1, Julia Greenberg1, Kelly Astudillo1, Charalambos Argyrou2, John Crary2, Steven Frucht1, Towfique Raj2, Giulietta Riboldi1
1Neurology, New York University Langone Health, 2Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai
Objective:
To establish reproducibility of phenotype-based clustering in two independent datasets of patients with Parkinson’s disease (PD).
Background:
PD is a heterogeneous disorder that is likely composed of several subgroups with distinct clinical features and patterns of disease progression. Cluster analysis is a valuable tool for characterizing phenotypic variability in clinical cohorts and for correlating phenotypes with biomarkers. However, data collection methods often differ between clinical and research settings, limiting the ability to obtain statistically significant results from smaller or less characterized cohorts and to compare results across studies. Establishing reproducibility of cluster analysis across sites would allow for greater generalizability. 
Design/Methods:

Non-hierarchical k-means clustering of subjects by phenotype was performed, in combination with principal component analysis (PCA), in the large Parkinson’s Progression Markers Initiative (PPMI) Validation Cohort (n=368) and in a smaller, independent Discovery Cohort (n=179), yielding cohort-based clusters. The eigenvectors from the Validation Cohort analysis were then used to re-cluster the Discovery Cohort (PPMI-based clusters). Overlap in cluster membership between the cohort-based and PPMI-based clusters of the Discovery Cohort was assessed.
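The re-clustering step can be illustrated with a minimal sketch. It is not the authors' pipeline. The function names (fit_pca, project, kmeans), the six synthetic features, the use of k=2 for both Discovery Cohort clusterings, and the choice to standardize the Discovery Cohort with Validation Cohort means and standard deviations before projection are all assumptions made for illustration. The data are random placeholders, not study data.

```python
import numpy as np

def fit_pca(X, n_components):
    """Standardize X and return (mean, std, top eigenvectors, variance fraction explained)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant features
    Z = (X - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    explained = float(eigvals[order].sum() / eigvals.sum())
    return mean, std, eigvecs[:, order], explained

def project(X, mean, std, components):
    """Project X onto previously fitted eigenvectors."""
    return ((X - mean) / std) @ components

def assign(points, centers):
    """Index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers

# Synthetic placeholder data with the cohort sizes from the abstract (assumed 6 features).
rng = np.random.default_rng(42)
validation = rng.normal(size=(368, 6))
discovery = rng.normal(size=(179, 6))

# Cohort-based clusters: PCA fitted on the Discovery Cohort itself.
mean_d, std_d, comps_d, _ = fit_pca(discovery, 4)
cohort_labels, _ = kmeans(project(discovery, mean_d, std_d, comps_d), k=2)

# PPMI-based clusters: Discovery Cohort projected onto Validation Cohort eigenvectors.
mean_v, std_v, comps_v, explained = fit_pca(validation, 4)
ppmi_labels, _ = kmeans(project(discovery, mean_v, std_v, comps_v), k=2)
```

The key design point is that the second projection reuses the eigenvectors fitted on the Validation Cohort rather than refitting them, so both cohorts are described in the same component space.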

Results:
We identified two clusters in the Discovery Cohort and three clusters in the Validation Cohort. The first four principal components for clustering of the Validation Cohort, accounting for 43% of the variability, were driven by depression, anxiety, age at symptom onset, gender, and tremor-dominance. After re-clustering the Discovery Cohort based on these traits, 77% of subjects remained in their original cluster.  
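One way to compute an overlap figure of the kind reported above is sketched below. Because k-means cluster labels are arbitrary, the sketch tries every relabeling and keeps the best agreement. The function name membership_overlap and the toy label lists are hypothetical, and the abstract does not state how the 77% figure was calculated.

```python
from itertools import permutations

def membership_overlap(labels_a, labels_b):
    """Fraction of subjects assigned to the same cluster under the best label matching."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same subjects")
    if not labels_a:
        raise ValueError("labelings must not be empty")
    ids_a, ids_b = sorted(set(labels_a)), sorted(set(labels_b))
    # Make labels_a the labeling with at least as many clusters.
    if len(ids_a) < len(ids_b):
        labels_a, labels_b = labels_b, labels_a
        ids_a, ids_b = ids_b, ids_a
    best = 0
    # Try every one-to-one mapping of labels_b clusters onto labels_a clusters.
    for perm in permutations(ids_a, len(ids_b)):
        mapping = dict(zip(ids_b, perm))
        matches = sum(a == mapping[b] for a, b in zip(labels_a, labels_b))
        best = max(best, matches)
    return best / len(labels_a)

# Toy example: 8 subjects, labels swapped between the two runs, one subject moved.
cohort_based = [0, 0, 0, 1, 1, 1, 1, 0]
ppmi_based = [1, 1, 0, 0, 0, 0, 0, 1]
overlap = membership_overlap(cohort_based, ppmi_based)  # 7 of 8 subjects agree
```

Exhaustive matching is practical only for a handful of clusters, which is the case for the two- and three-cluster solutions described here.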
Conclusions:
We successfully validated the reproducibility of clustering in our Discovery Cohort. We propose a combination of non-hierarchical cluster analysis and cross-validation with re-clustering to establish clustering reproducibility. This method can be adapted for use in a range of clinical scenarios to validate cohorts that are less extensively characterized or that have low intrinsic power because of small sample size.
10.1212/WNL.0000000000204507