Reanalysis of Global Neurodegeneration Proteomics Consortium Data Uncovers Pervasive Risk of False Positives and False Negatives in Biomarker Discovery from Large, Multi-center Proteomic Studies
Evan Boyle1, Katrina Paumier1, Ludmila Voloboueva1, Ferhan Qureshi1, David Brazel1
1Octave Bioscience
Objective:
To develop high-plex proteomic data analysis methods for fluid biomarker discovery in the setting of neurodegeneration.
Background:
The Global Neurodegeneration Proteomics Consortium (GNPC) V1 harmonized dataset contains measurements of 7,289 unique proteins from multiple SomaLogic assays across >35,000 biofluid samples from patients with Alzheimer’s disease, frontotemporal dementia, amyotrophic lateral sclerosis, and Parkinson’s disease (PD), plus healthy controls (HCs). Published case-control analyses per cohort and disease (Imam 2025) yielded median chi-squared test statistics (proteomic inflation factors) ranging from 1.01 to 26.29 (median: 1.47), suggesting statistical miscalibration.
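The proteomic inflation factor described above can be sketched as follows. This is a minimal illustration, not the consortium's or the authors' code: it assumes the genomic-control convention of dividing the median observed chi-squared statistic by the null median of a 1-degree-of-freedom chi-squared distribution (about 0.455), and the function name `inflation_factor` and its per-protein Z-score input are hypothetical.

```python
from statistics import NormalDist, median

# Null median of a chi-squared distribution with 1 degree of freedom (~0.4549).
# If Z is standard normal, P(Z^2 <= m) = 0.5 exactly when sqrt(m) is the
# 75th percentile of Z, so the median is that percentile squared.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(z_scores):
    """Median observed chi-squared statistic divided by its null median.

    z_scores: one association Z-score per protein (hypothetical input).
    Values near 1 are consistent with a calibrated test; values well
    above 1 suggest inflation (assumed normalization, see lead-in).
    """
    chi2 = [z * z for z in z_scores]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Under this convention, Z-scores drawn from a standard normal give a factor close to 1, and uniformly doubling every Z-score multiplies the factor by 4.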
Design/Methods:
We reanalyzed GNPC blood cohorts with >20 PD cases and >80 HCs. PD participants with coincident diseases were excluded. Protein markers that fell below the limit of detection in >12.5% of samples or that correlated with cell type composition were excluded. Protein abundances were standardized relative to HCs, accounting for sex, age, and average protein abundance. We performed logistic regressions of case status on protein abundance, sex, years of education, and the first 5 principal component scores. Cohorts with hits were meta-analyzed via a weighted Z-test.
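The meta-analysis step can be illustrated with a weighted Z-test in the style of Stouffer's method. This is a sketch under stated assumptions, since the abstract does not specify the weights: a common choice is the square root of each cohort's sample size, and the function name `weighted_z_meta` and its inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def weighted_z_meta(z_scores, weights):
    """Combine signed per-cohort Z-scores for one protein.

    z_scores: signed Z-score from each cohort's regression (hypothetical input).
    weights:  one positive weight per cohort, e.g. sqrt(sample size)
              (an assumption; the source does not state the weighting).
    Returns the combined Z-score and a two-sided p-value.
    """
    if not z_scores or len(z_scores) != len(weights):
        raise ValueError("need one weight per cohort Z-score")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    z = numerator / denominator
    # Two-sided p-value from the standard normal distribution.
    # For very large |z| this loses precision; fine for illustration.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

Because the Z-scores are signed, cohorts with opposite effect directions cancel rather than reinforce each other, which is one reason sign discordance across cohorts matters for the combined result.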
Results:
The number of Bonferroni-significant meta-analyzed markers decreased from 2,251 (Imam 2025) to 28. Proteomic inflation factors all fell below 1.93 (median: 1.13). The rate of sign discordance among significant biomarkers across cohorts decreased from 38% to 1%. The ordering of candidate biomarkers ranked by significance changed dramatically (rho = 0.04); for example, integrin αVβ5 (a heterodimer of the frequently reported PD biomarkers ITGAV and ITGB5) was not even nominally significant in the published analysis but ranked 16th in our meta-analysis.
Conclusions:
The choice of statistical approach profoundly affects both the qualitative and quantitative results of biomarker association with neurodegeneration case-control status. Proteomic inflation factors may flag possible confounding in large, multicenter studies. Protein abundance standardization, removal of cases with coincident disease, and inclusion of principal components as covariates can support rigorous protein biomarker discovery in large-scale, multi-cohort proteomic datasets.
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.