Assessing Large Language Models as Multimodal Reasoning Engines for Parkinson's Disease Detection
Mallikarjun Verma1, Isheeta Gupta1, Aman Verma1
1Washington University in St. Louis
Objective:
To assess the capability of large language models (LLMs) to perform multimodal reasoning for the early detection of Parkinson’s disease (PD), using acoustic and handwriting data via prompt-based inference.
Background:

Early PD detection remains challenging due to the subtle and overlapping nature of motor and nonmotor symptoms across speech and handwriting. Traditional machine learning models achieve strong accuracy on structured datasets but lack interpretability and cross-domain generalizability. Recent advances in general-purpose LLMs suggest emerging capacity for multimodal reasoning, offering potential for more transparent, adaptable approaches to early screening.

Design/Methods:

We developed a standardized benchmarking pipeline to assess several widely used LLMs (GPT, Claude, Gemini) on curated datasets comprising voice and handwriting features from PD patients and control participants. In both zero-shot and few-shot settings, each model received structured case descriptions and quantitative metrics. Models were not fine-tuned; instead, inference was prompted with contextual information presenting a patient profile. Model outputs were evaluated against predefined scoring rubrics for predictive accuracy, internal consistency, and clinical reasoning fidelity.
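The pipeline described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the feature names (jitter, shimmer, pen pressure, stroke speed), function names, and rubric details are all assumptions introduced here to show how a structured case description, zero-/few-shot prompt, and rubric-based scoring might fit together.

```python
# Hypothetical sketch of a prompt-based PD screening pipeline: a structured
# case profile is rendered into a zero-shot or few-shot prompt, and a simple
# rubric scores the model's returned label and rationale. Feature names and
# thresholds are illustrative assumptions, not the study's actual protocol.
from dataclasses import dataclass

@dataclass
class CaseProfile:
    # Assumed acoustic and handwriting features for illustration only.
    jitter_pct: float         # vocal frequency perturbation (%)
    shimmer_db: float         # vocal amplitude perturbation (dB)
    pen_pressure_cv: float    # coefficient of variation of pen pressure
    stroke_speed_mm_s: float  # mean handwriting stroke speed (mm/s)

def build_prompt(case: CaseProfile, exemplars: tuple[str, ...] = ()) -> str:
    """Render a zero-shot (no exemplars) or few-shot prompt for one case."""
    shots = "\n\n".join(exemplars)
    profile = (
        f"Acoustic: jitter={case.jitter_pct:.2f}%, "
        f"shimmer={case.shimmer_db:.2f} dB. "
        f"Handwriting: pressure CV={case.pen_pressure_cv:.2f}, "
        f"stroke speed={case.stroke_speed_mm_s:.1f} mm/s."
    )
    return (
        (shots + "\n\n" if shots else "")
        + "Patient profile:\n" + profile
        + "\nAnswer with 'PD' or 'Control', then explain your reasoning."
    )

def score_output(output: str, truth: str, required_terms: list[str]) -> dict:
    """Rubric: predictive accuracy plus a crude reasoning-fidelity check
    (fraction of expected clinical terms mentioned in the rationale)."""
    label = "PD" if output.strip().upper().startswith("PD") else "Control"
    text = output.lower()
    return {
        "correct": label == truth,
        "fidelity": sum(t in text for t in required_terms) / len(required_terms),
    }
```

In practice, `build_prompt` would feed each model's API and the returned text would pass through `score_output`; the rubric shown here is deliberately minimal, whereas the study's rubrics also assessed internal consistency.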

Results:

All models demonstrated some ability to discriminate PD from controls across multiple cue types. Although overall predictive accuracy fell short of specialized statistical models, the generated explanations were logically consistent and readily interpretable. The results reveal domain-specific biases and show that prompt design can influence diagnostic reasoning.

Conclusions:

LLM-based prompt-driven inference offers a lightweight and privacy-preserving framework for initial PD screening when training specialized models is impractical. This study presents a systematic approach for assessing LLMs as generalized medical reasoning agents and highlights the potential of context-aware prompting for comprehensible, multimodal neurological evaluation.

10.1212/WNL.0000000000217874
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.