To evaluate whether a voice-interactive conversational agent powered by a large language model (LLM) can systematically elicit diagnostically relevant narratives from patients with suspected Alzheimer's disease and related dementias (ADRD) and their informants, in a manner comparable to specialist-conducted interviews.
Early ADRD diagnosis requires comprehensive patient histories, yet time constraints often limit thorough symptom exploration, particularly outside specialty memory clinics, which often have wait times exceeding one year. We developed a conversational agent to conduct structured interviews that systematically cover approximately 70 symptoms, recommended by dementia specialists for assessment, spanning cognitive, behavioral, motor, and functional domains.
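As a purely illustrative sketch (not the study's implementation), a structured symptom checklist of this kind might be represented as a domain-to-symptom mapping that the agent walks through, marking each item as discussed or not; all names and symptoms below are hypothetical examples.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NOT_DISCUSSED = "not_discussed"
    ENDORSED = "endorsed"
    DENIED = "denied"
    AMBIGUOUS = "ambiguous"


# Hypothetical excerpt of a domain-organized symptom checklist; the actual
# instrument covers roughly 70 symptoms across these four domains.
CHECKLIST = {
    "cognitive": ["short-term memory loss", "word-finding difficulty"],
    "behavioral": ["apathy", "irritability"],
    "motor": ["gait changes", "tremor"],
    "functional": ["difficulty managing finances", "trouble with medications"],
}


@dataclass
class InterviewState:
    """Tracks which symptoms have been covered during an agent-led interview."""
    status: dict = field(default_factory=lambda: {
        (domain, symptom): Status.NOT_DISCUSSED
        for domain, symptoms in CHECKLIST.items()
        for symptom in symptoms
    })

    def remaining(self):
        """Symptoms still to be raised, enabling systematic coverage."""
        return [key for key, value in self.status.items()
                if value is Status.NOT_DISCUSSED]
```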
We conducted a within-subjects pilot study with 25 adults with suspected ADRD recruited from a cognitive neurology clinic. Each participant completed both an agent-led interview and a separate interview with a clinician who was blinded to the agent interview. Both interviews were recorded and transcribed. Two dementia specialists pre-specified 32 high-yield symptoms from those elicited by the agent, which were further refined with clinic leadership to 21 core symptoms important for diagnosis or management. Two annotators labeled symptoms in the agent and clinician transcripts by consensus, with the clinician interview serving as the benchmark. We evaluated symptom detection (sensitivity and specificity), systematic coverage, ambiguity rates, and user experience surveys (n=19).
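As a minimal, hypothetical sketch of this evaluation (not the study's analysis code), sensitivity and specificity can be computed by comparing agent-derived symptom labels against the clinician benchmark across participant-symptom pairs; the function and data structures below are assumptions for illustration.

```python
from typing import Dict, Tuple

def sensitivity_specificity(
    agent_labels: Dict[Tuple[str, str], bool],
    clinician_labels: Dict[Tuple[str, str], bool],
) -> Tuple[float, float]:
    """Compare agent labels to the clinician benchmark.

    Keys are (participant_id, symptom) pairs; values are True if the
    symptom was endorsed in that interview, False otherwise.
    """
    tp = fp = tn = fn = 0
    for key, endorsed_by_clinician in clinician_labels.items():
        endorsed_by_agent = agent_labels.get(key, False)
        if endorsed_by_clinician and endorsed_by_agent:
            tp += 1
        elif endorsed_by_clinician and not endorsed_by_agent:
            fn += 1
        elif not endorsed_by_clinician and endorsed_by_agent:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```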
Participants had a mean age of 72±9.8 years; 67% were female; 77% were White, 20% Black, and 3% Asian. The agent achieved 82% sensitivity (95% CI: 74-87%) and 91% specificity (95% CI: 84-95%) for symptom detection. Agent interviews provided more systematic coverage (median per-symptom not-discussed rate: 9.1% vs 22.7% for clinicians; p<0.01). Ambiguity rates were comparable (13.6% for the agent vs 17.6% for clinicians; p=0.064). User satisfaction was high, and patients gave longer responses to the agent than to clinicians (39.5 vs 13.1 words per utterance).
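For reference only, binomial confidence intervals such as those reported above can be obtained with a standard interval method; the sketch below assumes a Wilson score interval, which may differ from the method actually used in the study, and the counts passed in are hypothetical.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half_width, center + half_width)

# Usage: wilson_ci(detected, total) for the number of clinician-endorsed
# symptoms also detected by the agent out of the total endorsed (sensitivity),
# and analogously for specificity.
```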