Large Language Model Pipeline Optimized for Neurology Education Demonstrates Improved Trainee Learning Outcomes in Randomized Intervention
Jay Khurana1, Hossam Zaki1, Saima Chaudhry1, Syed Rizvi1
1Neurology, Warren Alpert Medical School of Brown University
Objective:

To determine the effect of optimized, artificial intelligence (AI)-generated study materials on trainee comprehension of Neurology educational content.

Background:
Large Language Models (LLMs) have shown considerable promise in knowledge processing and synthesis across various medical disciplines. In medical education, most applications have focused on comparing LLM outputs to trainee performance or using LLMs for standardized assessment. This study is one of few in any discipline, including Neurology, that systematically evaluates the effect of standardized LLM-powered curricular interventions on medical learning.
Design/Methods:

We developed an end-to-end pipeline to convert weekly Neurology Grand Rounds lectures into relevant multiple-choice questions and summaries. Using outputs generated from multiple distinct Grand Rounds lectures, we selected an optimal prompt and LLM combination for each task (summary generation and multiple-choice question generation) based on several standardized criteria. These criteria, including hallucination rate, learning objective coverage, and lecturer-determined question relevance, were assessed across more than 2000 output annotations.
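
The selection step described above could be sketched as follows. This is a minimal illustration only: the abstract does not state how the criteria were combined, so the `Annotation` fields, the equal default weights, and the additive composite score are all assumptions made for this example rather than the authors' method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Annotation:
    """One annotated output (all field names are hypothetical)."""
    config: str                # label for a prompt/model combination
    hallucinated: bool         # annotator flagged a hallucination
    objective_coverage: float  # fraction of learning objectives covered, 0..1
    relevance: float           # lecturer-rated relevance, scaled to 0..1


def score_configs(annotations, weights=(1.0, 1.0, 1.0)):
    """Return {config: composite score}; higher is better.

    Assumed composite: coverage and relevance are rewarded,
    hallucination rate is penalized, each with its own weight.
    """
    w_halluc, w_cov, w_rel = weights
    by_config = {}
    for a in annotations:
        by_config.setdefault(a.config, []).append(a)

    scores = {}
    for config, items in by_config.items():
        hallucination_rate = mean(1.0 if a.hallucinated else 0.0 for a in items)
        coverage = mean(a.objective_coverage for a in items)
        relevance = mean(a.relevance for a in items)
        scores[config] = (w_cov * coverage
                          + w_rel * relevance
                          - w_halluc * hallucination_rate)
    return scores


def select_best(annotations, weights=(1.0, 1.0, 1.0)):
    """Pick the prompt/model combination with the highest composite score."""
    scores = score_configs(annotations, weights)
    return max(scores, key=scores.get)
```

In practice the weights would need to reflect how costly a hallucination is relative to a missed learning objective; equal weights are used here only to keep the sketch simple.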

Using the optimized pipeline for weekly Grand Rounds, we randomized students on their third-year Neurology clerkship, selected for similar baseline knowledge, to either receive AI-generated summaries or not receive them. Students in both arms then answered AI-generated questions.
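
The allocation step could look like the sketch below. The abstract does not describe the allocation scheme, so this example assumes simple (unrestricted) randomization with equal probability per arm; the function name, arm labels, and trainee identifiers are hypothetical.

```python
import random


def randomize(trainee_ids, seed=None):
    """Assign each trainee to an arm by an independent fair coin flip.

    Simple randomization is an assumption; it can produce unequal arm sizes.
    Passing a seed makes the allocation reproducible for auditing.
    """
    rng = random.Random(seed)
    arms = {"summary": [], "no_summary": []}
    for trainee_id in trainee_ids:
        arm = "summary" if rng.random() < 0.5 else "no_summary"
        arms[arm].append(trainee_id)
    return arms


if __name__ == "__main__":
    # Hypothetical identifiers, used only to demonstrate the call.
    ids = [f"T{i:02d}" for i in range(30)]
    allocation = randomize(ids, seed=2024)
    print({arm: len(members) for arm, members in allocation.items()})
```

If balanced arms were required, block randomization would be the usual alternative.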

Results:
Among trainees (n=30), those randomized to receive AI summaries showed a trend toward improved exam performance that did not reach conventional statistical significance (mean 8.32/10 vs 7.73/10, p=0.08). Of trainees who received summaries (n=17), 83% reported increased likelihood of reviewing Grand Rounds content, 78% reported decreased time required to understand content, and 61% reported better understanding.
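
The abstract does not name the statistical test behind the reported p-value. As one illustration of how mean scores in two small arms might be compared, the sketch below implements a two-sided permutation test on the difference in means. The choice of test, the function name, and its parameters are assumptions for this example; no study data are reproduced here.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Repeatedly reshuffles the pooled scores into two groups of the
    original sizes and counts how often the absolute mean difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

A permutation test makes no normality assumption, which can be useful with roughly 30 participants and scores bounded between 0 and 10.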
Conclusions:
Our preliminary findings suggest that optimized AI-generated companion resources yield substantial subjective improvements in trainee education, such as reduced time needed to understand content and increased likelihood of reviewing it, and may produce modest improvements in learning objective comprehension, although the difference in exam performance did not reach statistical significance. This work highlights how optimized LLM pipelines may help trainees capture learning content in an overwhelming learning environment and lays the groundwork for broader integration and evaluation across existing neurological educational frameworks.
DOI: 10.1212/WNL.0000000000215453
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.