Generative AI-assisted Screening Improves Efficiency in Neurology Systematic Reviews
Sai Krishna Vallamchetla1, Omar Abdelkader2, Md Manjurul Islam Shourav3, Michelle Lin1
1Neurology, Mayo Clinic, Florida, USA, 2Neurology, Westchester Medical Center, New York, USA, 3Neurology, LSU Health Shreveport, Louisiana, USA
Objective:
To evaluate the impact of generative artificial intelligence (AI) assistance on the speed and accuracy of title and abstract screening for systematic reviews in neurology.
Background:

Systematic reviews are critical for evidence-based neurology but are time-consuming and resource-intensive, with screening phases often creating significant bottlenecks. While large language models (LLMs) show potential to streamline this process, their real-world impact on reviewer performance in a supportive, human-in-the-loop role is understudied in neurology.

Design/Methods:

Four neurology trainees were grouped into two pairs based on previous screening experience. Pair A (A1, A2) consisted of less experienced trainees (1–2 SRs), while Pair B (B1, B2) consisted of more experienced trainees (≥3 SRs). Within each pair, one reviewer was assigned to a traditional screening method (A2, B2), while the other was assigned to a generative AI-assisted method (A1, B1). The AI-assisted screening utilized PICOS (Population, Intervention/Exposure, Comparison, Outcome, Study design) summaries derived from titles and abstracts using an open-source LLM (Mistral-Nemo-Instruct-2407). All reviewers independently screened the same set of 1,003 articles against predefined criteria. Screening times were recorded, and performance metrics were calculated. Post-screening surveys assessed usability, confidence, and perceived cognitive workload.
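The PICOS-summary step described above can be sketched as a prompt-construction routine. This is a minimal illustration only: the prompt wording and the article record are hypothetical, not the study's actual protocol, and the downstream LLM call (e.g., to Mistral-Nemo-Instruct-2407 on a local inference server) is omitted.

```python
# Sketch of the PICOS-summary step. The prompt text below is an
# assumption for illustration, not the study's exact instructions.
PICOS_PROMPT = """You are assisting with systematic-review screening.
From the title and abstract below, extract a structured PICOS summary:
Population, Intervention/Exposure, Comparison, Outcome, Study design.
If an element is not reported, write "Not stated".

Title: {title}
Abstract: {abstract}

PICOS summary:"""

def build_picos_prompt(title: str, abstract: str) -> str:
    """Fill the screening prompt template for one article record."""
    return PICOS_PROMPT.format(title=title.strip(), abstract=abstract.strip())

# Hypothetical article record, for illustration only.
prompt = build_picos_prompt(
    "Example trial of drug X in migraine",
    "Background: ... Methods: randomized, double-blind ...",
)
# `prompt` would then be sent to the LLM, and the returned PICOS summary
# shown to the reviewer alongside the original title and abstract.
```

The reviewer remains the decision-maker: the model's output is a structured summary to read against the predefined criteria, not an include/exclude verdict.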

Results:
AI-assisted reviewers (A1: 116 min; B1: 90 min) screened four times faster than those without assistance (A2: 463 min; B2: 370 min), reducing screening time by ~75%. Sensitivity was perfect for AI-assisted reviewers (100%), whereas it was lower for those without assistance (88.0% and 92.0%). Furthermore, AI-assisted reviewers demonstrated higher accuracy (99.9%), specificity (99.9%), F1 scores (98.0%), and strong inter-rater reliability (Cohen's kappa of 99.8%). The less experienced reviewer with AI assistance (A1) outperformed the experienced reviewer without assistance (B2) in both efficiency and sensitivity. All reviewers reported reduced cognitive load and improved decision confidence.
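The performance metrics reported above follow from standard confusion-matrix definitions. The sketch below computes them from raw counts; the example counts are hypothetical (chosen only to show the calculation against a gold-standard reference, not the study's actual data), and the kappa here is computed against the reference rather than between reviewer pairs.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard screening metrics from confusion-matrix counts
    (decisions compared against a gold-standard reference)."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)       # share of eligible articles caught
    specificity = tn / (tn + fp)       # share of ineligible articles rejected
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_o = accuracy
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (p_o - p_e) / (1 - p_e)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "kappa": kappa}

# Hypothetical counts for 1,003 screened articles, for illustration only.
m = screening_metrics(tp=25, fp=1, fn=0, tn=977)
print({k: round(v, 3) for k, v in m.items()})
```

With a small number of truly eligible articles in a pool of ~1,000, near-perfect accuracy and specificity are achievable even with a handful of errors, which is why sensitivity (no missed eligible studies) is the metric that most clearly separates the two arms.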
Conclusions:
Generative AI assistance substantially improves the efficiency, accuracy, and user experience of systematic review screening in neurology. By enhancing rather than replacing human decision-making, this hybrid workflow offers a scalable approach to accelerating evidence synthesis and reducing reviewer fatigue.
10.1212/WNL.0000000000216401
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.