AI-assisted versus Human-generated Personal Statements for Neurology Residency Applications
Natalie Erickson1, Scott Millis1, Waleed Raheem Abood Abood1, Maysaa Basha2, Peter LeWitt2, Anza Memon3, Philip Ross4, Jacob Rube5, Carla Watson2, Deepti Zutshi6, Deepa Raghavan1, Rohit Marawar7
1Wayne State University, 2Wayne State University, Detroit Medical Center, 3John D. Dingell VAMC, Detroit, Michigan, 4Wayne State University Physicians Group, Department of Neurology, 5University Health Center, 6Wayne State University School of Medicine, 7Wayne State University - Detroit Medical Center
Objective:
To evaluate whether neurology faculty can distinguish AI-assisted (AI) from human-generated (HG) personal statements (PS) in residency applications.
Background:
Large language models (LLMs) can help brainstorm, draft, or edit PS for residency applications. No studies to date have examined the effect of LLM use on PS for neurology residency applications. Prior work in other specialties compared fully AI-generated versus HG PS, but most applicants are likely to use LLMs as an assistive tool, which our study evaluates.
Design/Methods:
Eleven anonymized AI-PS were collected from medical student volunteers (three post-match, eight pre-match). Volunteers received a brief introduction to LLM basics and had two weeks to submit a PS written using ChatGPT 4o without restriction. A pool of 86 deidentified PS from neurology applicants selected for interview in the 2023 Neurology Match (predating widespread use of ChatGPT) provided the HG source; 11 HG-PS were selected for comparison. The 22 PS were mixed, randomized, and independently rated by six blinded neurology faculty using sliding scales for various qualities. Two-sided t-tests compared ratings of AI-PS and HG-PS.
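For illustration only, the comparison described above can be sketched as a two-sided independent-samples t-test on per-statement ratings; the data below are hypothetical placeholders, not study data, and this is not the authors' analysis code.

    # Illustrative sketch (hypothetical data): two-sided independent-samples
    # t-test comparing faculty ratings of 11 AI-assisted vs. 11 human-generated
    # personal statements on one sliding-scale quality (e.g., readability).
    from scipy import stats

    # Hypothetical mean rating per PS (0-100 scale) across the six reviewers.
    ai_ps = [72, 65, 80, 58, 74, 69, 77, 63, 71, 68, 75]  # 11 AI-assisted PS
    hg_ps = [70, 62, 78, 61, 73, 66, 79, 60, 72, 67, 74]  # 11 human-generated PS

    # scipy's ttest_ind is two-sided by default.
    t_stat, p_value = stats.ttest_ind(ai_ps, hg_ps)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")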
Results:
Faculty reviewers had a median of 3.5 years of experience on residency selection committees. There were no statistically significant differences between the AI-PS and HG-PS for readability (p = 0.119), originality (p = 0.072), authenticity (p = 0.341), overall quality (p = 0.695), or the “why neurology?” item (p = 0.730). Three reviewers suspected AI use in some PS and three were unsure. On average, faculty judged 2.5 PS to be fully AI-generated and 8.1 to be AI-assisted, with a mean confidence of 41/100. Four of six reviewers were unaware of AAMC guidelines on AI use in residency PS.
Conclusions:
Blinded neurology faculty could not distinguish AI-assisted PS from human-generated PS across multiple quality measures and reported low confidence in detecting AI use.
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.