Batman & Robin vs. The Riddler: Is ChatGPT a reliable sidekick to neurologists for the diagnosis and management of functional movement disorders?
Pushpraj Poonia1, Ishreen Ahuja2, Abhinav Singh1
1GMCH Chandigarh, 2Department of Medicine, GMCH, Chandigarh
Objective:

This study investigates the utility of ChatGPT in assisting clinical decision-making for patients with functional movement disorders (FMDs), which are characterised by diverse clinical presentations and a lack of clear diagnostic criteria.

Background:

Functional movement disorders are a group of conditions marked by abnormal movements or postures that cannot be attributed to identifiable neurological or medical causes. Diagnosing and managing FMDs often requires a multidisciplinary approach and challenges even experienced clinicians. This study explores the feasibility of using ChatGPT, an advanced AI language model, as a supplementary tool for clinicians dealing with FMD cases.

Design/Methods:
Thirty patients diagnosed with FMDs were randomly selected from a neurology clinic. Comprehensive clinical data, comprising patient history, physical examination, and diagnostic assessments, were presented to ChatGPT (version 3.5) and to a panel of 7 independently practising neurologists specialising in movement disorders. ChatGPT was tasked with providing diagnostic, treatment, and functional status recommendations for each patient. The expert neurologists rated ChatGPT's recommendations for each aspect on a 0-10 scale (0 = complete disagreement, 10 = complete agreement). Inter-rater agreement was assessed using the intraclass correlation coefficient (ICC).
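The abstract does not state which form of the ICC was used. As a minimal sketch only, the following assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), computed from a cases-by-raters matrix of scores. The function name `icc_2_1` and the toy ratings are illustrative assumptions, not the study's code or data.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score that rater j gave to case i.
    The choice of ICC(2,1) is an assumption; the abstract does not
    specify the ICC form.
    """
    n = len(ratings)       # number of cases (rows)
    k = len(ratings[0])    # number of raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with >= 2 cases and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy example (NOT study data): 6 cases scored by 4 raters.
toy_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(toy_ratings), 2))  # prints 0.29
```

In this toy matrix the raters rank the cases similarly but differ systematically in how high they score, which is why an absolute-agreement ICC comes out low. A consistency-type ICC would be higher on the same data, so the choice of form matters when interpreting the reported values.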
Results:

Among the patients, 21 (70%) were diagnosed with FMDs, while 9 (30%) presented with alternative movement disorders. Expert ratings of ChatGPT's recommendations varied by aspect: diagnosis (median 2.5, IQR 1-7.2, ICC 0.91, 95% CI 0.68-1.0), treatment recommendations (6.5, IQR 5-7.8, ICC 0.82, 95% CI 0.48-0.92), therapy regimens (6.8, IQR 3-7.5, ICC 0.85, 95% CI 0.52-0.93), consideration of functional status (5.5, IQR 1-6.8, ICC 0.78, 95% CI 0.42-0.91), and overall concordance with recommendations (4.5, IQR 2.5-6.7, ICC 0.76, 95% CI 0.38-0.89). Notably, ratings did not differ significantly between FMD cases and other movement disorder cases.

Conclusions:

ChatGPT showed limitations in accurately classifying FMDs but demonstrated potential in management planning, as assessed by experienced neurologists. While ChatGPT cannot supplant clinical expertise, it may serve as a valuable adjunctive tool within a human-in-the-loop clinical decision-making framework, furnishing supplementary insights for the management of FMD patients. Further research on AI models for clinical decision-making is needed.

10.1212/WNL.0000000000206058