2023 American Academy of Neurology Abstract Website

Anika Zahoor¹, Jayme Banks¹, Ryan Tesh¹, Christine Eckhardt¹, Haoqi Sun¹, Adam Greenblatt², Aline Herlopian³, Ioannis Karakis⁴, Roohi Katyal⁵, Chetan Nayak², Marcus Ng⁶, Jonathan Williams², Irfan Sheikh¹, Fábio Nascimento², Michael Westover¹
¹Neurology, Massachusetts General Hospital, ²Neurology, Washington University in St. Louis, ³Neurology, Yale University, ⁴Neurology, Emory University, ⁵Neurology, Ochsner LSU Health Shreveport, ⁶Neurology, University of Manitoba

Objective:

To measure inter-rater reliability of experts in assessing encephalopathy severity using the VE-CAM-S grading system.

Background:

The VE-CAM-S (Visual EEG Confusion Assessment Method – Severity) scale quantifies encephalopathy severity based on EEG features. However, inter-rater reliability of experts using the scale has yet to be assessed.

Design/Methods:

We created an online test comprising thirty-two 15-second EEG samples. For each sample, users indicated the presence or absence of each of 29 EEG features, 11 of which are used in the VE-CAM-S. The gold standard was based on the consensus of 3 authors (IS, FN, MBW). Ten experts from 6 institutions participated. We quantified performance by the average Spearman correlation of VE-CAM-S scores with the gold standard and by average sensitivity and specificity. We performed a qualitative analysis to identify the errors in recognizing EEG features that most affected VE-CAM-S scores.
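
The performance measures named above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and all example numbers are hypothetical, and it assumes each rater's VE-CAM-S scores and feature labels are already available as plain lists aligned with the gold standard.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def sensitivity_specificity(rater, gold):
    """Sensitivity and specificity of one rater's present/absent labels
    for a single EEG feature; returns None where a rate is undefined."""
    pairs = [(bool(r), bool(g)) for r, g in zip(rater, gold)]
    tp = sum(r and g for r, g in pairs)
    fn = sum((not r) and g for r, g in pairs)
    tn = sum((not r) and (not g) for r, g in pairs)
    fp = sum(r and (not g) for r, g in pairs)
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


# Hypothetical scores for five samples (not data from the study).
gold_scores = [0, 2, 5, 3, 8]
rater_scores = {
    "rater_a": [0, 3, 4, 3, 9],
    "rater_b": [1, 2, 6, 2, 7],
}
average_correlation = mean(
    spearman(scores, gold_scores) for scores in rater_scores.values()
)
print(f"average Spearman correlation: {average_correlation:.2f}")
```

Averaging per-rater correlations, as in the last lines, is one plausible reading of "average Spearman correlation"; the abstract does not specify how the average or its confidence interval was computed.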

Results:

The average [95% CI] correlation of VE-CAM-S scores with the gold standard was 0.73 [0.59-0.86]. Specificity was very high (>90%) for all features except generalized delta (77%). Sensitivity was high (>70%) for all features except brief generalized attenuations (69%), generalized periodic discharges (67%), generalized theta (63%), BIRDs (57%), generalized alpha (57%), extreme delta brushes/EDB (50%), and generalized beta (50%). Probable reasons for errors were the subtlety of some findings; confusion between similar findings (e.g., generalized beta vs. myogenic artifact, burst suppression vs. brief generalized attenuations); and failure to correctly recognize BIRDs (mislabeled as focal IEDs) and EDB (mislabeled as GRDA). The largest errors occurred when experts missed or falsely identified features that carry higher weight in the VE-CAM-S scoring rubric.

Conclusions:

Expert agreement in VE-CAM-S scoring is high. Error analysis identified several ways to improve future versions, including breaking high-stakes features into smaller parts; creating a “cheat sheet” with scored examples to allow scorers to choose the closest match; and designing teaching materials to help scorers recognize subtle variations of high-stakes patterns.