Automated Detection of Cognitive-Linguistic Impairments from Connected Speech Using Conventional and LLM-Derived Linguistic Features
Sepideh Jamali Dogahe1, Joseph R. Duffy2, Leland R. Barnard2, John L. Stricker3, Rene L. Utianski2, Hugo Botha2
1Department of Ophthalmology, 2Department of Neurology, 3Department of Information Technology, Mayo Clinic
Objective:

To determine whether computational linguistic features derived from picture description tasks can automatically detect cognitive-linguistic impairments identified by speech-language pathologists.

Background:

Aphasia and related cognitive-linguistic impairments are important markers of neurological injury and disease, but automated measures of these impairments remain limited. Advances in natural language processing (NLP), including measures derived from large language models (LLMs), enable automated extraction of lexical and semantic features that may provide efficient and consistent markers of cognitive-linguistic impairment.

Design/Methods:

We analyzed 1,013 picture description recordings annotated by speech-language pathologists for five impairment types: grammatical errors, semantic errors, nonspecific terms, other cognitive-communication deficits, and word/phrase repetitions. After transcription with CrisperWhisper, we extracted conventional NLP metrics, including lexical diversity (type–token ratio, unique word count), syntactic complexity (sentence length, part-of-speech distributions), and readability indices, as well as two LLM-derived features: surprisal (Gemma 7B) and semantic deviation (all-mpnet-base-v2). Average word surprisal represents how unexpected each word is given its preceding context. Semantic deviation is the cosine distance from each transcript's embedding to the mean embedding of healthy controls. To mitigate demographic confounding, 1:2 propensity score matching on age and gender yielded 306 participants (102 cases with at least one impairment type annotated as present and 204 controls). Logistic regression models with leave-one-out cross-validation were used to predict each annotation.
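As an illustration only, the three feature definitions above can be sketched in a few lines of Python. This is not the study's pipeline: the function names are hypothetical, the per-word probabilities are assumed to come from a language model (Gemma 7B in the abstract, not called here), the embeddings are assumed to be plain lists of floats (all-mpnet-base-v2 in the abstract, not called here), and the base-2 logarithm and lowercasing are assumptions that the abstract does not specify.

```python
import math


def type_token_ratio(tokens):
    """Number of distinct word types divided by the total number of tokens.

    Lowercasing before counting is an assumption of this sketch.
    """
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered) if lowered else 0.0


def mean_surprisal(word_probs):
    """Average of -log2 p(word | preceding context), in bits.

    `word_probs` holds one conditional probability per word, assumed to be
    supplied by a language model. The log base is an assumption.
    """
    return sum(-math.log2(p) for p in word_probs) / len(word_probs)


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_deviation(transcript_embedding, control_embeddings):
    """Cosine distance from one transcript to the mean control embedding."""
    n = len(control_embeddings)
    # Element-wise mean over all healthy-control embeddings.
    centroid = [sum(column) / n for column in zip(*control_embeddings)]
    return cosine_distance(transcript_embedding, centroid)
```

For example, under these assumptions a four-word transcript that uses three distinct words has a type–token ratio of 0.75, and two words with conditional probabilities 0.5 and 0.25 have a mean surprisal of 1.5 bits.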

Results:

Grammatical errors were detected with the highest discrimination (AUC=0.93) using surprisal alone, while semantic errors (AUC=0.71) were best captured by surprisal and lexical diversity. Surprisal and semantic deviation were predictive of other cognitive-communication deficits (AUC=0.75), whereas semantic deviation alone performed well for nonspecific terms (AUC=0.79). Type–token ratio showed moderate discrimination for word/phrase repetitions (AUC=0.71).
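To make the AUC values concrete, the sketch below shows one way to obtain an AUC under leave-one-out cross-validation: each participant is scored by a model fitted on all other participants, and the pooled held-out scores are then compared between cases and controls. Pooling the held-out scores is an assumption of this sketch, as are the function names and the `fit`/`predict` callables, which stand in for the logistic regression models named in the abstract.

```python
def leave_one_out_scores(features, labels, fit, predict):
    """Score each sample with a model trained on all the other samples.

    `fit(train_x, train_y)` returns a model and `predict(model, x)` returns
    a score; both are hypothetical placeholders for any classifier.
    """
    scores = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        scores.append(predict(model, features[i]))
    return scores


def auc(scores, labels):
    """Probability that a random case outscores a random control.

    Labels are 1 for cases and 0 for controls; tied scores count as half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

Read this way, an AUC of 0.93 means that a randomly chosen case receives a higher score than a randomly chosen control about 93% of the time, while 0.5 corresponds to chance.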

Conclusions:

LLM-derived measures complement traditional NLP features and can aid automatic detection of diverse cognitive-linguistic impairments. This approach highlights the potential of NLP-based pipelines to enhance clinical screening efficiency and consistency for cognitive-communication disorders.

DOI: 10.1212/WNL.0000000000216168
Disclaimer: Abstracts were not reviewed by Neurology® and do not reflect the views of Neurology® editors or staff.