An open-source NLP tool, MedCAT, was integrated into an automated pipeline and used to extract structured, retrospective diagnostic data from semi-structured neurology clinic letters written in Greater Manchester, United Kingdom, between 1 January 2018 and 1 November 2024.
Successfully extracted diagnostic data were coded and linked to sociodemographic data for 125,273 unique neurology outpatients from a large UK conurbation. Headache (16.1%, n=26,631) and epilepsy (14.3%, n=24,880) were the most common diagnoses. Females residing in areas of highest social deprivation (IMD1) were more likely to be diagnosed with functional neurological disorder (ASRR [95% CI]: 1.78 [1.73-1.83]) and headache (ASRR [95% CI]: 1.64 [1.61-1.68]), and males in these areas were more likely to be diagnosed with epilepsy (ASRR [95% CI]: 1.36 [1.32-1.39]). For demyelinating conditions, females from areas of lowest social deprivation (IMD5) were seen more often in neurology clinics (ASRR [95% CI]: 1.34 [1.23-1.45]). Ethnicity was not recorded for 16.5% (n=17,523), but people of Asian, Black and Mixed ethnicities were less likely than White people to be seen in a neurology clinic.
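The ratios above are reported as ASRR with 95% confidence intervals. As a minimal sketch of how such a figure could be derived, assuming ASRR denotes an age-standardised rate ratio obtained by direct standardisation with a log-scale (delta method) interval, the calculation might look like the following. The function names, age bands, weights and counts are all hypothetical and are not taken from the study, whose exact method is not described here.

```python
import math

def standardised_rate(cases, population, weights):
    """Directly age-standardised rate and its approximate variance.

    cases, population and weights are per-age-band sequences of equal length.
    The variance assumes Poisson-distributed case counts in each band.
    """
    total_w = sum(weights)
    rate = sum(w * d / n for d, n, w in zip(cases, population, weights)) / total_w
    var = sum((w / total_w) ** 2 * d / n ** 2
              for d, n, w in zip(cases, population, weights))
    return rate, var

def rate_ratio_ci(group_a, group_b, weights, z=1.96):
    """Ratio of two standardised rates with a log-scale confidence interval.

    group_a and group_b are (cases, population) pairs; z=1.96 gives ~95%.
    """
    rate_a, var_a = standardised_rate(group_a[0], group_a[1], weights)
    rate_b, var_b = standardised_rate(group_b[0], group_b[1], weights)
    ratio = rate_a / rate_b
    # Delta method: variance of log(ratio) from the two rate variances.
    se_log = math.sqrt(var_a / rate_a ** 2 + var_b / rate_b ** 2)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)

# Hypothetical illustration: three age bands, made-up counts.
weights = [0.3, 0.5, 0.2]                           # assumed standard population shares
most_deprived = ([30, 80, 60], [10000, 12000, 5000])
reference = ([20, 50, 45], [11000, 13000, 6000])
ratio, lower, upper = rate_ratio_ci(most_deprived, reference, weights)
print(f"ASRR {ratio:.2f} [{lower:.2f}-{upper:.2f}]")
```

An interval that excludes 1 would indicate a difference between the two groups at the chosen confidence level, which is how the bracketed ranges above can be read.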
Machine learning pipelines can be used to perform automated structural coding of outpatient data. For predominantly clinic-based specialities such as neurology, this can be used to identify diagnostic patient groups and explore potential health disparities. Such rich data, not currently available to decision makers, could support service planning, resource allocation and population health research.