2023 American Academy of Neurology Abstract Website

Objective:

The GECO algorithm removes from datasets spurious correlations that are too complex for human observation or statistical analysis to detect. We demonstrate the method's efficacy on MRI images of the brain, leveraging generative techniques to maintain image quality while removing technical artifacts, and present GECO as a proof of concept for a more general approach to clearing complex, spurious correlations from many data types.

Background:

Machine learning models trained on imaging data have been shown empirically to detect complex artifacts with high accuracy, such as the type of machine on which a scan was acquired. Such artifacts may be invisible to the human eye and to statistical analysis, yet machine learning systems can identify them and may come to focus on these irrelevant features rather than on scientifically or medically useful ones. As a result, such systems often "shortcut" past the features researchers would like to detect and instead rely on unrelated, spurious correlations to make predictions.

Design/Methods:

GECO is a Generative Adversarial Network designed for image-to-image translation, transforming a neuroimaging input image into a new image with user-selected spurious correlations removed.
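The training objective is not stated in this abstract. As an illustration only, one common way to pose adversarial removal of a labeled artifact is sketched below. The symbols are assumptions introduced here, not taken from the GECO method: G is the generator, C is a classifier predicting the artifact label a from an image, x is the input image, and λ is a weight on staying close to the input.

```latex
\min_{G}\,\max_{C}\;
  \mathbb{E}_{(x,a)}\bigl[\log C\bigl(a \mid G(x)\bigr)\bigr]
  \;+\; \lambda\,\mathbb{E}_{x}\bigl[\lVert G(x) - x \rVert_{1}\bigr]
```

Under this hypothetical formulation, C is trained to recover the artifact label from translated images, while G is trained to make that label unrecoverable and to keep the output close to the original.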

Results:

Beginning with classifiers trained to identify brain MRI images by the artifacts of interest, GECO reduced the classifiers’ ability to detect these spurious correlations from 97% to a level nearly equal to that of a classifier making purely random guesses. We also observed over 98% structural similarity between the original and de-artifacted brain images, indicating that the vast majority of non-spurious information in the original images was preserved.
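The two quantities reported above, classifier performance relative to random guessing and structural similarity, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes images flattened to lists of intensities in [0, 1], and it uses a simplified single-window (global) form of the structural similarity index rather than the usual sliding-window version. All function names and the toy values are hypothetical.

```python
def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole image as a single window.

    x and y are equal-length sequences of pixel intensities.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Standard SSIM stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def accuracy(predicted, actual):
    """Fraction of artifact labels the classifier predicted correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def chance_accuracy(labels):
    """Expected accuracy of uniform random guessing among the label classes."""
    return 1.0 / len(set(labels))


# Toy values for illustration only; these are not results from the study.
original = [0.10, 0.40, 0.80, 0.30]
translated = [0.11, 0.39, 0.79, 0.31]
scanner_labels = [0, 1, 0, 1]
predictions_after = [0, 0, 1, 1]

similarity = global_ssim(original, translated)
gap_to_chance = accuracy(predictions_after, scanner_labels) - chance_accuracy(scanner_labels)
```

A gap near zero means the classifier does no better than random guessing on the translated images, while a similarity near 1 means the translated image remains close to the original.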

Conclusions:

In addition to addressing the known problem of artifacts that hamper the analysis of brain MRI scans, the GECO algorithm opens the door to removing many other types of spurious correlations, both from neuroimaging and from a wide range of other data types in neurology and beyond.