The deep learning lesion segmentation method nicMSlesions only needs one manually delineated subject to outperform commonly used unsupervised methods
Merlin M Weeda1, Iman Brouwer1, Marlieke L de Vos1, Myrte S de Vries1, Frederik Barkhof1,2, Petra JW Pouwels1, and Hugo Vrenken1

1Department of Radiology and Nuclear Medicine, MS Center Amsterdam, Amsterdam Neuroscience, Amsterdam UMC - location VUmc, Amsterdam, Netherlands, 2Institutes of Neurology and Healthcare Engineering UCL, London, United Kingdom


Automatic lesion segmentation is important for measurements of atrophy and lesion load in subjects with multiple sclerosis (MS). Although supervised methods generally perform better than unsupervised methods, they are not widely used because they are more labor-intensive: they require large amounts of manual input. Our research confirmed that supervised methods outperform unsupervised methods. In addition, a deep learning-based supervised method trained on only one subject already outperformed the commonly used unsupervised methods. We therefore recommend using deep learning lesion segmentation methods in MS research.


Multiple sclerosis (MS) is an autoimmune disorder of the central nervous system, characterized by neurodegeneration and demyelination. To enable both atrophy and lesion load measurements in subjects with MS, accurate lesion segmentation is necessary. Over the last decade, several (semi-)automatic lesion segmentation methods have been developed, which can be divided into two groups: supervised methods, which require manual delineation to train the method properly, and unsupervised methods, which do not require any training. Unsupervised methods are less labor-intensive, but show overall poor agreement with manual delineation [1].


Therefore, the aim of this research was twofold. First, we investigated the volumetric and spatial agreement with manual delineation of two supervised and two unsupervised automated lesion segmentation methods. Second, we assessed whether training the deep learning-based supervised method on only one subject's manual delineation already improved the volumetric and spatial agreement over unsupervised methods.


Fourteen subjects with relapsing-remitting MS (RRMS) were scanned between December 2016 and June 2017 on a 3T whole-body MR scanner (GE Discovery MR750) with an 8-channel phased-array head coil. The protocol included a 3D T1-weighted fast spoiled gradient echo sequence (FSPGR; TR/TE/TI = 8.2/3.2/450 ms, resolution 1.0 × 1.0 × 1.0 mm³) and a 3D T2-weighted fluid-attenuated inversion recovery sequence (FLAIR; TR/TE/TI = 8000/130/2338 ms, resolution 1.0 × 1.0 × 1.2 mm³). An expert rater (>10 years of experience) manually delineated the lesions on the FLAIR images. Next, four automated lesion segmentation methods, each based on a different underlying algorithm, were compared with manual segmentation. We tested two unsupervised methods, namely Lesion-Topology preserving Anatomical Segmentation (LesionTOADS) [2] and the Lesion Segmentation Toolbox with Lesion Prediction Algorithm (LST LPA) [3], and two supervised methods, namely FMRIB Software Library's Brain Intensity AbNormality Classification Algorithm (FSL BIANCA) [4] and Valverde's nicMSlesions [5]. For the two supervised methods, we used input from all fourteen manually delineated subjects with leave-one-out cross-validation.
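The leave-one-out scheme used for the supervised methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name and the subject labels are hypothetical, and the actual training call of BIANCA or nicMSlesions is deliberately left out.

```python
def leave_one_out_splits(subject_ids):
    """Yield (training_ids, held_out_id) pairs, holding out each subject once."""
    for held_out in subject_ids:
        training = [s for s in subject_ids if s != held_out]
        yield training, held_out


# Hypothetical subject labels; the abstract only states that there were fourteen subjects.
subjects = [f"sub-{i:02d}" for i in range(1, 15)]
splits = list(leave_one_out_splits(subjects))

for training, held_out in splits:
    # A supervised method would be trained on `training` (13 subjects)
    # and then evaluated against the manual delineation of `held_out`.
    assert held_out not in training
```

With fourteen subjects this gives fourteen folds, each training on thirteen manual delineations and evaluating on the remaining one.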
For LST LPA, the probability threshold was set to 0.55, as reported previously [1]. BIANCA was optimized on our dataset (all lesion points and an equal number of non-lesion points in the training set, non-lesion training points at any location, a 3D patch with patch size 5, spatial weighting of 2, and threshold 0.99). For nicMSlesions, default parameters were used with threshold 0.5. No optimization was needed for LesionTOADS.
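To illustrate what these thresholds do, the sketch below binarizes a toy lesion probability map at each reported value. It is an assumption-laden illustration: the probability values are invented, the map is flattened to a plain list, and the use of "greater than or equal" at the threshold is a choice made here, since each tool applies its own convention.

```python
# Thresholds as reported in the text above.
THRESHOLDS = {"LST LPA": 0.55, "BIANCA": 0.99, "nicMSlesions": 0.5}


def binarize(probabilities, threshold):
    """Turn per-voxel lesion probabilities into a binary mask (1 = lesion).

    Using >= at the threshold is an assumption of this sketch.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


# Invented per-voxel probabilities, for illustration only.
toy_probabilities = [0.10, 0.50, 0.60, 0.995]

masks = {name: binarize(toy_probabilities, t) for name, t in THRESHOLDS.items()}
```

The same probability map yields one, two, or three lesion voxels depending on the threshold, which is why the threshold directly influences the segmented lesion volume.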
For our second aim, we further tested nicMSlesions trained on only one manually delineated subject and assessed its performance on the other thirteen subjects. We assessed volumetric and spatial agreement of the various methods with manual delineation using repeated measures ANOVA, followed where appropriate by post-hoc Wilcoxon signed-rank tests. Results were considered significant at p < 0.05.
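The two agreement measures can be sketched as below. The abstract reports a similarity index (SI) without defining it; this sketch assumes the common Dice definition, SI = 2|A∩B| / (|A| + |B|). The function names, the flattened toy masks, and the handling of two empty masks are choices made here, and the default voxel volume of 1.2 mm³ follows from the stated FLAIR resolution.

```python
def similarity_index(manual, automatic):
    """Dice overlap between two binary masks (assumed definition of the SI)."""
    if len(manual) != len(automatic):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for m, a in zip(manual, automatic) if m and a)
    total = sum(1 for m in manual if m) + sum(1 for a in automatic if a)
    # Convention chosen for this sketch: two empty masks agree perfectly.
    return 2.0 * overlap / total if total else 1.0


def lesion_volume_ml(mask, voxel_volume_mm3=1.2):
    """Lesion volume in mL; 1.2 mm^3 assumes 1.0 x 1.0 x 1.2 mm voxels."""
    return sum(1 for v in mask if v) * voxel_volume_mm3 / 1000.0


# Invented toy masks (flattened), for illustration only.
manual_mask = [1, 1, 1, 0, 0, 0]
automatic_mask = [0, 1, 1, 1, 0, 0]

si = similarity_index(manual_mask, automatic_mask)
volume_difference = lesion_volume_ml(automatic_mask) - lesion_volume_ml(manual_mask)
```

The toy example shows why both measures are needed: the two masks have identical volumes, yet they overlap only partially, so volumetric agreement alone would overstate performance.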


An example of the performance of the manual and automatic lesion segmentation methods is visualized in Figure 1 (note: here, the supervised methods are trained on fourteen subjects with leave-one-out cross-validation). The choice of segmentation method significantly affected both volumetric agreement (F(4,52) = 25.650, p < 0.001) and spatial agreement (F(3,39) = 27.954, p < 0.001) (Table 1). Post-hoc testing showed that manual volumes differed significantly from those of LST LPA, LesionTOADS and nicMSlesions, but not from those of BIANCA (Figure 2).
For the single-subject training of nicMSlesions, training failed for one subject, which was therefore excluded from all further analyses. The choice of the single subject (1 to 13) used for training significantly affected the volumetric agreement (F(13,156) = 25.465, p < 0.001), but not the spatial agreement (F(12,144) = 1.497, p = 0.132) (Table 2). All single-subject trained nicMSlesions variants had greater spatial agreement with manual delineation than the unsupervised lesion segmentation methods (Figure 3).
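As a reading aid, the degrees of freedom reported above can be reproduced with the standard formula for a one-way repeated measures ANOVA without sphericity correction. The condition counts used below are an interpretation, not stated in the abstract: it is assumed that the volumetric analyses include manual delineation as one condition, whereas the spatial analyses do not, because the similarity index is itself defined relative to manual.

```python
def rm_anova_dof(n_conditions, n_subjects):
    """Degrees of freedom (effect, error) for a one-way repeated measures ANOVA."""
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    return df_effect, df_error


# Assumed condition counts (see the note above); subject counts are from the text.
volumetric_all = rm_anova_dof(n_conditions=5, n_subjects=14)      # manual + 4 methods
spatial_all = rm_anova_dof(n_conditions=4, n_subjects=14)         # 4 methods
volumetric_single = rm_anova_dof(n_conditions=14, n_subjects=13)  # manual + 13 variants
spatial_single = rm_anova_dof(n_conditions=13, n_subjects=13)     # 13 variants
```

Under these assumptions the four results match the reported F(4,52), F(3,39), F(13,156) and F(12,144).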

Discussion and conclusion

The two supervised methods showed better volumetric and spatial agreement with manual delineation than the unsupervised methods, with BIANCA showing the best volumetric agreement and nicMSlesions the best spatial agreement. Since nicMSlesions was run with its default settings, better volumetric agreement may be obtainable by optimizing the method.
Furthermore, our results show that manual lesion segmentation input from a single subject is sufficient to train nicMSlesions with its default parameters such that it outperforms the unsupervised methods LST LPA and LesionTOADS. Although training on multiple subjects yields even better volumetric and spatial agreement, studies without large numbers of manual delineations can use nicMSlesions with only one subject's input and improve their automatic lesion segmentation over the commonly used unsupervised methods.
Results should be confirmed in multi-vendor images and in subjects with different MS phenotypes.


This work was supported by the Dutch MS Research Foundation (grant number 14-876).


  1. de Sitter A, Steenwijk MD, Ruet A, Versteeg A, Liu Y, van Schijndel RA, et al. Performance of five research-domain automated WM lesion segmentation methods in a multi-center MS study. Neuroimage. 2017;163:106-14.
  2. Shiee N, Bazin PL, Ozturk A, Reich DS, Calabresi PA, Pham DL. A topology-preserving approach to the segmentation of brain images with multiple sclerosis lesions. Neuroimage. 2010;49(2):1524-35.
  3. Schmidt P. Bayesian Inference for Structured Additive Regression Models for Large-scale Problems with Applications to Medical Imaging: LMU München; 2017.
  4. Griffanti L, Zamboni G, Khan A, Li L, Bonifacio G, Sundaresan V, et al. BIANCA (Brain Intensity AbNormality Classification Algorithm): A new tool for automated segmentation of white matter hyperintensities. Neuroimage. 2016;141:191-205.
  5. Valverde S, Cabezas M, Roura E, Gonzalez-Villa S, Pareto D, Vilanova JC, et al. Improving automated multiple sclerosis lesion segmentation with a cascaded 3D convolutional neural network approach. Neuroimage. 2017;155:159-68.


Figure 1. Overview of the different segmentation methods tested. Top row left to right: original 3D T2w FLAIR image; manual delineation (red); LST LPA (blue). Bottom row left to right: LesionTOADS (green); BIANCA (yellow); and nicMSlesions (pink). Arrows indicate examples of lesions not (completely) segmented by the automated methods.

Table 1. Overview of mean lesion volumes with their standard deviations (SD) from the different lesion segmentation methods and the similarity index compared to manual. Results shown are averages from all fourteen subjects in the dataset.

Figure 2. Scatter plots of the manual lesion volume (manual, y=x) versus the automated segmentation lesion volumes (LST LPA [blue], LesionTOADS [green], BIANCA [yellow], and nicMSlesions [pink]) showing significant differences between manual and automatic lesion segmentations, except between manual and BIANCA.

Table 2. Overview of mean lesion volumes with their standard deviations (SD) from the single-subject training of nicMSlesions and the similarity index compared to manual. Results shown are averages from the thirteen subjects in the dataset that had successful training (i.e. including the subject that was used for the single-subject training).

Figure 3. Box-and-whiskers plot (min-to-max, line at mean) of the similarity index (SI) between manual and automatic lesion segmentation (LST LPA [blue], LesionTOADS [green], BIANCA [yellow], nicMSlesions all [pink] and nicMSlesions variants 1 to 13 [grey]). Even trained on only one subject, nicMSlesions 1 to 13 show higher SIs than the unsupervised methods LST LPA and LesionTOADS.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)