Automatic Detection and Segmentation of Brain Metastases using Deep Learning on Multi-Modal MRI: A Multi-Center Study
Endre Grøvik1,2, Darvin Yi3, Michael Iv2, Elizabeth Tong2, Kyrre Eeg Emblem1, Line Brennhaug Nilsen1, Cathrine Saxhaug4, Kari Dolven Jacobsen5, Åslaug Helland5, Daniel Rubin3, and Greg Zaharchuk2

1Department for Diagnostic Physics, Oslo University Hospital, Oslo, Norway, 2Department of Radiology, Stanford University, Stanford, CA, United States, 3Department of Biomedical Data Science, Stanford University, Stanford, CA, United States, 4Department of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway, 5Department of Oncology, Oslo University Hospital, Oslo, Norway


In recent years, many deep learning approaches have been developed and tested for automatic segmentation of gliomas. However, few studies have shown their potential for use in patients with brain metastases. Deep learning may ultimately aid radiologists in the tedious and time-consuming task of lesion segmentation. The objective of this work is to assess the clinical potential and generalizability of a deep learning technique by training and testing a convolutional neural network for segmenting brain metastases using multi-center data.


Detection and segmentation of brain metastases on radiographic images form the basis for clinical decision making and patient management. Precise segmentation is crucial for treatment decisions, radiation planning, and assessing treatment response, and must therefore be performed with the utmost accuracy. At the same time, manual detection and segmentation is a tedious and time-consuming task for most radiologists, particularly with the growing use of multi-modal 3D imaging. The objective of this study was to use a deep learning approach, training a fully convolutional neural network (CNN) on multi-modal MRI data for automatic segmentation of brain metastases. The network was trained and tested on patients from two different hospitals (hereinafter referred to as ‘Hospital A’ and ‘Hospital B’, respectively), thus testing its robustness and generalizability to multi-center data, a key step towards understanding its clinical value.


This retrospective, multi-center study was approved by the appropriate Institutional Review Boards. The CNN was trained on 105 patients from Hospital A, split 100/5 into training/validation sets. The training data consisted of pre-therapy MRI including pre- and post-gadolinium 3D T1-weighted fast spin echo (CUBE) and 3D CUBE fluid-attenuated inversion recovery (FLAIR) sequences. The examinations were performed on a combination of 1.5T (n=8) and 3T (n=97) scanners. The ground truth was established by experienced neuroradiologists, who manually delineated the enhancing lesions. Training was performed using a 2.5D fully convolutional network based on a GoogLeNet architecture1 (Fig. 1), with 7 slices from each of the aforementioned sequences as input: a single center slice (for which the ground truth was defined) together with the 3 slices above and below it. The network was modified to optimize segmentation and trained using the TensorFlow framework.
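The 2.5D input described above can be sketched as follows. This is an illustrative reconstruction only: the function name and the clamping of slice indices at volume edges are assumptions, as the abstract specifies only the 7-slice, three-sequence design.

```python
import numpy as np

def make_25d_input(volumes, center, half_window=3):
    """Stack 2*half_window + 1 neighboring axial slices from each MRI
    sequence (e.g. pre-T1, post-T1, FLAIR) into one multi-channel
    2.5D input centered on the slice with ground truth.

    volumes: list of arrays of shape (Z, H, W), one per sequence.
    Returns an array of shape (len(volumes) * (2*half_window+1), H, W).
    """
    slabs = []
    for vol in volumes:
        # indices of the center slice and its neighbors, clamped so
        # that slabs near the top/bottom repeat the boundary slice
        idx = np.clip(np.arange(center - half_window,
                                center + half_window + 1),
                      0, vol.shape[0] - 1)
        slabs.append(vol[idx])
    return np.concatenate(slabs, axis=0)
```

With three sequences and 7 slices each, every training example becomes a 21-channel image fed to the network.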

The network was tested on 15 patients from Hospital B, in whom pre-therapy MRI included pre- and post-gadolinium 3D T1-weighted fast spin echo (SPACE) and 3D T2-weighted FLAIR. The resulting segmentations were evaluated in terms of recall and precision, Intersection over Union (IoU), and Dice score, as well as ROC-curve statistics.
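The voxel-wise metrics named above follow directly from the counts of true/false positive and false negative voxels. A minimal sketch, illustrative rather than the study's actual evaluation code:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Voxel-wise precision, recall, IoU, and Dice score for a binary
    predicted mask versus a binary ground-truth mask."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()   # correctly labeled lesion voxels
    fp = np.logical_and(pred, ~truth).sum()  # spurious lesion voxels
    fn = np.logical_and(~pred, truth).sum()  # missed lesion voxels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "iou": iou, "dice": dice}
```

Note that Dice = 2·IoU/(1+IoU), so the two scores rank patients identically.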


Mean patient age was 63±12 yrs (range: 29-92 yrs) in the training set, and 64±8 yrs (range: 48-80 yrs) in the test set. Primary malignancies included lung, breast, skin, genitourinary and gastrointestinal carcinoma in the training set, and malignant melanoma and lung cancer in the test set. Figure 2 shows an example of the resulting probability map overlaid on a post-gadolinium 3D T1 spin echo image.

The area under the ROC curve, averaged across all test patients, was 0.99. Based on the ROC results, the likelihood threshold for including a voxel as a metastasis was set to 0.96, resulting in a mean precision and recall of 0.83 and 0.72, respectively, and average IoU and Dice scores of 0.62 and 0.74, respectively. Metrics for each individual patient are shown in Table 1, and the corresponding ROC curves are shown in Figure 3.
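For the ROC statistics above, each voxel contributes a (probability, label) pair, and the AUC can be computed with the rank-based (Mann-Whitney) formulation sketched below. This is a generic illustration of the metric, not the study's actual tooling.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive voxel
    receives a higher score than a randomly chosen negative voxel,
    with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # compare every positive score against every negative score
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC near 0.99, as reported here, means almost every lesion voxel is scored above almost every background voxel, which is what allows a high operating threshold such as 0.96.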


This study demonstrates that a modified GoogLeNet deep neural network trained on multi-modal MRI from one institution can detect and segment brain metastases in patients from another institution with high accuracy. To our knowledge, no previous study has used deep learning for detecting and segmenting brain metastases in a multi-center setting. Single-center studies, such as those of Liu et al.2 and Charron et al.3, have shown that CNN-based networks can detect and segment brain metastases with accuracy comparable to that of the current study. Liu et al. reported an area under the ROC curve of 0.98, and Charron et al. showed that a CNN trained on multi-modal MRI data outperformed networks trained on a single MRI contrast.

While this study shows very high accuracy using deep learning for segmenting brain metastases, the results must be interpreted in light of the limited sample size, especially in the test set. Patients included in the test set primarily presented with few and large metastases, which may be easier for the network to predict. This hypothesis is supported by observations in another ongoing study using the same neural network, in which accuracy was higher in patients with 3 or fewer metastases than in those with more than 3. However, some patients in the current test set also presented with numerous (>3) small metastases, for which the network still demonstrated high accuracy.


Using deep learning on multi-modal MRI may facilitate automatic detection and segmentation of brain metastases on a multi-center basis, thus helping radiologists across institutions to accurately perform this time-consuming task.




1. C. Szegedy et al., Going deeper with convolutions, Proc. IEEE Conf. Comput. Vis. Pattern Recognit., (2015), pp. 1-9.

2. Y. Liu et al., A deep convolutional neural network-based automatic delineation strategy for multiple brain metastases stereotactic radiosurgery. PLoS One, 12 (2017), e0185844.

3. O. Charron et al., Automatic detection and segmentation of brain metastases on multimodal MR images with a deep convolutional neural network. Comput. Biol. Med., 95 (2018), pp. 43-54.


Table 1: Detection and segmentation accuracy.

Figure 1: Flowchart showing the three image inputs used to train the neural network, the modified GoogLeNet architecture, and the resulting output colormap representing the probability that each voxel represents a metastasis, ranging from 0 to 1 as indicated by the color bar.

Figure 2: Images show the resulting predictions (probability maps), as generated by the CNN, overlaid on a post-gadolinium SPACE image in (A) a 56-year-old male patient with one brain metastasis from malignant melanoma, (B) a 61-year-old male patient with two brain metastases from lung cancer, and (C) a 58-year-old male patient with two brain metastases from malignant melanoma. The colormap represents the likelihood of a voxel being a metastasis, ranging from 0 to 1 as indicated by the color bar. The yellow circles represent the delineated metastases.

Figure 3: ROC curves for all 15 patients included in the test set. The average area under the ROC curve (AUC) was 0.989, ranging from 0.924 to 1.00. The outlier showing the lowest AUC is characterized by having a large fraction of unenhanced necrotic tissue, deviating from the other cases in the test set.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)