Comparison of 12 different constructs of pre-trained convolutional encoders for semantic segmentation in prostate brachytherapy MRI
Jeremiah Wayne Sanders1, Steven Frank2, Gary Lewis3, and Jingfei Ma1

1Imaging Physics, University of Texas MD Anderson Cancer Center, Houston, TX, United States, 2Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, United States, 3Radiation Oncology, University of Texas Medical Branch, Galveston, TX, United States


Anatomy contouring is essential in quantifying the dose delivered to the prostate and surrounding anatomy after low-dose-rate prostate brachytherapy. Currently, five anatomical structures (the prostate, rectum, seminal vesicles, external urinary sphincter, and bladder) are contoured manually by a radiation oncologist. In this work, we investigated convolutional encoder-decoder networks for automatic segmentation of the five organs. Six pre-trained convolutional encoders and two loss functions were investigated, yielding twelve different models for comparison. Results indicated that the classification accuracy of the pre-trained convolutional encoders on the ImageNet dataset positively correlated with semantic segmentation accuracy in prostate MRI.


Convolutional encoder-decoder networks (e.g. U-Net [1]) have demonstrated success in a number of semantic segmentation tasks in MRI. We have demonstrated that fully convolutional networks can be successfully trained to perform anatomy contouring in prostate brachytherapy MRI [2]. However, several methods for semantic segmentation exist, including those that use the U-Net architecture or a variant of it in which the convolutional encoder is replaced with a network pre-trained on the ImageNet dataset. Since the publication of AlexNet in 2012, many networks have been developed for image classification on ImageNet, and the classification accuracy of these networks has continued to increase. As such, there is a large search space that must be investigated to find the appropriate pre-trained network for a new semantic segmentation application using a transfer learning approach. Additionally, multiple loss functions can be used for training semantic segmentation networks. In this work, we investigate six pre-trained convolutional encoders and two loss functions for semantic organ segmentation in post-implant prostate brachytherapy MRI.


Sixty-seven post-implant patients were scanned with a 3D balanced steady-state free precession pulse sequence (CISS) on a 1.5T Siemens Aera scanner using a 2-channel rigid endorectal coil in combination with two 18-channel pelvic array coils [3]. The prostate, rectum, seminal vesicles (SV), external urinary sphincter (EUS), and bladder were segmented by a board-certified radiation oncologist (S.F.) and a radiation oncology resident (G.L.). A total of 4999 slices were available, which were split into 3666/917/416 slices for training/cross-validation/testing.

A deep learning application engine [4] was used to construct and train all models. Six convolutional encoder-decoder networks were constructed using six convolutional encoders pre-trained on the ImageNet dataset: VGG16 [5], VGG19 [5], DenseNet121 [6], DenseNet169 [6], DenseNet201 [6], and Xception [7]. The same decoder was used for each network (Figure 1). Resize convolutional layers in the decoder were chosen over transpose convolutions to avoid the checkerboard artifacts characteristic of transpose convolutions [8]. Each of the six networks was trained using two different loss functions: cross-entropy and Tversky loss (α=0.5, β=0.5). This yielded a total of 12 different models trained to perform anatomy segmentation.
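As an illustration of the Tversky loss named above, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes one-hot ground truth and softmax-like predictions with the class axis last, averages the Tversky index over classes, and follows the common convention that α weights false positives and β weights false negatives; the function names and the smoothing constant `eps` are hypothetical. With α=β=0.5 the Tversky index reduces to the Dice similarity coefficient.

```python
import numpy as np


def tversky_index(y_true, y_pred, alpha=0.5, beta=0.5, eps=1e-7):
    """Per-class Tversky index for arrays of shape (..., n_classes).

    y_true: one-hot ground truth; y_pred: predicted class probabilities.
    alpha weights false positives, beta weights false negatives (assumed convention).
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    axes = tuple(range(y_true.ndim - 1))  # sum over every axis except the class axis
    tp = np.sum(y_true * y_pred, axis=axes)
    fp = np.sum((1.0 - y_true) * y_pred, axis=axes)
    fn = np.sum(y_true * (1.0 - y_pred), axis=axes)
    # eps (hypothetical smoothing term) avoids division by zero for absent classes
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def tversky_loss(y_true, y_pred, alpha=0.5, beta=0.5):
    """One minus the class-averaged Tversky index (one possible reduction)."""
    return float(1.0 - np.mean(tversky_index(y_true, y_pred, alpha, beta)))


if __name__ == "__main__":
    # Four pixels, two classes; the second pixel is misclassified.
    truth = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
    pred = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])
    print(tversky_index(truth, pred))  # per-class index, equal to Dice here
    print(tversky_loss(truth, pred))
```

Choosing α≠β would trade false positives against false negatives, which is one reason the Tversky form is often preferred for small structures; the values used here (0.5, 0.5) weight both error types equally.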

Model training was performed on a Dell 7920 rack-mounted server (running Linux Red Hat v7.2) with four K2200 GPUs connected with SLI technology. An Adam optimizer was used to train all models. The initial learning rate was set to 1×10⁻⁴ and was decayed by 20% if the validation loss did not improve for 3 epochs. Training was terminated when no reduction in the validation loss occurred for 10 epochs.
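The learning-rate decay and early-stopping rules described above can be sketched in plain Python. This is one interpretation under stated assumptions, not the training code used in this work: the learning rate is multiplied by 0.8 after every 3 consecutive epochs without a new best validation loss, and training stops once 10 consecutive epochs pass without improvement. The function name `simulate_schedule` and the example loss values are hypothetical.

```python
def simulate_schedule(val_losses, lr=1e-4, decay=0.8,
                      lr_patience=3, stop_patience=10):
    """Return the learning rate in effect after each epoch until early stopping.

    val_losses: sequence of per-epoch validation losses (illustrative input).
    """
    best = float("inf")
    epochs_since_best = 0
    history = []
    for loss in val_losses:
        if loss < best:
            # New best validation loss: reset the patience counter.
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            # Decay by 20% after every lr_patience epochs without improvement.
            if epochs_since_best % lr_patience == 0:
                lr *= decay
        history.append(lr)
        # Stop once stop_patience epochs have passed without improvement.
        if epochs_since_best >= stop_patience:
            break
    return history


if __name__ == "__main__":
    # Made-up losses: improvement for two epochs, then a plateau.
    losses = [1.0, 0.9] + [0.95] * 12
    schedule = simulate_schedule(losses)
    print(len(schedule), schedule[-1])
```

Frameworks with built-in plateau callbacks may count patience slightly differently (for example, by resetting the counter after each decay), so the exact epoch at which a decay fires can differ from this sketch.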

Overall pixel-wise classification accuracy was compared among all 12 trained models. Organ-wise volumetric segmentation accuracy was assessed by computing pixel-wise classification accuracy, Dice similarity coefficient, and intersection over union.
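For reference, the three metrics can be computed from integer label maps as in the NumPy sketch below. It assumes each organ is encoded as an integer label and treats each organ as a one-versus-rest binary problem; the exact per-organ accuracy definition used in this work is not stated, so that choice, along with the function names, is an assumption.

```python
import numpy as np


def overall_pixel_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted label matches the ground truth."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def organ_metrics(y_true, y_pred, label):
    """One-vs-rest accuracy, Dice, and IoU for a single integer label."""
    t = np.asarray(y_true) == label
    p = np.asarray(y_pred) == label
    tp = int(np.sum(t & p))    # organ pixels correctly labeled
    fp = int(np.sum(~t & p))   # background labeled as organ
    fn = int(np.sum(t & ~p))   # organ pixels missed
    tn = int(np.sum(~t & ~p))  # background correctly labeled
    union = tp + fp + fn
    return {
        "accuracy": (tp + tn) / t.size,
        # Dice and IoU are undefined when the organ is absent from both maps.
        "dice": 2 * tp / (2 * tp + fp + fn) if union else float("nan"),
        "iou": tp / union if union else float("nan"),
    }


if __name__ == "__main__":
    # Tiny made-up label maps: 0 = background, 1 and 2 = two organs.
    truth = np.array([[0, 1, 1], [0, 2, 2]])
    pred = np.array([[0, 1, 2], [0, 2, 2]])
    print(overall_pixel_accuracy(truth, pred))
    print(organ_metrics(truth, pred, label=1))
    print(organ_metrics(truth, pred, label=2))
```

Because true negatives dominate for small organs, one-versus-rest accuracy can stay high even when Dice and IoU are low, which is why the overlap metrics are usually reported alongside it.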


The classification accuracy of the pre-trained convolutional encoders on ImageNet positively correlated with pixel-wise classification accuracy in post-implant prostate brachytherapy MR image segmentation (Figure 2). Overall, training with Tversky loss produced higher segmentation accuracy than training with cross-entropy across all the networks. The segmentation of the EUS was noticeably worse than that of the other organs, likely due to its much smaller size. Using Xception as the pre-trained convolutional encoder and training with Tversky loss produced the highest segmentation accuracy overall with all organs considered (Figure 3).


A large search space exists when performing semantic segmentation in medical imaging using a transfer learning approach. We attempted to reduce the search space for anatomy segmentation in post-implant brachytherapy MRI by exploring six convolutional encoders pre-trained on ImageNet and two different loss functions. When exploring transfer learning for new semantic segmentation applications in MRI, researchers may expect to obtain higher segmentation accuracy when using convolutional encoders with higher classification accuracy on ImageNet.


Higher segmentation accuracy can be expected when using pre-trained convolutional encoders with higher classification accuracy on ImageNet. Training convolutional encoder-decoder networks for semantic segmentation with Tversky loss produced higher segmentation accuracy than training with cross-entropy. Overall, using an Xception convolutional encoder trained with Tversky loss produced the highest semantic segmentation performance in post-implant prostate brachytherapy MRI.




[1] Ronneberger O, Fischer P, Brox T. “U-Net: Convolutional Networks for Biomedical Image Segmentation”, arXiv:1505.04597v1, 2015.

[2] Sanders J, Lewis G, Frank S, et al. “A fully convolutional network utilizing depth-wise separable convolutions for semantic segmentation of anatomy in MRI of the prostate after permanent implant brachytherapy”, Proc. ISMRM Workshop on Machine Learning, Pacific Grove, CA, 2018.

[3] Sanders JW, Song H, Frank SJ, et al. “Parallel imaging compressed sensing for accelerated imaging and improved signal-to-noise ratio in MRI-based postimplant dosimetry of prostate brachytherapy”, Brachytherapy 17(5):816-824, 2018.

[4] Sanders J, Fletcher J, Frank S, et al. “Deep Learning Application Engine (DLAE): end-to-end development and deployment of medical deep learning algorithms”, Proc. ISMRM Workshop on Machine Learning Part II, Washington, DC, 2018.

[5] Simonyan K, Zisserman A. “Very Deep Convolutional Networks for Large-Scale Image Recognition”, arXiv:1409.1556v6, 2015.

[6] Huang G, Liu Z, van der Maaten L, et al. “Densely Connected Convolutional Networks”, arXiv:1608.06993v5, 2018.

[7] Chollet F. “Xception: Deep Learning with Depthwise Separable Convolutions”, arXiv:1610.02357v3, 2017.

[8] Odena A, Dumoulin V, Olah C. “Deconvolution and Checkerboard Artifacts”, Distill, 2016. http://doi.org/10.23915/distill.00003


The structure of the convolutional encoder-decoder architectures investigated for semantic segmentation of anatomy in post-implant prostate brachytherapy MRI. All convolutional layers in the decoder used a kernel size of (3,3) and a stride of (1,1).

Correlation of ImageNet classification performance with pixel-wise classification performance for the 12 different models investigated.

Semantic segmentation results (pixel-wise classification accuracy, Dice similarity coefficient, and intersection over union) for the 12 models investigated.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)