Target-class-agnostic feature rejection for radiomics analyses based on variations of tumor segmentation mask
Balthasar Schachtner1, Michael Ingrisch1, Eva Gresser1, Moritz Schneider1, Andrea Schreier1, Olga Solyanik1, Giuseppe Magistro2, and Dominik Nörenberg1

1Department of Radiology, Munich University Hospitals, LMU, Munich, Germany, 2Department of Urology, Munich University Hospitals, LMU, Munich, Germany


Feature selection is a key aspect of radiomics analyses. An approach is presented that removes features which are not stable with respect to small variations of the segmentation mask. The rejection is target-class agnostic and can be used in combination with target-class-based selections. In a simple machine learning setup on prostate MRI of prostate cancer patients, the proposed approach increases the auROC by about 5 percentage points.


Radiomics tools can easily produce more than 1000 features; feature selection is therefore a key ingredient for a stable classification algorithm.1 Features which are not stable with respect to small variations of the segmentation mask of the region of interest are undesirable and should be removed. The approach proposed here identifies the unstable features while being agnostic to the target class of the classification problem.


A dataset of 86 patients with histologically proven, non-treated prostate cancer was used in a radiomics analysis. T2-weighted images and ADC maps were obtained for all patients using 3T MRI scanners. Expert radiologists manually segmented the tumors on the T2-weighted images and on the ADC maps individually. Features were calculated following the prescriptions of the "image biomarker standardisation initiative" (IBSI) on both the T2-weighted image and the ADC map. They include shape features of the segmentation and first-order features of the voxel intensities in the segmented region of interest. Furthermore, texture features were calculated from discretized gray-level matrices. In total, 1482 features were calculated with the pyradiomics package2 using different filters such as wavelets, Laplacian of Gaussian and local binary patterns.

To assess the influence of the exact delineation of the segmentation, the features were recalculated for variations of the segmentation masks. For each patient, the masks were dilated and eroded by one pixel in the high-resolution plane. For each feature, the intraclass correlation coefficient (ICC) was used to evaluate how strongly these variations change a patient's feature value relative to the distribution of that feature across all patients. Figure 1 shows the ICC for all features and the window of ICC between 0.6 and 0.8.
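The mask variation and the stability measure can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names `perturb_mask` and `icc_oneway` are hypothetical, the slice direction is assumed to be axis 0, the cross-shaped structuring element is an assumption, and the one-way random-effects form ICC(1,1) is chosen here only because the ICC variant is not specified above.

```python
import numpy as np
from scipy import ndimage


def perturb_mask(mask):
    """Return (dilated, eroded) versions of a 3D boolean mask.

    Assumption: axis 0 is the slice direction; axes 1 and 2 span the
    high-resolution plane, so the mask changes by one pixel in-plane only.
    """
    # 2D cross-shaped element with a singleton slice axis, so the
    # operation never grows or shrinks the mask across slices.
    structure = ndimage.generate_binary_structure(2, 1)[np.newaxis, :, :]
    dilated = ndimage.binary_dilation(mask, structure=structure)
    eroded = ndimage.binary_erosion(mask, structure=structure)
    return dilated, eroded


def icc_oneway(values):
    """One-way random-effects ICC(1,1) (an assumed ICC variant).

    `values` has shape (n_patients, n_masks): one feature, evaluated on
    e.g. the original, dilated and eroded mask of every patient.
    """
    values = np.asarray(values, dtype=float)
    n, k = values.shape
    patient_means = values.mean(axis=1)
    grand_mean = values.mean()
    # Variation between patients versus variation between mask variants.
    ms_between = k * np.sum((patient_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((values - patient_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy lesion: a 3x3 square in the middle slice of a 3-slice volume.
mask = np.zeros((3, 7, 7), dtype=bool)
mask[1, 2:5, 2:5] = True
dilated, eroded = perturb_mask(mask)

# Toy feature values for three patients and three mask variants each.
stable_feature = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
unstable_feature = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [2.0, 3.0, 1.0]]
icc_stable = icc_oneway(stable_feature)
icc_unstable = icc_oneway(unstable_feature)
```

In this toy case the feature that is identical across mask variants reaches an ICC of 1, while the feature whose values are dominated by the mask variation falls below the 0.6 to 0.8 window.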

The impact on classification performance was tested using a simple machine learning setup implemented in scikit-learn3. After stability-based feature rejection, a fixed number of 25 features was selected using the minimum-redundancy maximum-relevancy (mRMR)4 algorithm to remove effects arising from reducing the number of features. To check the influence of restricting the number of features, a random selection of features matching the number of features at each ICC threshold was evaluated for comparison. The classification performance was evaluated in both cases with a random forest in cross-validation. The target of the radiomics study was the discrimination of high- and low-grade Gleason scores based on the MRI images. The simple machine learning setup was therefore trained to discriminate patients with a Gleason score of at most 6 from patients with a Gleason score of at least 7.
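A setup of this kind can be sketched with scikit-learn as shown below. The sketch rests on several assumptions: the data are synthetic, the ICC values are random placeholders, the threshold of 0.7 and the 5-fold cross-validation are arbitrary choices, and the helper `cv_auroc` is hypothetical. Because scikit-learn does not provide mRMR, a univariate mutual-information ranking (`SelectKBest` with `mutual_info_classif`) is swapped in as a stand-in; it does not account for redundancy between features.

```python
from functools import partial

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def cv_auroc(X, y, n_select=25, seed=0):
    """Mean cross-validated auROC of 'select n_select features + random forest'."""
    k = min(n_select, X.shape[1])
    # Stand-in for mRMR: univariate mutual-information ranking.
    selector = SelectKBest(partial(mutual_info_classif, random_state=seed), k=k)
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Selection sits inside the pipeline, so it is refit on every training fold.
    model = make_pipeline(selector, forest)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()


rng = np.random.default_rng(0)
n_patients, n_features = 86, 200

# Synthetic stand-ins for Gleason scores, radiomics features and ICC values.
gleason = rng.choice([6, 7, 8], size=n_patients)
y = (gleason >= 7).astype(int)  # low grade (<= 6) versus high grade (>= 7)
X = rng.normal(size=(n_patients, n_features))
icc = rng.uniform(0.0, 1.0, size=n_features)

# Stability-based rejection at one ICC threshold (0.7 is an arbitrary choice).
stable = np.flatnonzero(icc >= 0.7)
# Comparison: a random subset with the same number of features.
random_subset = rng.choice(n_features, size=stable.size, replace=False)

auroc_stable = cv_auroc(X[:, stable], y)
auroc_random = cv_auroc(X[:, random_subset], y)
```

Since the synthetic features carry no signal, both auROC values only demonstrate the mechanics; no difference between the two selections should be expected here.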


In the simple machine learning setup, the area under the ROC curve (auROC) increased by approximately 5 percentage points when a cut on ICC between 0.6 and 0.8 was applied, compared with not using the feature rejection. Figure 2 shows the auROC for feature rejection based on the ICC criterion in comparison with the random feature selection. Among the feature classes, only the shape features are mostly stable with respect to the ICC criterion. In each of the first-order and texture feature classes, unstable features can be identified. Features from the ADC maps are in general less stable than features from the T2-weighted images.


The results show that the removal of unstable features is well motivated and can improve the classification performance. In practical applications, manual segmentation of tumors cannot be precise down to single pixels. Therefore, the generalizability of models trained on unstable features will be worse than that of models trained on stable features only. Removing unstable features may lead to a smaller set of features with better classification performance. As one would expect, the shape features are stable with respect to small variations of the segmentation mask. First-order and texture features can be sensitive to the inclusion of pixels with high or low intensities, but the interplay of filters and features can be hard to predict. The cut on ICC is introduced as a new hyperparameter in the machine learning setup, which can be optimized for the given dataset.
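One way to treat the ICC cut as a tunable hyperparameter is to wrap it in a scikit-learn transformer and include it in a grid search. The following is a sketch under stated assumptions: the class `ICCThreshold` is hypothetical and not part of scikit-learn or of the study's code, the data and ICC values are synthetic, and the grid of thresholds is chosen for illustration only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline


class ICCThreshold(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: keep features whose ICC is at least `threshold`.

    `icc` holds one precomputed ICC value per feature column. It does not
    depend on the target class, so fitting ignores `y`.
    """

    def __init__(self, icc, threshold=0.7):
        self.icc = icc
        self.threshold = threshold

    def fit(self, X, y=None):
        self.keep_ = np.asarray(self.icc) >= self.threshold
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.keep_]


rng = np.random.default_rng(0)
n_patients, n_features = 86, 100

# Synthetic stand-ins for features, labels and per-feature ICC values.
X = rng.normal(size=(n_patients, n_features))
y = rng.integers(0, 2, size=n_patients)
icc = np.linspace(0.0, 1.0, n_features)

pipeline = Pipeline([
    ("icc", ICCThreshold(icc)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
thresholds = [0.0, 0.6, 0.7, 0.8]  # 0.0 corresponds to no rejection
search = GridSearchCV(
    pipeline,
    param_grid={"icc__threshold": thresholds},
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
best_threshold = search.best_params_["icc__threshold"]
```

When the threshold is tuned this way, the reported performance should come from an outer validation loop or a held-out set, because the best cross-validated score of the grid search is optimistically biased.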


Feature rejection based on variations of the segmented masks provides a target-class agnostic way to reject unstable features and may help to improve the performance of classification algorithms. It can be used in combination with target-class-based feature selection algorithms and may be optimized for the given set of image data.




1. Ingrisch M et al. Radiomic Analysis Reveals Prognostic Information in T1-Weighted Baseline Magnetic Resonance Imaging in Patients With Glioblastoma. Invest Radiol. 2017 Jun; 52(6):360-36

2. van Griethuysen JJM et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017; 77(21):e104–e107.

3. Pedregosa et al., Scikit-learn: Machine Learning in Python. JMLR 12, pp. 2825-2830, 2011.

4. Brown, Gavin et al. Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection. JMLR 2012.


Figure 1: ICC of features, sorted by descending ICC. Red dashed lines indicate the window of ICCs between 0.6 and 0.8.

Figure 2: auROC of the simple machine learning setup at different thresholds of ICC. The auROC at each threshold for the proposed feature selection based on the ICC of the feature variation is shown in blue. For comparison a random selection of features (matching the number of features at each ICC threshold) is shown in orange.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)