Scanner Variability of MR Based Radiomics Features
Peter Gibbs1, Eun Sook Ko1, Meredith Sadinski1, and Elizabeth Morris1

1Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, United States


This work utilizes an MR phantom to determine the repeatability, quartile coefficient of dispersion and potential efficacy of textural parameters calculated from gray level co-occurrence matrices, run length matrices, size zone matrices and neighborhood gray tone difference matrices. Images were obtained at 3 different field strengths, across 3 different manufacturers. Parameters based on gray level co-occurrence matrices showed excellent repeatability and low dispersion, whilst still demonstrating excellent discrimination between contrasting regions of interest.


The mining of texture features from medical images and their subsequent use in both classification and treatment prediction model development, essentially radiomics, is a rapidly expanding field. However, there has been little work exploring the repeatability and reproducibility of texture-based parameters extracted from MRI data [1]. Multi-center trials inevitably involve scanners from different manufacturers and varying field strengths, alongside slight variations in acquisition parameters attributable to hardware performance. A greater understanding of the spread of values that might be obtained under different test conditions is necessary, as has been noted for CT data [2]. This work aims to assess these issues using a specially designed phantom containing a range of texture appearances. Ideal parameters will exhibit both good repeatability and reproducibility, whilst also demonstrating sufficient variation between different ROIs to enable discrimination between benign and malignant disease, or to confidently monitor treatment-induced changes.


The texture phantom consisted of a gelatin-filled container (~500 ml) in which were suspended a slice of pork meat, a tomato, a caper and an olive, thus providing a range of texture appearances and object sizes. Standard T1w and T2w scans typically utilized in breast imaging were performed on 6 different scanners, namely a 1.5T GE, a 3.0T GE, a 1.5T Siemens, a 3.0T Siemens, a 7.0T Siemens, and a 3.0T Philips (no appropriate T1w scans were obtained in this case). All images were obtained within a 24-hour period to minimize effects due to phantom deterioration. T1w scan parameters: 1mm slice thickness, ~0.5mm in-plane resolution, flip angle 10°, TE 1.6-2.9ms, TR 3.7-6.1ms. T2w scan parameters: 3mm slice thickness, 0.5-0.6mm in-plane resolution, TE 100-106ms, TR 3680-3760ms. All scans were performed twice to enable repeatability calculations.

After data acquisition, a central cross-sectional area through each of the tomato, caper, and olive was manually segmented on all scans, alongside an area of pork meat relatively free of fat. ROI data were then reduced to 16 gray levels to ensure sufficient counting statistics in texture feature calculations. Heterogeneity measures based on first order statistics (variance, skewness, kurtosis, energy, entropy) were determined alongside texture features based on gray level co-occurrence matrices (GLCM) [3], run length matrices (RLM) [4], size zone matrices (SZM) [5] and neighborhood gray tone difference matrices (NGTDM) [6].
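The gray-level reduction and co-occurrence step described above can be illustrated with a minimal sketch. The abstract does not state how intensities were binned or which pixel offsets were used, so the linear min-max binning, the single horizontal offset, and the function names `quantize`, `glcm` and `glcm_features` are assumptions made purely for illustration. The feature formulas follow the standard Haralick definitions for f6 (sum average), f8 (sum entropy) and f9 (entropy).

```python
import math

def quantize(roi, levels=16):
    """Linearly rescale ROI intensities onto integer gray levels 0..levels-1.
    Min-max binning is an assumption; the abstract only states '16 gray levels'."""
    flat = [v for row in roi for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in roi]
    scale = levels / (hi - lo)
    return [[min(int((v - lo) * scale), levels - 1) for v in row] for row in roi]

def glcm(q, levels=16, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix for one (row, col) offset.
    The offset is hypothetical; real pipelines often average several directions."""
    dr, dc = offset
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(len(q)):
        for c in range(len(q[r])):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < len(q) and 0 <= c2 < len(q[r2]):
                i, j = q[r][c], q[r2][c2]
                counts[i][j] += 1  # count the pair in both directions
                counts[j][i] += 1  # so that the matrix is symmetric
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("ROI too small for the chosen offset")
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Haralick f6, f8 and f9 from a normalized GLCM (natural logarithm)."""
    n = len(p)
    entropy = -sum(v * math.log(v) for row in p for v in row if v > 0)
    # p_sum[k] is the probability that the two gray levels add up to k (0-based).
    p_sum = [0.0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            p_sum[i + j] += p[i][j]
    # Haralick indexes gray levels from 1, hence the shift of 2 in the sum average.
    sum_average = sum((k + 2) * v for k, v in enumerate(p_sum))
    sum_entropy = -sum(v * math.log(v) for v in p_sum if v > 0)
    return {"f6_sum_average": sum_average,
            "f8_sum_entropy": sum_entropy,
            "f9_entropy": entropy}
```

Reducing to a small number of gray levels keeps the 16 x 16 matrix well populated even for small ROIs such as the caper, which is the counting-statistics argument made in the text.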

Repeatability was visually assessed using Bland-Altman plots and calculated as 2.77 times the common standard deviation of repeated measures [7]. Data scatter was determined using the quartile coefficient of dispersion, which is generally regarded as more robust than the coefficient of variation. Finally, texture parameter differences between the segmented objects were explored using the non-parametric Friedman test for k-related samples.
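The three statistics named above can be sketched as follows. This is an illustrative reading rather than the authors' code: the within-subject standard deviation is estimated from paired repeats, repeatability is expressed as a percentage of the grand mean (an assumption about how the percentages in the figure captions were normalized), the quartiles use the "inclusive" interpolation method, and the Friedman statistic omits the tie correction and the p-value. All function names are hypothetical.

```python
import math
import statistics

def repeatability_coefficient(first, second):
    """2.77 x within-subject SD for paired repeats; 2.77 = 1.96 * sqrt(2).
    For two repeats per subject the within-subject variance is sum(d^2) / (2n)."""
    diffs = [a - b for a, b in zip(first, second)]
    within_var = sum(d * d for d in diffs) / (2 * len(diffs))
    return 2.77 * math.sqrt(within_var)

def repeatability_percent(first, second):
    """Repeatability coefficient as a percentage of the grand mean (assumed normalization)."""
    grand_mean = statistics.fmean(list(first) + list(second))
    return 100.0 * repeatability_coefficient(first, second) / abs(grand_mean)

def quartile_coefficient_of_dispersion(values):
    """(Q3 - Q1) / (Q3 + Q1); the quartile interpolation method is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)

def friedman_statistic(blocks):
    """Friedman chi-square for k related samples, without tie correction.
    Each row is one block (e.g. one scanner); each column is one ROI."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            avg_rank = (pos + end) / 2 + 1  # tied values share the average rank
            for t in range(pos, end + 1):
                rank_sums[order[t]] += avg_rank
            pos = end + 1
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

Under the null hypothesis the Friedman statistic is approximately chi-square distributed with k - 1 degrees of freedom; a library routine such as `scipy.stats.friedmanchisquare` would return both the statistic and the p-value.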


Example T2w images of the gelatin phantom are shown in Figure 1, detailing its appearance. For the first order statistics, skewness and kurtosis demonstrated the poorest repeatability, whilst entropy was highly repeatable (<10% variation) for both T1w and T2w data. Poor repeatability was also noted for the GLCM-based parameters f15 (cluster shade) and f16 (cluster prominence). Example Bland-Altman plots used to assess repeatability are shown in Figures 2 and 3. Generally, parameters based on the T1w images were more repeatable than those based on the T2w images, across all classes of texture calculation. However, the quartile coefficient of dispersion was usually lower for the T2w scans, possibly indicating less variation in image contrast across the scanners. Figure 4 details the ten best performing parameters in terms of both repeatability and dispersion for both forms of image contrast. As can be seen, eight of these parameters are common to all four criteria. Apart from f9 (GLCM-based entropy), all these parameters were significantly different between the four ROIs for both T1w (ranging from p<0.0001 to p=0.041) and T2w (ranging from p<0.0001 to p=0.034), indicating their potential efficacy.


From the results it is apparent that texture parameters based on gray level co-occurrence matrices are generally more repeatable and have lower dispersion values than those calculated from RLM, SZM, or NGTDM. Their excellent repeatability indicates that relatively small changes in parameter values detected during longitudinal studies, of chemotherapy response for example, can be confidently attributed to a true underlying change in the tumor rather than to normal fluctuations inherent in noisy data. The low values of the quartile coefficient of dispersion also suggest that these parameters are robust to changes in field strength and scanner manufacturer, and to minor changes in acquisition protocol, and are thus potentially suitable for use in multi-center studies. Finally, the statistically significant differences noted for these parameters between the four ROIs reinforce their potential clinical efficacy.


The authors would like to thank Dr Kristen Zakian, Dr Danny Zhang, Dr Ryan Brown, and Professor Linda Moy for their help in facilitating scanner access and protocol advice.


1. Waugh SA, Lerski RA, Bidaut L, Thompson AM. Med Phys 2011; 38:5058-5066.

2. Mackin D, Fave X, Zhang L, et al. Invest Radiol 2015; 50:757-765.

3. Haralick RM, Shanmugam K, Dinstein I. IEEE Trans Syst Man Cybern 1973; SMC3:610-621.

4. Conners RW, Trivedi MM, Harlow CA. Comput Vis Graph Image Process 1984; 25:273-310.

5. Galloway MM. Comput Vision Graph 1975; 4:172-179.

6. Thibault G, Fertil B, Navarro C, et al. Intern J Pattern Recognit Artif Intell 2013; 27:1357002.

7. Bland JM, Altman DG. Br Med J 1996; 313:744-753.


Figure 1. Example T2w images of the gelatin phantom obtained on a 3.0T GE scanner, detailing the olive, caper (left) and tomato (right) utilized as ROIs providing different texture appearances. A section of the large dark area at the top of the phantom (pork meat) was used as a region of relatively homogeneous signal.

Figure 2. Bland-Altman plot for f8 determined from T2w images, demonstrating good repeatability (9.0%).

Figure 3. Bland-Altman plot for f16 determined from T2w images, demonstrating poor repeatability (53.0%).

Figure 4. The ten best performing texture parameters with respect to repeatability and quartile coefficient of dispersion (QCoD). Parameter descriptors: f3 (correlation), f6 (sum average), f8 (sum entropy), f9 (entropy), f11 (difference entropy), f13 (info. measure of correlation), f14 (maximal correlation coefficient), SRE (short run emphasis), RP (run percentage), SZE (small zone emphasis), LZLGE (large zone low gray emphasis).

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)