Automatic Quality Assessment of Pediatric MRI via Nonlocal Residual Neural Networks
Siyuan Liu1, Kim-Han Thung1, Weili Lin1, Pew-Thian Yap1, Dinggang Shen1, and UNC/UMN Baby Connectome Project Consortium2

1Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States, 2University of North Carolina at Chapel Hill, Chapel Hill, NC, United States


Manual MRI quality assessment is time-consuming, subjective, and error-prone. We show that image quality of contrast-varying pediatric MR images can be automatically assessed using deep learning with near-human accuracy.


Magnetic resonance imaging (MRI) is susceptible to motion artifacts, especially when babies are imaged. Image quality assessment (IQA) is crucial in determining whether the acquired data are usable and whether a re-scan is necessary. However, manual IQA, even when carried out by experienced neuroradiologists, is time-consuming, subjective, and error-prone1. In this abstract, we demonstrate that IQA can be carried out automatically with near-human accuracy via a nonlocal residual neural network that is trained with the data relabelled and pruned to counter erroneous training labels.


A. Data Preparation: T1- and T2-weighted MR volumes of pediatric subjects from birth to age six were annotated manually by an experienced neuroradiologist based on three labels: pass, questionable, and fail (see Table 1). 17600 sagittal slices and 8800 axial slices were extracted from 176 T1-weighted volumes; 25400 sagittal slices and 12700 axial slices were extracted from 254 T2-weighted volumes. Each slice is labeled based on the volume it belongs to. The T1/T2 slice sets are divided into training, validation and testing subsets with ratio 8:1:1. Each slice is uniformly padded to 256×256 and the intensity is min-max normalized.
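
The padding and normalization step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "uniformly padded" means symmetric zero-padding around the slice, that normalization is applied after padding, and that no slice exceeds 256 pixels in either dimension. The function names are hypothetical.

```python
import numpy as np

def pad_to_square(slice_2d, size=256):
    """Zero-pad a 2D slice symmetrically to size x size (assumed padding scheme)."""
    h, w = slice_2d.shape
    if h > size or w > size:
        raise ValueError("slice is larger than the target size")
    top = (size - h) // 2
    left = (size - w) // 2
    pad = ((top, size - h - top), (left, size - w - left))
    return np.pad(slice_2d, pad, mode="constant", constant_values=0)

def min_max_normalize(slice_2d, eps=1e-8):
    """Rescale intensities to [0, 1]; a constant slice maps to all zeros."""
    x = slice_2d.astype(np.float32)
    lo, hi = x.min(), x.max()
    return (x - lo) / max(hi - lo, eps)

def preprocess(slice_2d, size=256):
    """Pad first, then normalize (the order is an assumption)."""
    return min_max_normalize(pad_to_square(slice_2d, size))
```

Because padding happens first in this sketch, the zero-valued border takes part in the min-max computation; normalizing before padding would be an equally plausible reading of the text.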

B. Network Architecture: Our nonlocal residual neural network, shown in Figure 1, incorporates (1) depthwise separable convolution, which is computationally efficient with good feature extraction capability2, and (2) nonlocal blocks, which capture long-range dependencies between features extracted at any two positions, regardless of their positional distance3. Our network consists of two convolution (Conv) blocks, two depthwise separable residual (DSRes) blocks, one nonlocal residual (NRes) block, and one classifier block. The Conv and DSRes blocks extract low- and high-level features, respectively. The NRes block computes the response at each position as a weighted summation of features at all positions in the feature maps. The classifier block (realized with a convolutional layer, global average pooling, and softmax activation function) outputs three probability values indicating whether a slice is “pass”, “questionable”, or “fail”. The slice is labeled based on the highest value among the three probability values.
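
To make the nonlocal operation concrete, the sketch below computes the response at each position as a softmax-weighted sum of the features at all positions and adds it back to the input as a residual. It is a simplified illustration only: the learned embedding and output projections of the nonlocal block described by Wang et al.3 are omitted, and the dot-product similarity is an assumption, since the abstract does not state which pairwise function is used.

```python
import numpy as np

def nonlocal_response(features):
    """Simplified nonlocal residual operation on an (H, W, C) feature map.

    Returns an (H, W, C) array. Learned projections are deliberately omitted.
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c).astype(np.float64)  # one row per position
    scores = x @ x.T                                   # pairwise similarity, (HW, HW)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1
    y = weights @ x                                    # weighted sum over all positions
    return (x + y).reshape(h, w, c)                    # residual (element-wise) sum
```

The attention matrix has one entry per pair of positions, so its memory cost grows quadratically with the number of positions in the feature map.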

C. Training and Testing: In the training stage, we initially assumed that each slice can be labeled based on its corresponding volume. However, this assumption does not always hold: artifacts may affect only a few slices in a volume, in which case the unaffected slices are labeled incorrectly. To deal with these noisy labels, we propose to train the network iteratively with a relabeling and pruning strategy. Specifically, we obtain an initial prediction of the labels of all training slices and select slices satisfying the following conditions to retrain the network: (1) slices with predicted labels identical to their initial labels; (2) slices predicted with high certainty (using a probability threshold of 0.7) as belonging to “pass”, “questionable”, or “fail”. The training samples are pruned by removing slices that do not meet these two criteria. We employ a multi-class balanced focal loss4 to alleviate the data imbalance caused by the relabeling process. In the testing stage, we use the trained model to predict the quality of each image slice in the testing dataset. The quality of each volume is then determined using the following rules: “pass” if more than 80 percent of the slices in the volume are labeled as “pass”; “fail” if more slices are labeled as “fail” than “pass” or “questionable”; “questionable” otherwise.
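
The slice selection and the volume-level decision rules can be sketched as below. Several details are assumptions rather than statements from the abstract: a slice is kept only if it meets both criteria, the certainty test is "maximum probability at least 0.7", and the "fail" rule is read as "fail" outnumbering both "pass" and "questionable". The relabeling step itself is not described in enough detail to reproduce and is left out. Function names are hypothetical.

```python
import numpy as np

def select_confident_slices(probs, initial_labels, threshold=0.7):
    """Boolean mask of training slices kept for the next training round.

    probs: (N, 3) softmax outputs; initial_labels: (N,) integer class indices.
    """
    probs = np.asarray(probs)
    predicted = probs.argmax(axis=1)
    agrees = predicted == np.asarray(initial_labels)  # criterion (1)
    confident = probs.max(axis=1) >= threshold        # criterion (2), assumed >=
    return agrees & confident

def volume_label(slice_labels, pass_fraction=0.8):
    """Aggregate per-slice string labels into a single volume label."""
    labels = np.asarray(slice_labels)
    n_pass = int(np.sum(labels == "pass"))
    n_quest = int(np.sum(labels == "questionable"))
    n_fail = int(np.sum(labels == "fail"))
    if n_pass > pass_fraction * labels.size:
        return "pass"
    if n_fail > n_pass and n_fail > n_quest:  # assumed reading of the "fail" rule
        return "fail"
    return "questionable"
```

For example, under these assumptions a volume with 8 "pass" and 2 "fail" slices is rated "questionable", because exactly 80 percent does not satisfy "more than 80 percent".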


The confusion matrix, together with the sensitivity and specificity of the quality assessment results for the testing T1- and T2-weighted images, is presented in Table 2. The specificity for the “pass” class is 1, indicating that no “questionable” or “fail” images are mistakenly labeled as “pass”. Figure 2 shows examples of T1- and T2-weighted images for each category, indicating that “pass”, “questionable”, and “fail” correspond respectively to no/minor, moderate, and heavy image degradation. Figure 3 shows detailed testing IQA results for slices and volumes. The maximum predicted probability of each slice is high, indicating that our method can assess each slice reliably. Moreover, the volume IQA results match the ground-truth IQA.
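
For reference, per-class sensitivity and specificity can be derived from a multi-class confusion matrix in a one-vs-rest fashion, as in the sketch below. The row/column convention (rows are true classes, columns are predicted classes) is an assumption, and the matrix used in the usage lines is made up for illustration; it is not the data in Table 2.

```python
import numpy as np

def sensitivity_specificity(confusion):
    """One-vs-rest sensitivity and specificity for each class.

    confusion[i, j]: number of samples of true class i predicted as class j.
    Assumes every class has at least one positive and one negative sample.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp       # true class i, predicted as something else
    fp = cm.sum(axis=0) - tp       # predicted class i, true class differs
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative matrix only (classes: pass, questionable, fail).
example = [[8, 2, 0],
           [0, 5, 1],
           [0, 1, 3]]
sens, spec = sensitivity_specificity(example)
```

In this made-up matrix the first column has no off-diagonal entries, so the specificity of the first class is 1, which is the same situation the text describes for “pass”.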


We have demonstrated that our nonlocal residual neural network achieves near-human accuracy in IQA. It is therefore possible to reduce human labor with automated IQA.


This work was supported in part by NIH grants (MH117943, EB006733, AG041721, MH100217, NS093842, and 1U01MH110274) and the efforts of the UNC/UMN Baby Connectome Project Consortium.


1. Zhuo J, and Gullapalli R P. MR Artifacts, safety, and quality control. RadioGraphics. 2006; 26(1):275-297.

2. Chollet F. Xception: Deep learning with depthwise separable convolutions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA; 2017.

3. Wang X, Girshick R, Gupta A, and He K. Non-local neural networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA; 2018.

4. Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection. IEEE International Conference on Computer Vision (ICCV). Venice, Italy; 2017.


Table 1 Dataset information

Figure 1 Network architecture. Convolution (Conv) and depthwise separable convolution (DSepConv) layers are specified by “Conv/DSepConv| kernel size | strides | channel”. “⊗” denotes matrix multiplication and “⊕” denotes element-wise summation.

Table 2 Confusion matrix, sensitivity, and specificity

Figure 2 Example slices of T1- and T2-weighted MR images labeled as “pass”, “questionable”, and “fail”, which correspond respectively to no/minor, moderate, and heavy artifacts.

Figure 3 Quality assessment of T1- and T2-weighted images. The slices of each volume are marked by dashed vertical lines.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)