Junshen Xu^{1}, Molin Zhang^{2}, Larry Zhang^{1,3}, Ellen Grant^{4,5}, Polina Golland^{1,3}, and Elfar Adalsteinsson^{1,6}

Prospective motion correction is a challenge in clinical fetal MR imaging because fetal motion is erratic and often substantial. To address this problem, we propose a two-stage machine learning pipeline that extracts fetal poses from echo planar MRI volumes at previous time points and uses them to predict future poses. The pipeline can be used to learn kinematic models of fetal motion, and its predictions can serve as auxiliary information for real-time, online slice prescription in fetal MRI.

Volumetric MR data were acquired with multislice EPI (matrix size = 120 × 120 × 80, resolution = 3 mm × 3 mm × 3 mm, TR = 3.5 s) of the pregnant abdomen for subjects at gestational ages between 25 and 35 weeks. Fifteen fetal keypoints (ankles, knees, hips, bladder, shoulders, elbows, wrists and eyes) were labeled manually in 1171 volumes. From these, 1113 sequences of 10 consecutive frames each were extracted. The objective of the pose prediction task is to estimate the fetal pose at each of the four time points that follow six preceding 3D EPI volumes.

The proposed pipeline for motion prediction, shown in Figure 1, consists of two stages: pose estimation from volumetric MRI and motion prediction based on the pose representation.

In the first stage, a convolutional neural network estimates the fetal pose (as keypoint coordinates) from 3D MRI. Inspired by human pose estimation for 2D images [1], we propose a 3D hourglass network for 3D fetal pose estimation (see Figure 2). The network uses downsampling and upsampling operations to capture multiscale image features and residual connections to preserve high-resolution information.

The second stage, motion prediction, can be regarded as the following autoregression problem:

$$x_t=f(x_{t-1},x_{t-2},\dots,x_{t-k}),$$

where $x_t$ is the fetal pose at time $t$ and $f$ is a function that predicts the next pose from the poses at the $k$ previous time points. Three autoregression algorithms previously proposed for human motion prediction and other time series prediction problems were implemented and compared on the fetal pose prediction problem: a linear autoregression model (LAR), a recurrent neural network (RNN) [2], and autoregressive trees (ART) [3]. For the RNN model, we adapted a single-layer gated recurrent unit [4] architecture with 1024 hidden units. For the ART model, we used a random forest as the base model, which is more robust to noise in the data. Given the limited data, five-fold cross-validation was used to evaluate the performance of the different methods.
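The simplest of the three models, LAR, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes each pose is flattened into a vector of keypoint coordinates, that sequences are cut as overlapping sliding windows, that LAR is fit by ordinary least squares with a bias term, and that multi-step prediction feeds each predicted pose back in as input. The function names (`make_windows`, `fit_lar`, `predict_future`) and the toy sinusoidal data are hypothetical.

```python
import numpy as np

def make_windows(poses, window=10):
    """Slice a (T, D) pose series into overlapping (window, D) sequences."""
    n_frames = poses.shape[0]
    return np.stack([poses[s:s + window] for s in range(n_frames - window + 1)])

def lagged_features(past):
    """Flatten k past poses, most recent first, and append a bias term (assumption)."""
    return np.append(past[::-1].ravel(), 1.0)

def fit_lar(windows, k=6):
    """Least-squares fit of x_t as a linear function of x_{t-1}, ..., x_{t-k}."""
    X, Y = [], []
    for seq in windows:
        for t in range(k, seq.shape[0]):
            X.append(lagged_features(seq[t - k:t]))
            Y.append(seq[t])
    W, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(Y), rcond=None)
    return W  # shape (k * D + 1, D)

def predict_future(W, history, steps=4):
    """Roll the model forward, feeding each prediction back in as input."""
    k = history.shape[0]
    frames = list(history)
    predicted = []
    for _ in range(steps):
        next_pose = lagged_features(np.asarray(frames[-k:])) @ W
        predicted.append(next_pose)
        frames.append(next_pose)
    return np.stack(predicted)

# Toy data (hypothetical): a 2-D sinusoid stands in for flattened keypoint coordinates.
t = np.arange(60)
poses = np.stack([np.sin(0.3 * t), np.cos(0.3 * t)], axis=1)

windows = make_windows(poses[:50])                  # sequences of 10 frames
W = fit_lar(windows, k=6)                           # six preceding frames as input
future = predict_future(W, poses[44:50], steps=4)   # estimates of frames 50 to 53
```

In this sketch each 10-frame sequence yields four training pairs, one for each frame that follows six preceding frames, mirroring the six-in, four-out task described above. For real data with fifteen 3D keypoints, each flattened pose would have 45 coordinates.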

[1] Newell, A., Yang, K., & Deng, J. (2016, October). Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision (pp. 483-499). Springer, Cham.

[2] Martinez, J., Black, M. J., & Romero, J. (2017, July). On human motion prediction using recurrent neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 4674-4683). IEEE.

[3] Lehrmann, A. M., Gehler, P. V., & Nowozin, S. (2014). Efficient nonlinear Markov models for human motion. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.

Figure 1. The proposed two-stage pipeline for predicting fetal motion from volumetric MRI. Stage 1: fetal pose estimation from 3D MRI. Stage 2: motion prediction based on fetal pose.

Figure 2. Overall architecture of the proposed 3D hourglass network.

Figure 3. Principal component analysis of two subjects. (a) and (c) plot the first three principal components for the two subjects. (b) and (d) show the sample autocorrelation, with confidence bounds, of the first principal component for the two subjects, which indicates that motion is correlated at certain time lags.

Figure 4. Normalized root-mean-square error (NRMSE) at different prediction lengths for the different methods.

Figure 5. Examples of motion prediction by different methods at different time points. Dashed lines show the predicted pose and solid lines show the ground truth. The purple triangle connects the eyes and the base of the neck; blue connects the shoulders, which connect to the arms (green, red). Light blue connects the base of the neck to the bladder, which links to the hips (blue) and legs (green, red).