Mario O. Malavé^{1}, Srivathsan P. Koundinyan^{1}, Christopher M. Sandino^{1}, Frank Ong^{2}, Joseph Y. Cheng^{1,3}, and Dwight G. Nishimura^{1}

In this work, we demonstrate
the application of a non-Cartesian unrolled architecture in reconstructing images
from undersampled 3D cones datasets. One application of this method is
the reconstruction of undersampled 3D image-based navigators (iNAVs), which enable
monitoring of beat-to-beat nonrigid heart motion during a cardiac scan. The proposed non-Cartesian unrolled network architecture provides
outcomes similar to those of *l*_{1}-ESPIRiT
in one-twentieth of the time, and the reconstructions exhibit robustness when using
an undersampled 3D cones trajectory.

Neural Network Architecture

The unrolled network (UN) is based on the iterative shrinkage-thresholding algorithm (ISTA) [4], which solves the following inverse problem
with image *x*, *k*-space data *y*, encoding operator *A*,
and regularization term *R(x)*:$$\underset{x}{\text{minimize }}\left\| Ax-y \right \|^2_2 + R(x)$$ The solution, which
is found using proximal gradient descent, alternates between
shrinkage-thresholding and data-consistency steps:$$x^{k+1} = S_R(x^k - A^T(Ax^k - y))$$ When using non-Cartesian
data, the acquisition model *A* incorporates coil sensitivity maps computed using
ESPIRiT [6] and the non-uniform FFT (NUFFT) operator. In prior
uses of the UN for Cartesian datasets, the acquisition model
applied an FFT instead. The regularization term is learned implicitly by replacing its
proximal operator *S*_{R} with a CNN.
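
For reference, the classical (non-learned) form of this iteration can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the regularizer is taken to be an *l*_{1} penalty so that *S*_{R} becomes element-wise soft thresholding, the data term carries a factor of 1/2, a step size of 1/‖*A*‖² is added for stability, and *A* is a small random real matrix rather than a NUFFT-based operator. None of these choices comes from the network described here.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise soft shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam, n_iter=200):
    """Minimize 0.5 * ||A x - y||_2^2 + lam * ||x||_1 by proximal gradient descent."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                # gradient of the data-consistency term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Toy problem (assumed for illustration): 3-sparse signal, 40 random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat = ista(A, y, lam=0.01, n_iter=2000)
```

In the unrolled network, the call to `soft_threshold` is the piece that is replaced by a CNN, and the number of iterations is fixed in advance.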

The UN
architecture uses 5 gradient steps (iterations) consisting of 2 residual
network (ResNet) [7] blocks per step. The inputs to the network are the
undersampled 3D *k*-space data, the *k*-space coordinates (used to generate the
NUFFT operator), and the coil sensitivity maps for each channel. The
ground truth for training is the undersampled dataset reconstructed with *l*_{1}-ESPIRiT [6]. The input to
each gradient step consists of coil-combined image-space data (the transpose
NUFFT operator applied to the *k*-space
data) after performing SENSE reconstruction [8]. Each ResNet block uses two 3D convolutional layers with a kernel size of 3x3x5 and a filter depth of 128, followed by a ReLU activation
function. An additional layer is added to the end of each unrolled step, which
outputs 2 channels (real and imaginary) for the *k*-space data. The output of this final layer is added, through a skip connection,
to the input of the first ResNet block to accelerate training convergence. The data are then converted back to *k*-space for the data consistency step described
above, and the gradient step block is repeated 4 more times (5 total). The
proposed architecture (Figure 1) was implemented in Python with TensorFlow.
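
The overall data flow of the unrolled steps can be illustrated with a toy NumPy sketch. It is not the TensorFlow implementation: the problem is reduced to 1D, the NUFFT is replaced by an exact non-uniform DFT matrix, the coil maps are random stand-ins, a step size is assumed, and the learned ResNet blocks are represented by a hypothetical `refine` callable (the identity here). All function and variable names are assumptions made for illustration.

```python
import numpy as np

def make_encoding(coords, maps):
    """Toy 1D multi-coil non-Cartesian model.

    coords: (M,) k-space sample locations; maps: (C, N) coil sensitivities.
    Returns the forward operator A, its adjoint A^H, and the DFT matrix F.
    """
    n = maps.shape[1]
    grid = np.arange(n) - n // 2
    F = np.exp(-2j * np.pi * np.outer(coords, grid) / n)   # non-uniform DFT, (M, N)

    def forward(x):     # image (N,) -> multi-coil k-space (C, M)
        return (maps * x) @ F.T

    def adjoint(y):     # multi-coil k-space (C, M) -> coil-combined image (N,)
        return np.sum(np.conj(maps) * (y @ np.conj(F)), axis=0)

    return forward, adjoint, F

def unrolled_recon(x0, y, forward, adjoint, step, refine, n_steps=5):
    """Alternate a gradient (soft data-consistency) update with a refinement step."""
    x = x0
    for _ in range(n_steps):
        x = x - step * adjoint(forward(x) - y)   # gradient update on ||Ax - y||^2
        x = refine(x)                            # placeholder for the learned ResNet blocks
    return x

rng = np.random.default_rng(0)
n, n_coils, n_samples = 32, 4, 24
coords = rng.uniform(-n / 2, n / 2, n_samples)
maps = rng.standard_normal((n_coils, n)) + 1j * rng.standard_normal((n_coils, n))
maps /= np.sqrt(np.sum(np.abs(maps) ** 2, axis=0))   # unit sum-of-squares across coils
forward, adjoint, F = make_encoding(coords, maps)

x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = forward(x_true)
step = 1.0 / np.linalg.norm(F, 2) ** 2    # safe step: ||A|| <= ||F|| for normalized maps
x0 = step * adjoint(y)                    # scaled adjoint ("gridded") starting image
x5 = unrolled_recon(x0, y, forward, adjoint, step, refine=lambda x: x)
```

With the identity in place of `refine`, the five steps reduce to plain gradient descent on the data term, so the *k*-space residual cannot increase from the starting image. In the trained network, the refinement is what suppresses aliasing.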

Training Data

One application of
this method is for reconstructing 3D image-based navigators (iNAVs), which
enable monitoring of beat-to-beat nonrigid heart motion during a cardiac scan.
Using an undersampled 3D cones trajectory, a cardiac-triggered low-resolution
3D image of the heart can be collected every heartbeat (28x28x14 cm^{3}
FOV, 4.4 mm isotropic spatial resolution, 176 ms temporal resolution). The
reconstruction requirements are substantial as each scan involves the
collection of several hundred 3D iNAVs.
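
As a rough check on the size of each navigator, the nominal matrix size follows from dividing the FOV by the isotropic resolution. The rounding to the nearest integer and the variable names below are assumptions; the reconstruction matrix actually used is not stated here.

```python
fov_mm = (280.0, 280.0, 140.0)   # 28 x 28 x 14 cm^3 FOV
resolution_mm = 4.4              # isotropic spatial resolution

# Nominal number of voxels along each axis (rounded to the nearest integer).
matrix = tuple(round(f / resolution_mm) for f in fov_mm)
voxels = matrix[0] * matrix[1] * matrix[2]
print(matrix, voxels)            # (64, 64, 32) 131072
```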

For training, we
collected 500 3D iNAVs on a 1.5T GE Signa system with an 8-channel cardiac coil
[9]. Specifically, 500 cardiac datasets were acquired separately in different
heartbeats using a variable density conical trajectory consisting of 32
readouts, which corresponds to an acceleration factor of 9. Also, an additional
500 datasets were acquired with the same trajectory rotated by the golden angle
between heartbeats to further improve generalization of the model to
different *k*-space sampling patterns.
More details for both trajectories are shown in Figure 2.
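
A golden-angle rotation of the sampling pattern can be sketched as follows. The rotation axis (*k*_z) and the golden-angle value of about 137.51° are assumptions made for illustration, since neither is specified here; `rotate_about_kz` is a hypothetical helper, and the coordinates are random stand-ins for an actual cones trajectory.

```python
import numpy as np

GOLDEN_ANGLE = np.pi * (3.0 - np.sqrt(5.0))   # ~137.51 degrees (assumed definition)

def rotate_about_kz(coords, heartbeat):
    """Rotate (M, 3) k-space coordinates about kz by heartbeat * golden angle."""
    theta = heartbeat * GOLDEN_ANGLE
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return coords @ R.T

rng = np.random.default_rng(0)
base = rng.uniform(-0.5, 0.5, size=(1000, 3))    # stand-in for one trajectory
rotated = [rotate_about_kz(base, hb) for hb in range(3)]
```

Each heartbeat then samples the same radial extent and *k*_z locations at a different in-plane orientation, which varies the aliasing pattern seen during training.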

An example input
(gridded image), output, and ground truth (after *l*_{1}-ESPIRiT) are shown in Figure 3a, and the respective
outputs for each iteration (gradient step) are shown in Figure 3b. When using
the *l*_{1} loss, the training
error converged quickly and the peak signal-to-noise ratio
(PSNR) gradually increased (Figure 3c). When employing the trained architecture, the undersampled cardiac
images retained structural features (compared to the outcomes from gridding) as
a result of the denoising/smoothing operation. More specifically, aliasing artifacts arising from undersampling a cones trajectory were
effectively removed after evaluation by the network. Output images from testing
the model with unseen datasets are shown in Figures 4 and 5. Inference
time for the proposed architecture is 1 second per dataset, while *l*_{1}-ESPIRiT requires
approximately 20 seconds.
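
The two training metrics mentioned above can be written compactly. The sketch below assumes a mean-absolute-error form of the *l*_{1} loss and a PSNR whose peak is the maximum magnitude of the ground-truth image; the exact normalizations used during training are not stated here, so these definitions are illustrative only.

```python
import numpy as np

def l1_loss(output, target):
    """Mean absolute error between the network output and the ground truth."""
    return np.mean(np.abs(output - target))

def psnr(output, target):
    """Peak signal-to-noise ratio in dB, with the peak taken from the ground truth."""
    mse = np.mean(np.abs(output - target) ** 2)
    peak = np.max(np.abs(target))
    return 10.0 * np.log10(peak ** 2 / mse)

# Tiny worked example: one of two voxels is off by 0.1.
target = np.array([0.0, 1.0])
output = np.array([0.0, 0.9])
loss = l1_loss(output, target)    # 0.05
quality = psnr(output, target)    # about 23.01 dB
```

Both functions use magnitudes of the difference, so they also apply to complex-valued images.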

The proposed non-Cartesian
UN architecture provides outcomes similar to those of *l*_{1}-ESPIRiT in one-twentieth of the time. We have shown
that the proposed non-Cartesian UN exhibits robustness when using an undersampled 3D cones trajectory.

We gratefully acknowledge the support of NIH grants R01HL127039, T32HL007846, and the National Science Foundation Graduate Research Fellowship under Grant No. DGE-114747.

[1] Hammernik, Kerstin, et al. "Learning a variational network for reconstruction of accelerated MRI data." Magnetic Resonance in Medicine 79.6 (2018): 3055-3071.

[2] Diamond, Steven, et al. "Unrolled Optimization with Deep Priors." arXiv preprint arXiv:1705.08041 (2017).

[3] Cheng, Joseph Y., et al. "Highly Scalable Image Reconstruction using Deep Neural Networks with Bandpass Filtering." arXiv preprint arXiv:1805.03300 (2018).

[4] Daubechies, Ingrid, et al. "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint." Communications on Pure and Applied Mathematics 57.11 (2004): 1413-1457.

[5] Hauptmann, Andreas, et al. "Real‐time cardiovascular MR with spatio‐temporal artifact suppression using deep learning–proof of concept in congenital heart disease." Magnetic Resonance in Medicine (2018).

[6] Uecker, Martin, et al. "ESPIRiT — an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA." Magnetic Resonance in Medicine 71.3 (2014): 990-1001.

[7] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

[8] Pruessmann, Klaas P., et al. "SENSE: sensitivity encoding for fast MRI." Magnetic Resonance in Medicine 42.5 (1999): 952-962.

[9] Addy, Nii Okai, et al. "3D image‐based navigators for coronary MR angiography." Magnetic Resonance in Medicine 77.5 (2017): 1874-1883.

Figure 1. (a) The unrolled
model architecture uses 5 gradient steps (iterations) consisting of 2 ResNet blocks per step (N=5 and M=2). The input into the network is the
undersampled 3D *k*-space data (gridded image shown above), *k*-space coordinates (to generate the
NUFFT operator), and the respective coil sensitivity maps for each channel. The ground truth is the *l*_{1}-ESPIRiT
reconstruction of input *k*-space data.
(b) One key difference between the
proposed non-Cartesian model (top) and prior Cartesian model (bottom) is the
replacement of the data consistency step (hard-projection) with a gradient
descent update step (soft-projection), which maintains consistency with the
measured non-Cartesian *k*-space data.

Figure 2. (a) The
cardiac datasets are acquired using a variable-density, undersampled 3D cones
trajectory represented by the cones polar angles (left) and corresponding
readouts (right). (b) In addition, the trajectory used for half of the training
data is rotated by the golden angle between heartbeats.

Figure 3. (a)
Example input (gridded image), output, and ground truth (*l*_{1}-ESPIRiT) are shown for one of the training datasets.
(b) The respective outputs for each of the 5 iterations (gradient steps)
highlight the behavior of different blocks in the model. (c) The *l*_{1} training loss (top) and
PSNR (bottom) are shown after training for 7000 iterations with a batch size of
1.

Figure 4. The
axial, sagittal, and coronal slices for the input (gridded
image), output, and ground truth (*l*_{1}-ESPIRiT)
images are shown for a validation dataset. In the images, we can see that the
model effectively removes the aliasing artifacts present in the gridded images.

Figure 5. The
axial, sagittal and coronal slices for the input (gridded image), output, and
ground truth (*l*_{1}-ESPIRiT) images
are shown for another validation dataset. As previously seen in Figure 4, we observe
that the model performs well despite the presence of more severe aliasing
artifacts in the gridded images.