Guanhua Wang^{1,2}, Enhao Gong^{2,3}, Suchandrima Banerjee^{4}, Karen Ying^{5}, Greg Zaharchuk^{6}, and John Pauly^{2}

Previous deep-learning-based CS frameworks such as GANCS have demonstrated improved quality and efficiency. To further improve the restoration of high-frequency details and the suppression of aliasing artifacts, a data-driven regularization is explicitly added in k-space, in the form of an adversarial (GAN) loss. In this work, the cross-domain generative adversarial model is trained and evaluated on diverse datasets and shows good generalization ability. In both quantitative comparison and visual inspection, the proposed method achieves better reconstruction than previous networks.

**Datasets**: Three fully-sampled datasets with retrospective undersampling are used in the experiments. The first dataset is composed of T1w and T2w images from the Human Connectome Project (HCP),^{4} comprising 70 healthy cases and 20,000 images. The second dataset consists of 20 open-source cases of 3D FSE knee images.^{5} The third dataset contains 104 neuroimaging cases acquired with the Multi-Dynamic Multi-Echo (MDME) sequence.^{6} The latter two datasets are complex-valued. In the current setting, a variable-density Cartesian mask with an undersampling factor of 4 is used.
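As a rough illustration, a variable-density Cartesian mask along the phase-encode direction could be generated as below (a minimal sketch; the center fraction, density profile, and seed are assumptions, not the authors' exact parameters):

```python
import numpy as np

def variable_density_mask(ny, accel=4, center_frac=0.08, seed=0):
    """Sketch of a 1D variable-density Cartesian mask: fully sample a small
    central band of phase-encode lines, then draw the remaining lines with
    probability decaying with distance from the k-space center."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(ny, dtype=bool)
    n_center = max(int(round(center_frac * ny)), 1)
    c0 = ny // 2 - n_center // 2
    mask[c0:c0 + n_center] = True                     # fully-sampled center
    # sampling density decays quadratically toward the k-space edges
    dist = np.abs(np.arange(ny) - ny // 2) / (ny / 2)
    prob = (1.0 - dist) ** 2
    n_target = ny // accel - n_center                 # lines left to reach R
    outer = np.flatnonzero(~mask)
    p = prob[outer] / prob[outer].sum()
    chosen = rng.choice(outer, size=max(n_target, 0), replace=False, p=p)
    mask[chosen] = True
    return mask

mask = variable_density_mask(256, accel=4)            # R = 256 / mask.sum()
```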

**Network**: A deep network with an encoder-decoder structure and multiple residual blocks is built as the generator [Fig.2]. The output is projected back to k-space and combined with the originally acquired data to enforce k-space consistency. This backbone network and the k-space consistency step have been shown to be effective in recent work.^{1,}^{3} A patch-based discriminator is implemented in the image domain, while another multi-scale patch-based discriminator is implemented in k-space. The multi-scale discriminator ensures that the frequency distribution is correct at both local and global scales. Complex inputs are formulated as two network channels.
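The k-space consistency step can be sketched as follows (a minimal NumPy illustration assuming a single-coil 2D layout; the actual network operates on complex data formulated as two channels):

```python
import numpy as np

def data_consistency(x_gen, k_sampled, mask):
    """Sketch of the k-space consistency step: transform the generator
    output to k-space, replace the values at sampled locations with the
    originally acquired samples, and return to the image domain."""
    k_gen = np.fft.fft2(x_gen)
    k_dc = np.where(mask, k_sampled, k_gen)   # keep acquired data where sampled
    return np.fft.ifft2(k_dc)
```

After this step the reconstruction exactly matches the acquired samples at every measured k-space location, and the network only fills in the missing lines.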

The loss function contains a content loss and an adversarial loss.^{7} The content loss measures the distance between the generated images and the ground truth; in the current setting, L1 and SSIM losses are applied in the image domain. For the adversarial loss, the fully-sampled and reconstructed images are discriminated in both the image domain and the frequency domain. The generator's loss is: $$\underset{G}{min}E_{x,y}[\|G(x)-y\|_{1}]+\eta_{1} E_{x}[(1-D_{k}(FFT(G(x))))^{2}]+\eta_{2} E_{x}[(1-D_{I}(G(x)))^{2}]$$ The discriminator's loss is: $$\underset{D}{min}\gamma_{1}(E_{x}[D_{k}(FFT(G(x)))^{2}]+E_{y}[(D_{k}(FFT(y))-1)^{2}])+\gamma_{2}(E_{x}[D_{I}(G(x))^{2}]+E_{y}[(D_{I}(y)-1)^{2}])$$

Here $$$x$$$ is the input (under-sampled data) and $$$y$$$ is the fully-sampled data.
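The two objectives above can be transcribed term by term (a minimal NumPy sketch; `eta1`, `eta2`, `gamma1`, `gamma2` and the discriminator outputs are placeholders, not the authors' values):

```python
import numpy as np

# D_k and D_I stand for the k-space and image-domain discriminator outputs.

def generator_loss(g_out, target, d_k_fake, d_i_fake, eta1=0.1, eta2=0.1):
    """L1 content term plus the two least-squares adversarial terms."""
    l1 = np.mean(np.abs(g_out - target))        # E[||G(x) - y||_1]
    adv_k = np.mean((1.0 - d_k_fake) ** 2)      # E[(1 - D_k(FFT(G(x))))^2]
    adv_i = np.mean((1.0 - d_i_fake) ** 2)      # E[(1 - D_I(G(x)))^2]
    return l1 + eta1 * adv_k + eta2 * adv_i

def discriminator_loss(d_k_fake, d_k_real, d_i_fake, d_i_real,
                       gamma1=1.0, gamma2=1.0):
    """Least-squares discriminator objective in both domains: push fake
    outputs toward 0 and real outputs toward 1."""
    loss_k = np.mean(d_k_fake ** 2) + np.mean((d_k_real - 1.0) ** 2)
    loss_i = np.mean(d_i_fake ** 2) + np.mean((d_i_real - 1.0) ** 2)
    return gamma1 * loss_k + gamma2 * loss_i
```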

**Training**: For robust training and faster convergence, the Least-Squares GAN (LSGAN) formulation is used. Moreover, to prevent mode collapse, we adopt a two-step training strategy: the adversarial loss is added only once the generator is stable under the content loss alone. The network converges after 150 epochs with linear decay of the learning rate.
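The two-step schedule and the linear learning-rate decay might look like the following sketch (`warmup_epochs`, `decay_start`, and `base_lr` are hypothetical values not given in the abstract):

```python
def adversarial_weights(epoch, warmup_epochs=20, eta1=0.1, eta2=0.1):
    """Two-step strategy: content loss alone during warm-up, then enable
    the adversarial terms once the generator has stabilized."""
    if epoch < warmup_epochs:
        return 0.0, 0.0
    return eta1, eta2

def learning_rate(epoch, base_lr=2e-4, total_epochs=150, decay_start=75):
    """Constant learning rate, then linear decay to zero at the final epoch."""
    if epoch < decay_start:
        return base_lr
    frac = (epoch - decay_start) / (total_epochs - decay_start)
    return base_lr * (1.0 - frac)
```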

For the quantitative comparison, the newly proposed k-space adversarial loss achieves higher PSNR and SSIM than previous methods using only the image-domain constraint (shown in [Tab.1]; the improvement is significant under a paired t-test).
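For reference, PSNR can be computed as below (the standard definition, not the authors' code; SSIM is typically taken from an existing library such as scikit-image):

```python
import numpy as np

def psnr(ref, recon, data_range=None):
    """Peak SNR between a reference image and a reconstruction, in dB.
    data_range defaults to the reference's dynamic range."""
    ref = np.asarray(ref, dtype=np.float64)
    recon = np.asarray(recon, dtype=np.float64)
    if data_range is None:
        data_range = ref.max() - ref.min()
    mse = np.mean((ref - recon) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```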

For the qualitative comparison, examples from the different datasets are shown. As shown in [Fig.3,4], the I+K cross-domain GAN leads to the fewest visible artifacts, distortions, and hallucinations. The k-space of the generated images is also interpolated more evenly.

Previous image-domain reconstruction networks, even with a GAN loss, consider only image-domain losses, which leads to over-smoothing or mis-estimation of high-frequency details. The reason may be that pixel-wise losses such as the L2 norm reach their minima by producing a result that is optimal only in the average sense. By adding a k-space adversarial loss, we place an explicit constraint on k-space and force the image to have a realistic spectrum consistent with the acquisition.

Frequency-domain constraints are uncommon in similar DL-based image restoration tasks, such as super-resolution and denoising, because the spectra of natural images vary greatly, so a 'realistic' spectrum has no explicit perceptual meaning there. In MR images, however, the frequency distributions within and across cases are inherently correlated, depending on the sequences and scanners. A data-driven regularization on k-space can therefore contribute to better k-space interpolation. This insight also suggests two directions for further study. First, multi-channel images could be incorporated into this framework; the network could potentially learn GRAPPA-like kernels implicitly. Second, special attention should be paid to the network's generalization across different scanners, coils, and protocols. Moreover, a reader study should be conducted to evaluate the method's diagnostic performance.

1. Mardani M, Gong E, Cheng JY, et al. Deep generative adversarial networks for compressed sensing automates MRI. arXiv. 2017.

2. Zhu B, Liu JZ, Cauley SF, et al. Image reconstruction by domain-transform manifold learning. Nature. 2018.

3. Schlemper J, Caballero J, Hajnal JV, et al. A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE TMI. 2018.

4. Van Essen DC, Smith SM, Barch DM, et al. The WU-Minn human connectome project: an overview. Neuroimage. 2013.

5. Epperson K, Sawyer AM, Lustig M, et al. In: 22nd Annual Meeting for SMRT, 2013.

6. Tanenbaum LN, Tsiouris AJ, Johnson AN, et al. Synthetic MRI for clinical neuroimaging: results of the Magnetic Resonance Image Compilation (MAGiC) prospective, multicenter, multireader trial. AJNR. 2017.

7. Isola P, Zhu JY, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. arXiv. 2017.

Figure 1. Left: k-space (in the logarithmic scale) of a fully-sampled brain image.

Right: k-space of an under-sampled brain image reconstructed by GANCS.^{1}

In this example from a previous network (GANCS), the high-frequency part of the reconstructed image is less faithfully restored, which may explain the blurring and loss of details.

Figure 2. The structure of the network. The input of the network is the under-sampled images, either complex-valued or real-valued. The generator is a deep ResNet for image translation.^{1} The generated images and the fully-sampled images are classified by the discriminator in both the image domain and the frequency domain.

Table 1. Quantitative comparison of different methods. A deep generator without a GAN loss (w/o GAN), a generative adversarial model in the image domain only (I GAN), and the proposed model (I+K GAN) were implemented. The Structural Similarity Index (SSIM) and Peak SNR (PSNR) are selected as the metrics. The improvement of the proposed method is significant at the p<0.05 level under a paired t-test.

Figure 3. An example from the MDME dataset. Small anatomical features such as vessels are restored with better fidelity by the proposed method (I+K GAN), and the fewest hallucinations are introduced. In the k-space (logarithmic scale) of the reconstructed images, the proposed method interpolates the k-space more evenly, and the high-frequency information is more faithfully restored.

Figure 4. An example from the HCP dataset. The proposed method restores the structure with better fidelity and sharpness.