Oliver Maier^{1}, Matthias Schloegl^{1}, Kristian Bredies^{2}, and Rudolf Stollberger^{1,3}

Reconstructing 3D parameter maps of large volumes entirely on the GPU is highly desirable because of the computational speed-up it offers. However, limited GPU memory restricts the volume that can be covered. To overcome this limitation, a double-buffering strategy is proposed in combination with model-based parameter quantification and 3D-TGV regularization. This combination enables whole-volume reconstruction while maintaining the speed advantages of GPU-based computation. In contrast to sequential transfers, double-buffering splits the volume into blocks and overlaps memory transfers with kernel execution, hiding memory latency. The proposed method is able to reconstruct arbitrarily large volumes at a reconstruction time of 5.3 min/slice, even on a single GPU.
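The block-wise streaming idea can be illustrated with a minimal, CPU-only Python sketch. It is not the implementation used in this work: a background thread stands in for the asynchronous transfer queue, and the helper names `split_into_blocks`, `upload`, `compute` and `stream_double_buffered` are hypothetical. The placeholder "kernel" simply doubles each value and treats slices independently, so it ignores the coupling between neighboring slices that 3D-TGV regularization introduces.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(num_slices, block_size):
    """Return (start, stop) slice ranges that cover the volume block by block."""
    return [(start, min(start + block_size, num_slices))
            for start in range(0, num_slices, block_size)]


def upload(volume, block):
    """Stand-in for a host-to-device copy: returns a private copy of the block."""
    start, stop = block
    return list(volume[start:stop])


def compute(device_block):
    """Stand-in for a GPU kernel (assumption): doubles every value."""
    return [2 * value for value in device_block]


def stream_double_buffered(volume, block_size):
    """Process `volume` block-wise, prefetching the next block during compute."""
    blocks = split_into_blocks(len(volume), block_size)
    result = []
    if not blocks:
        return result
    # One worker thread plays the role of the transfer queue.
    with ThreadPoolExecutor(max_workers=1) as transfer_queue:
        pending = transfer_queue.submit(upload, volume, blocks[0])
        for index in range(len(blocks)):
            current = pending.result()  # wait until this buffer is filled
            if index + 1 < len(blocks):
                # Fill the second buffer while the first one is being processed.
                pending = transfer_queue.submit(upload, volume, blocks[index + 1])
            result.extend(compute(current))
    return result


volume = list(range(10))  # ten "slices", one number per slice
streamed = stream_double_buffered(volume, block_size=5)
assert streamed == compute(volume)  # identical to a single-block run
```

In this toy setting the streamed result matches the single-block result exactly because the placeholder kernel has no inter-slice dependence; a real reconstruction would additionally have to handle the block boundaries.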

Introduction

Scan time reduction of MR parameter quantification with model-based reconstruction

Acknowledgements

This work is funded and supported by the Austrian Science Fund (FWF) under grant “SFB F32‐N18” (SFB “Mathematical Optimization and Applications in Biomedical Sciences”) and by NVIDIA Corporation hardware grant support. Oliver Maier is a recipient of a DOC Fellowship (24966) of the Austrian Academy of Sciences at the Institute for Medical Engineering at TU Graz.

References

1. Block KT, Uecker M, Frahm J. Model-based iterative reconstruction for radial fast spin-echo MRI. IEEE Transactions on Medical Imaging 2009; 28(11).

2. Sumpf TJ, Uecker M, Boretius S, Frahm J. Model-based nonlinear inverse reconstruction for T2 mapping using highly undersampled spin-echo MRI. J Magn Reson Imag 2011; 34(2):420–428.

3. Roeloffs V, Wang X, Sumpf TJ, Untenberger M, Voit D, Frahm J. Model‐based reconstruction for T1 mapping using single‐shot inversion‐recovery radial FLASH. Int J Imaging Syst Technol 2016; 26:254–263. doi:10.1002/ima.22196

4. Wang X, Roeloffs V, Klosowski J, Tan Z, Voit D, Uecker M, Frahm J. Model‐based T1 mapping with sparsity constraints using single‐shot inversion‐recovery radial FLASH. Magn Reson Med 2018; 79:730–740. doi:10.1002/mrm.26726

5. Maier O, Schoormans J, Schloegl M, et al. Rapid T1 quantification from high resolution 3D data with model‐based reconstruction. Magn Reson Med 2018; 00:1–18. doi:10.1002/mrm.27502

6. Harris M. How to Overlap Data Transfers in CUDA C/C++. https://devblogs.nvidia.com/how-overlap-data-transfers-cuda-cc/#disqus_thread. Accessed 31.10.2018.

7. Smith DS, Sengupta S, Smith SA, Welch EB. Trajectory optimized NUFFT: Faster non‐Cartesian MRI reconstruction through prior knowledge and parallel architectures. Magn Reson Med 2018. doi:10.1002/mrm.27497

8. Klöckner A, Pinto N, Lee Y, Catanzaro B, Ivanov P, Fasih A. PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation. Parallel Computing 2012; 38:157–174.

9. Knoll F, Schwarzl A, Diwoky C, Sodickson DK. gpuNUFFT - An Open-Source GPU Library for 3D Gridding with Direct Matlab Interface. Proc ISMRM 2014; p4297.

10. Lesch A, Schloegl M, Holler M, et al. Ultrafast 3D Bloch‐Siegert B1+‐mapping using variational modeling. Magn Reson Med 2018. doi:10.1002/mrm.27434

Figure 1: Basic schematic of double buffering compared to serial execution. Having two asynchronous command queues leads to an overlap of transfer and computation. Ideally, the computation time hides the memory latency of the GPU.
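The difference between serial execution and double buffering can be made concrete with a small idealized timing model in Python. This is only a sketch under stated assumptions: exactly two device buffers, one transfer queue and one compute queue, no download step, and constant per-block times. The function names and the example timings (1 and 3 time units) are hypothetical and are not measurements from this work.

```python
def serial_time(num_blocks, t_transfer, t_compute):
    """Total time when each block is transferred, then processed, in turn."""
    return num_blocks * (t_transfer + t_compute)


def double_buffered_time(num_blocks, t_transfer, t_compute):
    """Total time with two buffers and overlapping transfer/compute queues."""
    upload_done = []   # finish time of each block's transfer
    compute_done = []  # finish time of each block's computation
    for i in range(num_blocks):
        # The transfer queue handles one block at a time.
        transfer_free = upload_done[i - 1] if i >= 1 else 0.0
        # With two buffers, block i reuses the buffer of block i - 2.
        buffer_free = compute_done[i - 2] if i >= 2 else 0.0
        upload_done.append(max(transfer_free, buffer_free) + t_transfer)
        # Computation needs the data on the device and an idle compute queue.
        compute_free = compute_done[i - 1] if i >= 1 else 0.0
        compute_done.append(max(upload_done[i], compute_free) + t_compute)
    return compute_done[-1] if compute_done else 0.0


# Hypothetical timings: transfer takes 1 unit, computation 3 units per block.
serial = serial_time(4, 1, 3)              # 4 * (1 + 3) = 16 units
overlapped = double_buffered_time(4, 1, 3)  # 1 + 4 * 3 = 13 units
assert overlapped < serial
```

When the computation per block takes at least as long as the transfer, only the first transfer remains visible in this model; when transfers dominate, the total time is instead bounded by the transfer queue.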

Figure 2: Reconstruction without (a) and with (b) double buffering. (c) Relative error between the reconstruction of ten slices using a single block and the proposed streaming with a block size of five, after 12 Gauss-Newton steps. Small deviations are visible at the skull; the background has been masked out. Deviations along the streamed slice direction are homogeneous and small, suggesting that true 3D regularization is achieved. Mean-Root-Square-Error (MRSE) and Mean-Absolute-Relative-Error (MARE) are both numerically negligible.

Table 1: Comparison of reconstruction times for ten slices using the previously proposed standard reconstruction^{5}, a pure OpenCL implementation, and the proposed double buffering strategy.

Figure 3: Reconstruction of a full-brain VFA T1 data set with the proposed double buffering method. Acquisition parameters: matrix size 256×256×160 with 30% oversampling in slice direction; resolution 1×1×1 mm³; TR/TE = 5.38/2.46 ms; 10 flip angles = [1°, 3°, 5°, …, 19°]; BW = 380 Hz/pixel. Acquisition time: 6.3 min. Reconstruction time: 5.3 min/slice.

Figure 4: Transversal, sagittal, and coronal views of the full-brain reconstruction in Figure 3. Arrows indicate block boundaries in slice direction. The transition between blocks is smooth and free of artifacts.