Abdul Basit^{1}, Omair Inam^{1}, and Hammad Omer^{1}

In real-time clinical settings, high-speed
systems have become imperative to meet the large data-processing requirements
of parallel MRI algorithms such as SENSE. Field-programmable gate arrays (FPGAs)
have recently emerged as a viable way to meet the rising demand for fast
data processing by exploiting the inherent parallelism of the SENSE reconstruction
algorithm. This paper presents the first design effort to implement a
high-performance 32-bit floating-point FPGA-based coprocessor for real-time SENSE
reconstruction using a high-level synthesis (HLS) framework. *In-vivo* results on an 8-channel 1.5T human-head
dataset show that the proposed system speeds up image reconstruction by up
to 1000x without compromising image quality.

INTRODUCTION

SENSE (Sensitivity Encoding) is a robust parallel image reconstruction method that employs accelerated data acquisition using multi-channel receiver coils. The proposed 32-bit floating-point FPGA-based
coprocessor for SENSE reconstruction (32-co-SENSE) is designed and implemented
using Vivado-HLS, as shown in Figure 1. The development flow targets a
Xilinx FPGA with a clock frequency of 400 MHz. Vivado-HLS accepts a
synthesizable C++ implementation of the SENSE algorithm as input and performs
advanced platform-based code transformations and synthesis optimizations
(using directives/pragmas) to generate optimized synthesizable RTL. Automatic
co-simulations are then performed using test bench wrappers to verify the
correctness of the RTL output. Finally, the synthesizable RTL of the proposed
coprocessor is transformed into a complete FPGA implementation using the
Vivado design tools.
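
As a rough illustration of the kind of input this flow expects, the sketch below shows a small C++ kernel annotated with HLS directives, together with a test-bench style check. It is not the authors' code: the function names `sh_times_i` and `run_testbench`, the array shapes, and the choice of computing $$$S^{H}\times{I}$$$ for one folded pixel are assumptions made for illustration. `PIPELINE` and `UNROLL` are standard Vivado HLS pragmas; an ordinary C++ compiler ignores them (possibly with a warning), so the same source can be checked in software before synthesis.

```cpp
#include <complex>

typedef std::complex<float> cfloat;

const int NC = 8;  // receiver coils (8-channel data, as in the text)
const int AF = 2;  // acceleration factor (value used in the text)

// Hypothetical synthesizable kernel: out = S^H * I for one folded pixel.
void sh_times_i(const cfloat S[NC][AF], const cfloat I[NC], cfloat out[AF]) {
    for (int a = 0; a < AF; ++a) {
#pragma HLS PIPELINE II=1
        cfloat acc(0.0f, 0.0f);
        for (int c = 0; c < NC; ++c) {
#pragma HLS UNROLL
            // Conjugate-transpose access: (S^H)[a][c] = conj(S[c][a]).
            acc += std::conj(S[c][a]) * I[c];
        }
        out[a] = acc;
    }
}

// Test-bench style check against hand-computed values.
// Returns 0 on success and 1 on mismatch.
int run_testbench() {
    cfloat S[NC][AF], I[NC], out[AF];
    for (int c = 0; c < NC; ++c) {
        S[c][0] = cfloat(1.0f, 0.0f);   // column 0: all ones
        S[c][1] = cfloat(0.0f, 1.0f);   // column 1: all i
        I[c] = cfloat(float(c), 0.0f);  // folded pixel values 0..7
    }
    sh_times_i(S, I, out);

    // Row 0: sum of 0..7 = 28. Row 1: conj(i) * c = -i*c, summing to -28i.
    const cfloat expected[AF] = {cfloat(28.0f, 0.0f), cfloat(0.0f, -28.0f)};
    for (int a = 0; a < AF; ++a) {
        if (std::abs(out[a] - expected[a]) > 1e-4f) return 1;
    }
    return 0;
}
```

In an HLS C test bench, `main` would typically return the value of `run_testbench()`, with 0 indicating a pass.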

The proposed 32-co-SENSE iteratively performs
complex matrix inversions to unfold and combine the multi-channel aliased
images, and deposits the final composite image in the memory unit $$$M_{p}$$$, as shown in Figure 2. During each iteration,
32-co-SENSE reads the multi-channel folded images and the sensitivity encoding matrix
from the pre-loaded memory units $$$M_{f}$$$ and $$$M_{s}$$$, respectively, and then solves the
least-squares problem $$$ρ=\left[\left(S^{H}\times{S}\right)^{-1}\times{S^{H}}\right]\times{I}$$$ on the set of elements from the encoding matrix $$$S\in\mathbb{C}^{N_{c}\times{A_{f}}}$$$ and the set of folded pixels $$$I\in\mathbb{C}^{N_{c}\times{1}}$$$ (one per receiver coil) to recover the actual
pixel values $$$ρ\in\mathbb{C}^{A_{f}\times{1}}$$$ in the solution
image, where $$$N_{c}$$$ is the number of receiver coils and $$$A_{f}$$$ is the acceleration factor. The
architecture of 32-co-SENSE features a single-precision,
parallelized complex rectangular matrix-inversion module as part of the least-squares
solution. It consists of four pipelined combinational units, $$$MUL_{MM1}$$$, $$$INV$$$, $$$MUL_{MM2}$$$ and $$$MUL_{MV}$$$, as shown in
Figure 2, where $$$MUL_{MM1}$$$ and $$$MUL_{MM2}$$$ perform complex matrix-matrix
multiplication, $$$INV$$$ performs
complex matrix inversion and $$$MUL_{MV}$$$ performs
complex matrix-vector multiplication. Each combinational unit uses dedicated
DSP48 slices to support the floating-point operations needed for
reconstruction accuracy, while the HLS design directives *Pipelining* and *Loop-Unrolling* are used to exploit the parallelism across
loop iterations of the SENSE algorithm.
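
The least-squares step can be sketched in plain C++ for the 8-channel, $$$A_{f}=2$$$ case. The comments map each stage to the unit names from the text, but the code is only an illustrative software model, not the coprocessor's implementation. The closed-form 2x2 inverse, the singularity threshold, the function names `unfold_pixel` and `sense_unfold`, and the flat memory layout of the vectors standing in for $$$M_{f}$$$, $$$M_{s}$$$ and $$$M_{p}$$$ are all assumptions.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> cfloat;

const int NC = 8;  // number of receiver coils (N_c)
const int AF = 2;  // acceleration factor (A_f)

// Unfold one aliased pixel: rho = (S^H S)^{-1} S^H I.
// Returns false when S^H S is numerically singular (assumed threshold).
bool unfold_pixel(const cfloat S[NC][AF], const cfloat I[NC], cfloat rho[AF]) {
    // MUL_MM1: G = S^H * S, an AF x AF Hermitian matrix.
    cfloat G[AF][AF];
    for (int r = 0; r < AF; ++r) {
        for (int c = 0; c < AF; ++c) {
            cfloat acc(0.0f, 0.0f);
            for (int k = 0; k < NC; ++k) acc += std::conj(S[k][r]) * S[k][c];
            G[r][c] = acc;
        }
    }

    // INV: closed-form inverse, valid only for the 2x2 case (AF == 2).
    const cfloat det = G[0][0] * G[1][1] - G[0][1] * G[1][0];
    if (std::abs(det) < 1e-12f) return false;
    cfloat Ginv[AF][AF];
    Ginv[0][0] = G[1][1] / det;
    Ginv[0][1] = -G[0][1] / det;
    Ginv[1][0] = -G[1][0] / det;
    Ginv[1][1] = G[0][0] / det;

    // MUL_MM2: P = Ginv * S^H, an AF x NC unfolding matrix.
    cfloat P[AF][NC];
    for (int r = 0; r < AF; ++r) {
        for (int k = 0; k < NC; ++k) {
            cfloat acc(0.0f, 0.0f);
            for (int c = 0; c < AF; ++c) acc += Ginv[r][c] * std::conj(S[k][c]);
            P[r][k] = acc;
        }
    }

    // MUL_MV: rho = P * I.
    for (int r = 0; r < AF; ++r) {
        cfloat acc(0.0f, 0.0f);
        for (int k = 0; k < NC; ++k) acc += P[r][k] * I[k];
        rho[r] = acc;
    }
    return true;
}

// Iterate over all folded pixel positions.
// Assumed layout: Mf[p*NC + k], Ms[(p*NC + k)*AF + a], Mp[p*AF + a].
// Returns the number of positions that could not be unfolded (left as zero).
int sense_unfold(const std::vector<cfloat>& Mf, const std::vector<cfloat>& Ms,
                 std::vector<cfloat>& Mp, int n_folded) {
    int failures = 0;
    Mp.assign(static_cast<std::size_t>(n_folded) * AF, cfloat(0.0f, 0.0f));
    for (int p = 0; p < n_folded; ++p) {
        cfloat S[NC][AF], I[NC], rho[AF];
        for (int k = 0; k < NC; ++k) {
            I[k] = Mf[p * NC + k];
            for (int a = 0; a < AF; ++a) S[k][a] = Ms[(p * NC + k) * AF + a];
        }
        if (unfold_pixel(S, I, rho)) {
            for (int a = 0; a < AF; ++a) Mp[p * AF + a] = rho[a];
        } else {
            ++failures;
        }
    }
    return failures;
}
```

Forming $$$\left(S^{H}\times{S}\right)^{-1}\times{S^{H}}$$$ explicitly mirrors the four-unit structure described above. A software-only implementation might instead solve the 2x2 system directly without building the full unfolding matrix.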

The
performance of 32-co-SENSE is compared with two hardware platforms
(multicore CPUs^{5} and GPUs^{3}) using an 8-channel *in-vivo*
human-head dataset acquired on a 1.5T scanner. The data acquisition details and
hardware platform specifications used in our experiments are given in Table 1
and Table 2, respectively. To compare 32-co-SENSE with SENSE reconstruction
on CPU^{5} and GPU^{3}, the computation time and
reconstruction accuracy (in terms of SNR) of the reconstructed images are
evaluated for acceleration factor $$$(A_{f})=2$$$.

[1] Pruessmann, Klaas P., et al. Magnetic resonance in medicine 42(5): 952-962, 1999

[2] Ullah, Irfan, et al. Computers in biology and medicine 95: 1-12, 2018

[3] Shahzad, H., et al. Applied Magnetic Resonance 47(1): 53-61, 2016

[4] Cong, Jason, et al. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 30(4): 473-491, 2011

[5] Siddiqui, Muhammad Faisal, et al. Magnetic resonance imaging 44: 82-91, 2017

Figure-1
Development flow of the 32-bit floating-point FPGA-based coprocessor for the SENSE algorithm using Vivado-HLS

Figure-2
Architecture of the proposed 32-bit floating-point coprocessor for SENSE reconstruction (32-co-SENSE). The proposed system receives uncombined multi-channel folded images and the sensitivity encoding matrix in memory units $$$M_{f}$$$ and $$$M_{s}$$$, respectively. Subsequently, four pipelined combinational units, $$$MUL_{MM1}$$$, $$$INV$$$, $$$MUL_{MM2}$$$ and $$$MUL_{MV}$$$, solve the least-squares problem $$$ρ=\left[\left(S^{H}\times{S}\right)^{-1}\times{S^{H}}\right]\times{I}$$$ to recover the actual values of pixels in the solution image.

Table-1 a) Data acquisition details of the 8-channel in-vivo human head data set; b) Hardware platform specifications for CPU, GPU and FPGA-based SENSE reconstruction

Table-3
SENSE reconstruction time for 8-channel 1.5T in-vivo human head data using $$$A_{f}=2$$$