Fast dynamic speech MRI at 3 Tesla using variable density spirals and constrained reconstruction
Sajan Goud Lingala1, Douglas Blake2, Stanley Kruger3, David Meyer4, Eileen Finnegan2, Ingo Titze2, and Eric Hoffman3

1Department of Biomedical Engineering, University of Iowa, Iowa City, IA, United States, 2Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, United States, 3Department of Radiology, University of Iowa, Iowa City, IA, United States, 4Janette Ogg Voice Research Center, Shenandoah University, Winchester, VA, United States


We propose an ultra-fast dynamic 3 T MRI scheme for imaging vocal tract dynamics during speech production. Our approach combines variable density spirals, which provide motion robustness and artifact suppression, with a sparse SENSE based temporally constrained reconstruction scheme. We achieve a time resolution of up to 6.2 ms/frame at a spatial resolution of 2.4x2.4 mm2. We demonstrate the utility of this scheme in capturing rapidly moving articulators during fast speech stimuli.


Dynamic MRI (DMRI) of vocal tract shaping is emerging as a powerful tool to non-invasively characterize the human voice. However, DMRI is challenged by intrinsic trade-offs among spatial resolution, temporal resolution, signal-to-noise ratio, and slice coverage. Various accelerated 1.5 T and 3 T DMRI speech protocols have been proposed to improve image quality. These include the use of dedicated upper-airway coils [1], non-Cartesian acquisitions [2-6], and sparsity [3-5] or low rank [7] reconstruction constraints. In this work, we develop a 3 T protocol based on a head and neck coil, variable density spirals, and temporally constrained reconstruction. We show that, in comparison to uniform density spirals (UDS), variable density spirals (VDS) can effectively suppress artifacts from coil elements that are sensitive to spatial regions distant from the vocal tract region of interest (e.g. the head). We combine VDS with a sparse SENSE reconstruction scheme that exploits temporal finite difference sparsity. We demonstrate the feasibility of DMRI of fluent speech at up to 6.2 ms/frame at a spatial resolution of 2.4x2.4 mm2.
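The abstract does not write out the reconstruction objective. As an illustrative sketch only, a commonly used form of sparse SENSE with a temporal finite difference penalty is given below. All symbols are assumptions introduced here for illustration: x is the dynamic image series, S_c the sensitivity map of coil c, F_u the non-uniform (spiral) Fourier sampling operator, y_c the measured k-space data of coil c, N_c the number of coils, \nabla_t the finite difference operator along time, and \lambda the regularization parameter.

```latex
\hat{x} \;=\; \arg\min_{x} \; \sum_{c=1}^{N_c} \bigl\| F_u \, S_c \, x - y_c \bigr\|_2^2 \;+\; \lambda \, \bigl\| \nabla_t \, x \bigr\|_1
```

In this form, the first term enforces consistency with the multi-coil spiral data, and the second term promotes sparsity of frame-to-frame differences; a larger \lambda trades noise and aliasing suppression against possible temporal blurring.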


Experiments were performed on a GE 3 T 750w MRI scanner with a 16-channel head and neck coil. Of the 16 channels, the 6 coil elements most sensitive to the important speech articulators (e.g. the lips, tongue, velum, and glottis) were used. These elements were also sensitive to regions distant from the vocal tract, such as the head. A free induction decay (FID) based spiral pulse sequence (spatial resolution = 2.4x2.4 mm2; TR = 6.2 ms; TE = 1.5 ms; slice thickness = 6 mm; flip angle = 15°) with golden angle ordering was implemented. VDS and UDS trajectories were designed with gradient specifications of 33 mT/m amplitude and 120 mT/m/ms slew rate. To limit sensitivity to off-resonance, short multi-shot readouts were used (Tread = 1.9 ms; number of interleaves = 21). For UDS, the FOV was uniform at 20x20 cm2, while for VDS, the FOV was varied linearly from 40x40 cm2 to 8x8 cm2 along the k-space radius (Figure 1). We employed a sparse SENSE reconstruction with a temporal finite difference constraint; the coil maps were estimated by an eigendecomposition method, and the regularization parameter was selected heuristically [4]. Data were acquired from two adult volunteers (aged 50 and 25 years). The speech stimuli consisted of (a) repeating the phrase “loo-lee-laa-za-na-za”, and (b) counting the numbers “one-two-three-four-five” as fast as the speaker could. The rapid number counting task was designed to test the temporal fidelity in capturing very fast articulatory shaping patterns. Reconstruction was performed at retrospectively chosen time resolutions of 1 TR, 2 TR, and 3 TR per frame. Image quality was assessed qualitatively in terms of how well the spatio-temporal dynamics were preserved.
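The FOV profiles described above can be sketched numerically. The following is a minimal illustration, not the trajectory design code used in this work: the function name fov_profile and the sample radii are hypothetical, and it assumes that the 40x40 cm2 FOV applies at the k-space center and the 8x8 cm2 FOV at the k-space edge, consistent with the higher central sampling density of VDS.

```python
import numpy as np

def fov_profile(kr_norm, fov_center_cm, fov_edge_cm):
    """Supported FOV (cm) as a linear function of normalized k-space radius.

    kr_norm = 0 is the k-space center and kr_norm = 1 the k-space edge
    (assumed convention). A constant profile corresponds to UDS.
    """
    kr_norm = np.asarray(kr_norm, dtype=float)
    return fov_center_cm + (fov_edge_cm - fov_center_cm) * kr_norm

# Sample the profiles at a few normalized radii (hypothetical sampling).
kr = np.linspace(0.0, 1.0, 5)

fov_uds = fov_profile(kr, 20.0, 20.0)  # uniform density: 20 cm everywhere
fov_vds = fov_profile(kr, 40.0, 8.0)   # variable density: 40 cm -> 8 cm

for r, u, v in zip(kr, fov_uds, fov_vds):
    print(f"kr = {r:.2f}: UDS FOV = {u:.1f} cm, VDS FOV = {v:.1f} cm")
```

Under these assumptions, the VDS profile supports a larger FOV than UDS near the k-space center and a smaller FOV toward the edge, which is the trade-off plotted in Figure 1.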


Figure 2 shows the individual coil images with the UDS and VDS schemes from 21 interleaves (R=1) reconstructed via gridding. Due to its higher sampling density in the central k-space region, the VDS scheme efficiently mitigates artifacts from coil elements sensitive to regions distant from the vocal tract. Figure 3 shows DMRI images from the speech task of rapidly counting numbers. The reconstructions were performed at time resolutions of 1 TR/frame, 2 TR/frame, and 3 TR/frame. In this task, which involves rapid vocal tract shaping, the 3 TR/frame reconstruction showed considerable temporal blur, and the 2 TR/frame reconstruction showed mild temporal blur in some frames. The 1 TR/frame reconstruction, with its improved time resolution, showed excellent temporal fidelity for the various articulators. As expected, the noise level increased as the number of interleaves per frame was reduced. Figure 4 (animation) shows the corresponding dynamic video time series. Figure 5 (animation) shows a speaker uttering the phrase “loo-lee-la-za-na-za”. This task involves production of consonants and vowels at the subject's natural pace, and the reconstruction at a time resolution of 2 TR/frame showed good spatio-temporal fidelity. Note that in this subject, there was subtle signal loss and blurring along the tongue boundary due to off-resonance at air-tissue interfaces.
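The relationship between interleaves per frame, time resolution, and nominal undersampling can be illustrated with a short sketch. This is not the reconstruction pipeline used in this work. The helper names are hypothetical, the golden angle increment of 360°/golden ratio (about 222.49°) is an assumption since the abstract does not state the increment, and the nominal undersampling factor is taken relative to the 21-interleaf set referred to above as R=1.

```python
import numpy as np

TR_MS = 6.2              # repetition time from the protocol
N_FULL_INTERLEAVES = 21  # interleaves in the R=1 set

# Assumed golden angle increment (not stated in the abstract).
GOLDEN_ANGLE_DEG = 360.0 / ((1.0 + np.sqrt(5.0)) / 2.0)

def interleaf_angles(n_shots):
    """Rotation angle (degrees) of each successively acquired spiral interleaf."""
    return np.mod(np.arange(n_shots) * GOLDEN_ANGLE_DEG, 360.0)

def bin_shots(n_shots, shots_per_frame):
    """Group consecutive shot indices into frames; leftover shots are dropped."""
    n_frames = n_shots // shots_per_frame
    return np.arange(n_frames * shots_per_frame).reshape(n_frames, shots_per_frame)

def frame_time_ms(shots_per_frame, tr_ms=TR_MS):
    """Time resolution of one frame in milliseconds."""
    return shots_per_frame * tr_ms

def nominal_undersampling(shots_per_frame, n_full=N_FULL_INTERLEAVES):
    """Nominal undersampling factor relative to the full interleaf set."""
    return n_full / shots_per_frame

n_shots = 12  # hypothetical short acquisition for illustration
angles = interleaf_angles(n_shots)
for spf in (1, 2, 3):
    frames = bin_shots(n_shots, spf)
    print(f"{spf} TR/frame: {frames.shape[0]} frames, "
          f"{frame_time_ms(spf):.1f} ms/frame, "
          f"nominal R = {nominal_undersampling(spf):.1f}")
```

Because golden angle ordering spreads any run of consecutive interleaves across k-space, the same acquisition can be re-binned after the scan at 1, 2, or 3 TR per frame. Fewer interleaves per frame give finer time resolution at the cost of higher undersampling, which matches the increased noise noted above.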


We have shown that, in comparison to UDS, a VDS sampling design can efficiently suppress artifacts from coil elements that are sensitive to regions distant from the vocal tract. At 3 T, we have demonstrated rapid DMRI of vocal tract shaping at up to 6.2 ms/frame by jointly exploiting the efficient artifact distribution of VDS and a sparse, temporally constrained reconstruction scheme. With the improved imaging speed, the proposed protocol can be applied to a wide variety of speech tasks. Future work includes integration with off-resonance artifact reduction strategies, joint use of outer volume suppression pulses, and extension of VDS to 3D DMRI.




[1] Lingala SG, Zhu Y, Kim Y, Toutios A, Narayanan S, Nayak KS. A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magn. Reson. Med. 2016;00:n/a–n/a. doi: 10.1002/mrm.26090.

[2] Narayanan S, Nayak K, Lee S, Sethy A, Byrd D. An approach to real-time magnetic resonance imaging for speech production. J. Acoust. Soc. Am. 2004;115(4):1771–1776.

[3] Niebergall A, Zhang S, Kunay E, Keydana G, Job M, Uecker M, Frahm J. Real-time MRI of speaking at a resolution of 33 ms: Undersampled radial FLASH with nonlinear inverse reconstruction. Magn. Reson. Med. 2013;69:477–485.

[4] Lingala SG, Sutton BP, Miquel ME, Nayak KS. Recommendations for real-time speech MRI. J. Magn. Reson. Imaging 2016;43:28–44. doi: 10.1002/jmri.24997.

[5] Iltis PW, Schoonderwaldt E, Zhang S, Frahm J, Altenmüller E. Real-time MRI comparisons of brass players: a methodological pilot study. Hum. Mov. Sci. 2015;42:132–145.

[6] Freitas AC, Wylezinska M, Birch MJ, Petersen S, Miquel ME. Comparison of Cartesian and non‐Cartesian real‐time MRI sequences at 1.5T to assess velar motion and velopharyngeal closure during speech. PLoS One 2016;11:e0153322.


Figure 1: FOV versus the normalized k-space radius for the uniform density spiral (UDS) and variable density spiral (VDS) sampling patterns used in this study.

Figure 2: Individual coil images from (a) UDS sampling and (b) VDS sampling after gridding reconstruction at R=1. UDS shows spiral artifacts originating from spatial regions distant from the vocal tract region of interest, such as the head (see yellow arrows). Due to its higher sampling density in the central region of k-space, VDS efficiently reduces these artifacts.

Figure 3: DMRI of fast speech stimuli. The speaker was rapidly counting the numbers “one-two-three-four-five”. Sparse SENSE reconstructions are shown at time resolutions of (a) 6.2 ms/frame, (b) 12.4 ms/frame, and (c) 18.6 ms/frame. Image time series along two cuts are shown in the second and third rows. Note in (a) the superior fidelity in capturing the shaping of all the articulators (see arrows). (b) also shows good motion fidelity overall; however, the motion of the epiglottis is subtly blurred. At 18.6 ms/frame, considerable motion blurring is seen in most articulators (see white arrows in (c)).

Figure 4 (animation): DMRI movies of a speaker counting numbers at a rapid pace. Note that the improved time resolution of 6.2 ms/frame in (a) reliably captures the fast temporal transitions. The 12.4 ms/frame reconstruction in (b) also shows good fidelity, with only minor motion blurring. The 18.6 ms/frame reconstruction, however, suffers from severe motion artifacts in frames where the tongue changes shape rapidly.

Figure 5 (animation): DMRI of a speaker uttering the phrase “loo-lee-la-za-na-za” at 12.4 ms/frame. Note that the dynamics of vocal tract shaping during production of interleaved consonant and vowel sounds are well captured. In this speaker, subtle spatial blurring was observed along the tongue boundary, which is attributed to off-resonance induced artifacts along air-tissue boundaries.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)