Reproducibility of SIENAX volumetric outputs over intra-session, inter-session and inter-scanner acquisitions
Guillem Garcia1, David Moreno-Dominguez1, Matt Rowe1, Vesna Prckovska1, and Paulo Rodrigues1

1QMENTA Inc., Barcelona, Spain


Automatic tissue segmentation tools are common in the neuroimaging field. Evaluating their reliability is necessary to validate the findings of studies that use these tools. We conducted a reliability analysis of SIENAX on a test-retest dataset and a multi-site dataset, and compared the results with those of other automatic segmentation tools. The volumetric outputs of SIENAX show low coefficients of variation in the test-retest dataset for both grey matter (1.11%) and white matter (0.69%). In the multi-site data the values rose to 3.95% and 6.47% respectively, suggesting a possible need for data harmonization in multi-site studies.


Automatic segmentation tools are a valuable resource in the neuroimaging field, saving time and effort in studies that involve quantification of brain tissues. SIENAX1,2 is a tool included in FSL3 and commonly used in clinical studies4,5,6. It automatically segments white and grey matter and estimates the total brain volume normalized for skull size. Assessing the reliability of such tools is essential in order to validate their results. We conducted an in-depth analysis of SIENAX. First, we used a test-retest dataset7 to analyze the variance of the tool when segmenting the volumes of subjects across different sessions, and compared the results with previous results8 obtained with FreeSurfer9 and ANTs10, two other widely used segmentation tools. Then, we analyzed the variance of SIENAX in a single-subject multi-site dataset11 to assess its reproducibility when segmenting images from different MRI scanners.


The first dataset consists of T1-weighted MRI images of 3 subjects who were scanned twice per session for a total of 20 sessions7. The second contains T1-weighted MRI images of one subject who was scanned at 20 sites across Europe and the USA11. SIENAX volumetric outputs were computed for all sessions. To evaluate the test-retest reproducibility we followed the same procedure as Maclaren et al.7: we computed the paired standard deviation using the volumes from the first and second scan of each session, and also the total standard deviation. We then used these values to obtain the intra-session (CVs) and inter-session (CVt) coefficients of variation. Additionally, we compared our results with previous work in which similar measures were obtained for the ANTs and FreeSurfer algorithms8. Finally, to estimate the reproducibility of SIENAX in a multi-site setting, we computed the coefficients of variation for the subject across all the sites (cross-site).


Results for the test-retest analysis are shown in Table 1. SIENAX presents the lowest coefficient of variation when measuring white matter volumes. For grey matter volumes, ANTs appears to be slightly more reproducible than SIENAX (CVt 0.14% lower). Both SIENAX and ANTs are more reproducible than FreeSurfer in tissue segmentation, with CV values roughly 50% lower. The coefficients of variation for the multi-site study are shown in Table 2; the cross-site values are 4 to 10 times greater than those of the test-retest study.
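The size of this increase can be checked with simple arithmetic on the percentages reported in the synopsis. The short sketch below assumes that the test-retest figures quoted there (1.11% and 0.69%) are the CVt values that Table 2 uses for comparison; the variable names are illustrative only.

```python
# Percentages quoted in the synopsis (assumed here to be the test-retest CVt values).
test_retest_cv_t = {"grey matter": 1.11, "white matter": 0.69}
multi_site_cv = {"grey matter": 3.95, "white matter": 6.47}

# Ratio of cross-site variability to test-retest variability per tissue.
ratios = {
    tissue: multi_site_cv[tissue] / test_retest_cv_t[tissue]
    for tissue in multi_site_cv
}

for tissue, ratio in ratios.items():
    print(f"{tissue}: {ratio:.1f}x")
# grey matter: 3.6x
# white matter: 9.4x
```

Under that assumption the ratios are about 3.6 for grey matter and 9.4 for white matter, in line with the rounded "4 to 10 times" range.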


It should be noted that the three automatic segmentation tools do not process the images in the same way. SIENAX's white and grey matter tissue masks include the cerebellum and brainstem, while FreeSurfer and ANTs analyze each of those regions separately. Nevertheless, the results give a good indication of the overall reproducibility of SIENAX and of its performance in comparison with other automatic tools.

Future work will involve increasing the number of subjects with cross-site acquisitions and comparing the tissue segmentations obtained with FreeSurfer and ANTs on the multi-site dataset.


We conducted a reproducibility analysis of SIENAX on two different datasets. The results show that SIENAX volumetric outputs have CVs similar to those of FreeSurfer and ANTs, or lower in the case of white matter, when measuring subjects across sessions. However, in the multi-site data the reproducibility decreases (higher CVs), suggesting that a preprocessing step to harmonize the images from different sites could be necessary to improve reliability. This work motivates further research into how to harmonize multi-site data, given that the results are affected by the type of scanner and the parameters used to acquire the images. Being able to harmonize multi-site data would help ensure that results are reliable and would enable collaborative studies that combine data from different centers.




  1. Smith SM, De Stefano N, Jenkinson M, Matthews PM. Normalized accurate measurement of longitudinal brain change. Journal of computer assisted tomography. 2001 May 1;25(3):466-75.
  2. Smith SM, Zhang Y, Jenkinson M, Chen J, Matthews PM, Federico A, De Stefano N. Accurate, robust, and automated longitudinal and cross-sectional brain change analysis. Neuroimage. 2002 Sep 1;17(1):479-89.
  3. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage. 2004 Jan 1;23:S208-19.
  4. Anderson VM, Goldstein ME, Kydd RR, Russell BR. Extensive gray matter volume reduction in treatment-resistant schizophrenia. International Journal of Neuropsychopharmacology. 2015 May 1;18(7).
  5. Altmann DR, Jasperse B, Barkhof F, Beckmann K, Filippi M, Kappos LD, Molyneux P, Polman CH, Pozzilli C, Thompson AJ, Wagner K. Sample sizes for brain atrophy outcomes in trials for secondary progressive multiple sclerosis. Neurology. 2009 Feb 17;72(7):595-601.
  6. Novak P, Schmidt R, Kontsekova E, Kovacech B, Smolek T, Katina S, Fialova L, Prcina M, Parrak V, Dal-Bianco P, Brunner M. FUNDAMANT: an interventional 72-week phase 1 follow-up study of AADvac1, an active immunotherapy against tau protein pathology in Alzheimer’s disease. Alzheimer's Research & Therapy. 2018 Dec;10(1):108.
  7. Maclaren J, Han Z, Vos SB, Fischbein N, Bammer R. Reliability of brain volume measurements: A test-retest dataset. Scientific data. 2014 Oct 14;1:140037.
  8. Puch S, Rodrigues P, Moreno-Dominguez D, Ramos M, Prčkovska V. (free)Surfing ANTs: a comparative study. 2017. doi:10.13140/RG.2.2.34119.19369.
  9. Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, Van Der Kouwe A, Killiany R, Kennedy D, Klaveness S, Montillo A. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron. 2002 Jan 31;33(3):341-55.
  10. Avants BB, Tustison NJ, Song G, Cook PA, Klein A, Gee JC. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage. 2011 Feb 1;54(3):2033-44.
  11. Keshavan A, Paul F, Beyer MK, Zhu AH, Papinutto N, Shinohara RT, Stern W, Amann M, Bakshi R, Bischof A, Carriero A. Power estimation for non-standardized multisite studies. NeuroImage. 2016 Jul 1;134:281-94.


Table 1. Results of the test-retest analysis for the three tools: intra-session (CVs) and inter-session (CVt) coefficients of variation for the measured grey and white matter volumes, together with the absolute difference between the two CVs.

Table 2. Coefficients of variation for the multi-site analysis (second column) compared with the test-retest CVt (first column).

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)