To develop and evaluate a novel 3D Vision Transformer (ViT) framework for accurate MS lesion segmentation that improves cross-center generalization and handles volumetric data, with the aim of improving clinical applicability on aggregated multi-center datasets.
Multiple sclerosis (MS) lesion segmentation from MRI is crucial for diagnosis, monitoring, and treatment evaluation. While Vision Transformers (ViTs) have shown promise in 2D segmentation, generalizing across multi-center data remains difficult because of scanner variability. Extensions to 3D volumetric data remain underexplored, with gaps in capturing lesion evolution and in robustness to diverse lesion sizes.
We aggregated a dataset of 136 MS patients from multiple public sources (e.g., MSSEG-2016, ISBI 2015, MSSEG-2), including T1-weighted, T2-weighted, and FLAIR sequences. A 3D Swin Transformer (a ViT variant) was adapted for segmentation, incorporating domain-specific batch normalization for cross-site adaptation. Data augmentation included imaging-aware (e.g., intensity shifts) and lesion-aware (e.g., synthetic lesion insertion) techniques. Models were evaluated on held-out test sets using Dice coefficient, Jaccard index, sensitivity, specificity, precision, Hausdorff distance, true positive rate (TPR), and false positive rate (FPR).
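The overlap metrics listed above can be illustrated with a minimal sketch. It assumes voxel-wise evaluation on binary masks that have been flattened to sequences of 0/1 values; the function names `confusion_counts` and `overlap_metrics` are hypothetical and are not taken from the study's code. Hausdorff distance and any lesion-wise rates are left out because they need spatial coordinates and connected-component analysis.

```python
from typing import Dict, Sequence


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, int]:
    """Count voxel-wise TP, FP, FN and TN for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, t in zip(pred, truth):
        if p and t:
            counts["tp"] += 1      # lesion voxel correctly predicted
        elif p:
            counts["fp"] += 1      # predicted lesion, actually background
        elif t:
            counts["fn"] += 1      # missed lesion voxel
        else:
            counts["tn"] += 1      # background correctly predicted
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Compute Dice, Jaccard, sensitivity, specificity and precision."""
    c = confusion_counts(pred, truth)
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": _ratio(tp, tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
    }


# Toy example: 8 voxels, 3 true lesion voxels, 3 predicted lesion voxels.
truth_mask = [0, 1, 1, 1, 0, 0, 0, 0]
pred_mask = [0, 1, 1, 0, 1, 0, 0, 0]
example = overlap_metrics(pred_mask, truth_mask)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In this toy case there are 2 true positives, 1 false positive, 1 false negative and 4 true negatives. Returning NaN when a denominator is zero is a design choice of the sketch, for example for a scan with no lesion voxels in either mask, and is not something the text specifies.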
The proposed 3D ViT achieved a mean Dice score of 82.5% (±3.2%), Jaccard of 70.1% (±2.8%), sensitivity of 81.3% (±4.1%), specificity of 99.1% (±0.4%), precision of 76.4% (±3.7%), Hausdorff distance of 2.8 mm, TPR of 75.6% (±5.0%), and FPR of 4.2 lesions/scan. Gains were notable for small lesions (<5 mm) and for cross-center generalization, where metrics dropped by only 2-3% relative to single-center training.
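As a consistency check on the reported overlap scores, Dice (D) and Jaccard (J) computed on the same pair of masks are tied by an exact identity. The worked arithmetic below assumes both scores were computed voxel-wise on the same segmentations, which the text does not state explicitly.

```latex
J = \frac{D}{2 - D},
\qquad
D = 0.825 \;\Rightarrow\; J = \frac{0.825}{1.175} \approx 0.702
```

This agrees with the reported mean Jaccard of 70.1% to within rounding. The identity holds exactly only for a single scan, so a small difference between means across scans is expected.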
This 3D ViT approach addresses key gaps in MS lesion segmentation by enabling robust analysis on multi-center datasets. Future work includes real-world clinical trials and uncertainty estimation for enhanced interpretability. The method has potential for integration into neurology workflows, where it could support MS management.