Benchmarking Large Language Models for Neurological Imaging Interpretation Using a Multiple Sclerosis Lesion Segmentation Dataset