UCLA AI-Model Reaches Clinical Expert Level Accuracy in 3D Images
A deep-learning framework developed by UCLA researchers teaches itself quickly to automatically analyze and diagnose MRIs and other 3D medical images – with accuracy matching that of medical specialists in a fraction of the time. An article describing the work and the system’s capabilities is published in Nature Biomedical Engineering.
Unlike the few other models being developed to analyze 3D images, the new framework has wide adaptability across a variety of imaging modalities. The developers have studied it with 3D retinal scans (optical coherence tomography) for disease risk biomarkers, ultrasound videos for heart function, 3D MRI scans for liver disease severity assessment, and 3D CT for chest nodule malignancy screening. They say it provides a foundation that could prove valuable in numerous other clinical settings as well, and studies are planned.
Artificial neural networks train themselves by performing many repeated calculations and screening extremely large datasets examined and labeled by clinical experts. Unlike standard 2D images that show length and width, 3D imaging technologies add depth, and these “volumetric,” or 3D, images take more time, skill and attention for an expert’s interpretation. For example, a 3D retinal imaging scan may be composed of nearly 100 2D images, requiring a few-minute close inspection by a highly trained clinical specialist to detect delicate disease biomarkers, such as measuring the volume of an anatomical swelling.
“While there are many AI (artificial intelligence) methods for analyzing 2D biomedical imaging data, compiling and annotating large volumetric datasets that would be required for standard 3D models to exhaust AI’s full potential is infeasible with standard resources. Several models exist, but their training efforts typically focus on a single imaging modality and a specific organ or disease,” said Oren Avram, PhD, a postdoctoral researcher at UCLA Computational Medicine and a co-first author of the paper.
The UCLA computer model, called SLIViT, for SLice Integration by Vision Transformer, consists of a unique combination of two artificial intelligence components and a unique learning approach that researchers say allow it to accurately predict disease risk factors from medical scans across multiple volumetric modalities with moderately sized labeled datasets.
“SLIViT overcomes the training dataset size bottleneck by leveraging prior ‘medical knowledge’ from the more accessible 2D domain,” said Berkin Durmus, a UCLA PhD student and co-first author of the article. He and Avram are researchers affiliated with the UCLA Henry Samueli School of Engineering and other UCLA schools and departments.
“We show that SLIViT, despite being a generic model, consistently achieves significantly better performance compared to domain-specific state-of-the-art models. It has clinical applicability potential, matching the accuracy of manual expertise of clinical specialists while reducing time by a factor of 5,000. And unlike other methods, SLIViT is flexible and robust enough to work with clinical datasets that are not always in perfect order,” Durmus said.
Avram said SLIViT’s automated annotation may benefit patients and clinicians by improving diagnostic efficiency and timeliness, and it advances and expedites medical research by reducing data acquisition costs and duration. Additionally, it provides a foundation model to accelerate development of future predictive models.
“What thrilled me most was SLIViT’s remarkable performance under real-life conditions, particularly with low-number training datasets,” said SriniVas R. Sadda, MD, a professor of Ophthalmology at UCLA Health and the Artificial Intelligence & Imaging Research director at the Doheny Eye Institute. “SLIViT thrives with just hundreds – not thousands – of training samples for some tasks, giving it a substantial advantage over other standard 3D-based methods in almost every practical case related to 3D biomedical imaging annotation.”
Eran Halperin, PhD, a professor of Computer Science at the Henry Samueli School of Engineering and Computational Medicine at the UCLA David Geffen School of Medicine, said that even if financial resources were unlimited, ongoing research will always face challenges posed by limited training datasets – in clinical environments, for instance, or when considering emerging biomedical-imaging modalities.
“When a new disease-related risk factor is identified, it can take months to train specialists to accurately annotate the new factor at scale in biomedical images,” he said. “But with a relatively small dataset, which a single trained clinician can annotate in just a few days, SLIViT can dramatically expedite the annotation process for many other non-annotated volumes, achieving performance levels comparable to clinical specialists.”