AI Outperforms Clinicians in Imaging Decisions, Study Shows

Published Date: July 29, 2025
By News Release

Large language models (LLMs) may be poised to revolutionize medical imaging by outperforming clinicians in selecting appropriate imaging tests for patients. A new study published in Clinical Imaging found that advanced LLMs can not only match but often exceed medical providers in choosing the correct imaging modality across a variety of clinical scenarios.

Researchers evaluated the capabilities of four LLMs—DeepSeek-R1, ChatGPT-4o, Claude 3 Opus, and Claude 3.5 Sonnet—by testing them on 120 complex, real-world clinical scenarios spanning a broad range of medical fields, including breast, cardiac, neuro, gastrointestinal, musculoskeletal, thoracic, genitourinary, and vascular conditions. Using three different prompts, the researchers assessed the LLMs’ imaging recommendations against those of four clinicians (an emergency physician, a cardiologist, an internist, and a general surgeon) and four radiologists with varying levels of experience.
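To make the comparison design concrete, here is a minimal sketch of how such an evaluation can be scored. This is not the authors' published pipeline: the scenarios, the three prompt templates, and the query_llm helper below are hypothetical stand-ins for whatever data and API clients the study actually used.

# Hypothetical scoring loop for a guideline-concordance evaluation.
# The scenarios, prompts, and query_llm stub are illustrative only.

SCENARIOS = [
    {"vignette": "45F with a palpable breast lump, no prior imaging",
     "reference": "diagnostic mammography"},
    {"vignette": "68M with acute focal neurologic deficit, onset 2 hours ago",
     "reference": "non-contrast head ct"},
]

PROMPTS = [
    "Recommend the single most appropriate imaging test: {vignette}",
    "Per the ACR Appropriateness Criteria, which imaging study is indicated? {vignette}",
    "Acting as a consulting radiologist, name the best first imaging modality for: {vignette}",
]

def query_llm(model: str, prompt: str) -> str:
    # Placeholder for a real API call; returns a canned answer here
    # so the sketch runs end to end.
    return "non-contrast head CT"

def normalize(answer: str) -> str:
    # Crude string normalization; the study relied on expert review,
    # which this does not reproduce.
    return answer.strip().lower()

def accuracy(model: str) -> float:
    correct, total = 0, 0
    for case in SCENARIOS:
        for template in PROMPTS:
            answer = query_llm(model, template.format(vignette=case["vignette"]))
            correct += normalize(answer) == normalize(case["reference"])
            total += 1
    return correct / total

print(f"deepseek-r1 (stub): {accuracy('deepseek-r1'):.1%}")

The same loop, run per model and compared against the clinicians' and radiologists' answers on identical cases, yields the kind of head-to-head accuracy figures the study reports.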

Across all three prompts, the LLMs demonstrated remarkable consistency, offering identical imaging suggestions for each case. Among the models, DeepSeek-R1 emerged as the leader, achieving 98.3% accuracy in guideline-based scenarios drawn from the American College of Radiology Appropriateness Criteria (ACR AC). It also outperformed the clinicians in more nuanced clinical situations involving patients with complex comorbidities and other complicating factors, where it rivaled both junior and senior radiologists.

“This study represents the first comprehensive evaluation of DeepSeek in radiology, directly comparing multiple LLMs to radiologists with different experience levels under both standardized and multifactorial conditions,” the study authors wrote. “In guideline-driven ACR scenarios, each model paralleled the accuracy of board-certified junior radiologists. When confronted with MPCS marked by comorbidities, polypharmacy, atypical presentations, and evolving laboratory findings, DeepSeek-R1 significantly outperformed clinicians and residents, while still achieving parity with both junior and senior radiologists.”

The findings underscore the transformative potential of LLMs in radiology, not just for improving diagnostic accuracy but also for reducing unnecessary imaging, optimizing radiation exposure, and streamlining resource use. According to the researchers, LLMs are uniquely equipped to handle the complexity of imaging decisions, offering advantages over traditional systems like keyword-based search engines.

“Conventional rules-based, keyword-matching engines (Google, Bing, Yandex, etc.) and simple text parsing techniques that have existed for decades can extract predefined terms from requisition text, but they are inherently brittle,” noted study co-author Eren Çamur of the Ministry of Health Ankara 29 Mayis State Hospital in Turkey. He explained that these older methods falter when faced with medical synonyms, abbreviations, or conflicting clinical data, and they lack the ability to assess risks such as radiation exposure or contrast agent use in a unified way. “LLMs, by contrast, derive a semantic representation of the entire request, permitting contextual reasoning across multiple variables and dynamic alignment with the evidence hierarchy embedded in the American College of Radiology Appropriateness Criteria (ACR AC).”
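The brittleness Çamur describes is easy to demonstrate. The toy comparison below is illustrative only: the keyword table, the requisition text, and the prompt are assumptions, not the study's tooling. An exact-term matcher misses a common abbreviation and cannot weigh contraindications, while a free-text prompt hands the model the entire clinical context at once.

# Toy illustration of brittle keyword matching vs. a contextual prompt.
# The term table and requisition are invented examples.

KEYWORD_TABLE = {
    "stroke": "non-contrast head CT",
    "pulmonary embolism": "CT pulmonary angiography",
}

def keyword_match(requisition: str):
    text = requisition.lower()
    for term, study in KEYWORD_TABLE.items():
        if term in text:
            return study
    return None  # breaks on synonyms, abbreviations, and negations

# "CVA" is a standard abbreviation for stroke, but exact matching misses it,
# and the matcher has no way to weigh the low eGFR against contrast risk.
requisition = "72M with acute CVA symptoms; eGFR 18, documented contrast allergy"
print(keyword_match(requisition))  # -> None

# An LLM instead receives the whole requisition, so the synonym, the lab
# value, and the allergy can all inform a single recommendation.
prompt = (
    "Using the ACR Appropriateness Criteria, recommend the most appropriate "
    "imaging study and justify the choice:\n" + requisition
)
# response = query_llm("deepseek-r1", prompt)  # hypothetical call, as in the earlier sketch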

Ultimately, the researchers concluded that LLMs hold “remarkable potential” to enhance radiology workflows and clinical decision-making. As models like DeepSeek continue to evolve, they could soon become indispensable tools in helping healthcare providers deliver safer, more effective imaging care.