AI-Based Tools Haven’t Reached Diagnostic Potential In COVID-19
Researchers from the University of Minnesota Medical School have determined that AI-based tools have not yet reached their full diagnostic potential in COVID-19: in their study, an AI model underperformed radiologists' predictions. Published in Radiology: Artificial Intelligence, the study, conducted across 12 hospital systems, evaluated the real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 from chest X-rays.
Participants with COVID-19 had a significantly higher COVID-19 diagnostic score than participants who did not have COVID-19. However, researchers found that the real-time model performance was unchanged over the 19 weeks of implementation. Model sensitivity was significantly higher in men, while model specificity was significantly higher in women. Sensitivity was significantly higher for Asian and Black participants than for white participants. The COVID-19 AI diagnostic system was significantly less accurate than radiologist predictions.
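For readers less familiar with the metrics named above, the following is a minimal sketch of how sensitivity (the share of true COVID-19 cases the model flags) and specificity (the share of non-cases it correctly clears) can be compared across subgroups. It is not the study's code. The function names, the group labels "A" and "B", and the ten toy labels are all hypothetical and chosen only for illustration; none of the numbers come from the study.

```python
from collections import defaultdict


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = COVID-19, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Guard against groups that contain no positives or no negatives.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def metrics_by_group(y_true, y_pred, groups):
    """Compute (sensitivity, specificity) separately for each subgroup label."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: sensitivity_specificity(ts, ps) for g, (ts, ps) in buckets.items()}


# Hypothetical toy data, NOT taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
groups = ["A"] * 5 + ["B"] * 5

for group, (sens, spec) in metrics_by_group(y_true, y_pred, groups).items():
    print(f"group {group}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A gap between groups in a table like this is only a starting point. Whether a difference is statistically significant, as reported in the study, depends on sample sizes and a formal test that this sketch does not perform.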
“This study, which represents the first live investigation of an AI COVID-19 diagnostic model, highlights the potential benefits but also the limitations of AI,” said Christopher Tignanelli, MD, MS, FACS, FAMIA, an associate professor of surgery at the University of Minnesota Medical School and general surgeon at M Health Fairview. “While promising, AI-based tools have not yet reached full diagnostic potential.”
The research findings were informed by an AI algorithm developed by Ju Sun, an assistant professor at the U of M College of Science and Engineering, and his team in collaboration with M Health Fairview and Epic.
- COVID-19 diagnostic models perform well for participants with severe COVID-19 effects; however, they fail to differentiate participants with mild COVID-19 effects.
- Many of the AI models published early in the pandemic boasted overly optimistic performance metrics based on publicly available datasets.
- The AI model’s diagnostic accuracy was inferior to the predictions made by board-certified radiologists.
“We saw the same overly optimistic performance in this study when we validated against two publicly available datasets; however, as we showed in our manuscript, this does not translate to the real world,” Dr Tignanelli said. “It is imperative moving forward that researchers and journals alike develop standards requiring external or real-time prospective validation for peer-reviewed AI manuscripts.”
Researchers hope to develop a simpler diagnostic AI model by integrating data from more than 40 U.S. and European sites, and to build multi-modal models that leverage structured data and clinical notes along with images.
This study was funded by grants from the National Institutes of Health (NHLBI T32HL07741), as well as the Agency for Healthcare Research and Quality (AHRQ) and Patient-Centered Outcomes Research Institute (PCORI), grant K12HS026379 (CJT) and the National Institutes of Health’s National Center for Advancing Translational Sciences, grants KL2TR002492 (CJT) and UL1TR002494 (EK).