AI Deep Learning Model Boosts Accuracy in Lung Nodule Risk Assessment
A new deep learning tool shows promise in addressing one of lung cancer screening’s biggest challenges: distinguishing malignant nodules from benign ones without overwhelming patients and health systems with false positives. Results from the multicenter validation study were published in Radiology, the journal of the Radiological Society of North America (RSNA).
Lung cancer continues to claim more lives globally than any other cancer, and screening high-risk groups with low-dose chest CT has proven effective in lowering mortality. But early screening programs also revealed a major drawback—high false-positive rates, which often lead to unnecessary follow-up scans, invasive procedures, and heightened patient anxiety. Pulmonary nodules are common findings on CT, yet determining which will progress to cancer remains a complex task.
To improve risk prediction, researchers at Radboud University Medical Center in the Netherlands developed a deep learning algorithm trained on more than 16,000 nodules from the U.S.-based National Lung Screening Trial. External testing involved three major European studies—the Danish Lung Cancer Screening Trial, the Multicentric Italian Lung Detection trial, and the Dutch–Belgian NELSON trial—together covering over 4,000 participants and nearly 8,000 nodules.
“Deep learning offers promising solutions, but robust validation is essential,” said lead author Noa Antonissen, MD, a PhD candidate at Radboud. “AI accounts for factors that we might not even see on the CT scan to further assess a nodule as likely to be malignant.”
Traditionally, screening programs rely on measurements such as nodule size, type, and growth rate to determine malignancy risk. The widely used PanCan model integrates patient and nodule characteristics into a probability score to guide management decisions. While effective, it still leaves room for misclassification. By contrast, deep learning provides fully data-driven predictions that could capture subtle patterns missed by human-designed models.
The study evaluated the AI’s performance across all nodules, in the particularly tricky indeterminate size range (5–15 mm), and among malignant nodules matched in size to benign ones, using the area under the receiver operating characteristic curve (AUC) as the measure of discrimination. For cancers diagnosed within one year, the deep learning model achieved an AUC of 0.98, equal to PanCan. But across broader timeframes and more complex subgroups, the AI outperformed its comparator. In indeterminate nodules, the AI achieved AUCs of 0.95, 0.94, and 0.90 across the timeframes evaluated, compared with PanCan’s 0.91, 0.88, and 0.86, respectively. In size-matched cases, the gap was wider still: 0.79 vs. 0.60.
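For readers unfamiliar with the metric, an AUC can be read as the probability that a randomly chosen malignant nodule receives a higher risk score than a randomly chosen benign one. The short Python sketch below illustrates that reading with made-up scores. The function name and all numbers are hypothetical and are not taken from the study or its software.

```python
from itertools import product


def auc(malignant_scores, benign_scores):
    """Fraction of malignant/benign pairs ranked correctly; ties count as half."""
    pairs = list(product(malignant_scores, benign_scores))
    wins = sum(1.0 if m > b else 0.5 if m == b else 0.0 for m, b in pairs)
    return wins / len(pairs)


# Made-up risk scores for illustration only; NOT data from the study.
malignant = [0.9, 0.7, 0.4]
benign = [0.1, 0.3, 0.5, 0.2]

toy_auc = auc(malignant, benign)
print(f"Toy AUC: {toy_auc:.2f}")
```

In this toy case, 11 of the 12 malignant/benign pairs are ranked correctly, giving an AUC of about 0.92. A value of 0.5 would mean the scores are no better than chance, and 1.0 would mean perfect separation.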
Perhaps most importantly, the AI tool reduced false positives. At 100% sensitivity for cancers found within a year, it classified 68.1% of benign nodules as low risk, compared with PanCan’s 47.4%. Put differently, the share of benign nodules still flagged fell from 52.6% to 31.9%, a relative reduction in false positives of about 39.4%, which could help make lung cancer screening more sustainable by reducing unnecessary follow-ups.
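The relative reduction follows directly from the two low-risk percentages reported above. A minimal Python sketch of the arithmetic is shown here. The function and variable names are illustrative assumptions, and only the figures 68.1% and 47.4% come from the article.

```python
def relative_fp_reduction(low_risk_baseline: float, low_risk_new: float) -> float:
    """Relative drop in the false-positive rate among benign nodules.

    Each argument is the fraction of benign nodules correctly classified
    as low risk, so the false-positive rate is one minus that fraction.
    """
    fp_baseline = 1.0 - low_risk_baseline
    fp_new = 1.0 - low_risk_new
    return (fp_baseline - fp_new) / fp_baseline


pancan_low_risk = 0.474  # PanCan: 47.4% of benign nodules classified low risk
ai_low_risk = 0.681      # deep learning model: 68.1% classified low risk

reduction = relative_fp_reduction(pancan_low_risk, ai_low_risk)
print(f"Relative reduction in false positives: {reduction:.1%}")
```

The false-positive rate drops from 52.6% to 31.9% of benign nodules, and (52.6 − 31.9) / 52.6 is roughly 39.4%, consistent with the "nearly 40%" figure.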
“Deep learning algorithms can assist radiologists in deciding whether follow-up imaging is needed, but prospective validation is required to determine the clinical applicability of these tools and to guide their implementation in practice,” Antonissen emphasized. “Reducing false positive results will make lung cancer screening more feasible.”
With retrospective validation across several large screening trials supporting its accuracy, the AI approach could represent a major step toward refining lung cancer screening protocols, catching cancers early while sparing patients with benign nodules unnecessary follow-up. Prospective validation remains the next step before clinical use.