AI: The Importance of Diversity in Data

Every AI application feeds on vast amounts of data to produce insights into how to diagnose, treat, or manage patients. In health care, the quality of that data – and the types of data being used to feed the algorithm – are critical. AI has the potential to improve patient outcomes through faster diagnoses and more accurate, targeted treatments.

Dr. Sonia Gupta, Dr. Elizabeth Hawk, and Dr. Erin Schwartz discuss Diversity in Data at RSNA.

That’s why diversity in data is a growing concern among leading healthcare and AI experts. Sonia Gupta, MD, of Beth Israel Deaconess Medical Center, Boston, and Harvard Medical School, Cambridge, MA, and Elizabeth Hawk, MS, MD, PhD, DABNM, DABR, of Radiology Partners in El Segundo, CA, say the data used to train healthcare AI systems must become more diverse to avoid algorithmic bias that could inadvertently harm patients through discrimination.

While AI and machine learning systems aren’t intrinsically biased, the data used to create their algorithms can carry built-in bias if it isn’t diverse. Data should span a range of genders, ethnicities, races, ages, and geographic regions, and it should cover populations with genetic predispositions to certain diseases, says Dr. Gupta, Director of Ultrasound Services at Beth Israel Deaconess Medical Center in Boston, and Instructor of Radiology at Harvard Medical School in Cambridge, MA.

“When we're gathering data for an algorithm, we really need to make sure we have diverse data. We need to keep an eye on where that information is coming from and what data is being used for algorithm development,” says Dr. Gupta. “Our data needs to reflect a diverse patient population because we want algorithms and AI solutions that can be applied to a broad range of patients.”

This also includes gathering data from patients who have different levels of access to care. “If you're going to make an algorithm that's supposed to detect cancer, it needs to incorporate data from patients who lack screening, as well as from patients who have access to regular screening, to be representative,” she says.

One example of successfully building AI with diverse data comes from Dr. Gupta’s work with, which is developing an algorithm to detect tuberculosis on chest X-rays.

“Tuberculosis presents unique challenges in developing a screening tool that you can apply worldwide,” she says. “ used data from India, Malaysia, Singapore, and the Philippines, where there's a really high incidence of tuberculosis, but it also includes data from the United States, where instances of tuberculosis are much rarer. I think that that's going to have an impact on a global level.”

Building Diverse AI Development Teams

Building diversity into AI starts well before algorithms are trained on datasets, says Dr. Hawk, a Neuroradiologist and Nuclear Medicine Physician at Radiology Partners in El Segundo, CA, and clinical instructor of Nuclear Medicine at Stanford University. She also serves on the national ACR Commission for Women and General Diversity. 

“There is a tradition of lack of underrepresented minorities, not only in radiology but particularly in imaging informatics and the data sciences fields,” says Dr. Hawk.

She adds that having diverse and inclusive viewpoints within AI development teams is the first step toward systems that can effectively treat all patients, and that it can actually improve the development process itself. She’s a member of Radiology Partners’ diversity committee, which identifies obstacles to increasing diversity in radiology and ways to tackle them.

“When you bring diversity to the table, it brings different lenses of how to creatively approach a problem. And if you lack those lenses in the conversation, you ultimately lack creativity in problem solving, which creates a fundamental problem in how we design our AI solutions,” says Dr. Hawk.

Specialized training, such as implicit-bias training, can help these teams confront their hidden biases head-on and improve the development process. Building “feedback mechanisms” into the process is another tool for preventing bias and building diversity into emerging technology.

Diversity in Data Starts with Diversity in Medicine

Part of the diversity in data challenge is getting more women and people from underrepresented populations into AI and the imaging sciences. One point of progress is showcasing women in leadership roles.

“Doing something like this is a good first step because we're showing everyone that there are women involved in AI and radiology,” says Dr. Gupta. “People want to see themselves reflected in these roles, and I think this can bring more women into AI and into radiology in general.”

Drs. Gupta and Hawk echoed Dr. Geraldine McGinty’s theory of sponsorship, which holds that sponsorship is a critical step in helping women rise through the ranks of medicine into leadership positions.

“It's really critical that not only do we mentor people, set a good example, and have the conversation, but we actually sponsor people, bring them to the table, and bring the opportunities to them,” says Dr. Hawk.

Dr. Gupta says she couldn’t have achieved her career goals without mentors and sponsors. She and Dr. Hawk have known each other since they met as first-year residents at the annual American College of Radiology meeting, and she credits Dr. Hawk with giving her some of her first opportunities in AI and radiology.

“Those mentors and sponsors have been career changing,” she says. “I've been grateful to some of my friends and colleagues who have known my career interests and given me sponsorship opportunities.”

Using Data to Prevent Bias in Health Care

Ultimately, the goal of diversity in data is to prevent bias in the data itself and, in turn, in health care. Radiologists have great power to make that happen.

“Radiologists need to become leaders in the field and in the application of this new technology, because we have the greatest understanding of how it applies to patient care,” says Dr. Hawk. “I think it's very important that physicians and radiologists take ownership of this data and play a key role in shaping these regulations and how they evolve with the technology moving forward.”

That means getting involved on the ground floor of AI development, like Dr. Gupta did with, and taking a holistic approach to viewing diversity in data.

Editor’s note: Sonia Gupta, MD, of Beth Israel Deaconess Medical Center, Boston, and Harvard Medical School, Cambridge, MA, and Elizabeth Hawk, MS, MD, PhD, DABNM, DABR, of Radiology Partners in El Segundo, CA, are passionate advocates for diversity in data when developing artificial intelligence (AI) applications to improve patient care. Erin Schwartz, MD, FACR, and Editor-in-Chief of Applied Radiology, discussed this topic with the two in an AR Connect Expert Panel Discussion at RSNA.

AI: The Importance of Diversity in Data. Appl Radiol.

By McKenna Bryant | March 11, 2020

About the Author

McKenna Bryant

McKenna Bryant is a freelance healthcare writer based in Nashotah, WI.

Copyright © Anderson Publishing 2022