Voice recognition: Onward and upward

By Rasu B. Shrestha, MD, MBA, University of Pittsburgh Medical Center, Pittsburgh, PA

Dr. Shrestha is the Vice President of Medical Information Technology and the Medical Director of Interoperability & Imaging Informatics, University of Pittsburgh Medical Center, Pittsburgh, PA.

Radiologists have struggled and had fun with voice recognition since long before Siri was born.1 While it is great that voice recognition has gone mainstream, making its way into our cars and smartphones in addition to call centers and help desks, radiology vendors are starting to leverage the technology to catapult the radiology workflow to the next level. The reality of clinical documentation today is that radiology reporting is about more than decreased report turnaround time and transcription cost savings.

Voice recognition technology, used the right way, has the potential to streamline the radiology workflow and make the readflow process more intelligent, meaningful, and accurate, creating a more cohesive radiology workflow.

Consumer driven acceptance

This has been a tremendous year for voice recognition in general, and for healthcare in particular. We are seeing rapid adoption of applications for smartphones and tablets, as well as for other everyday devices, such as vehicles, which have embraced voice recognition in interesting ways. Over the years, we have seen several companies, including Google, IBM, and Microsoft, develop their own voice recognition technologies. But it was really when Apple infused its marketing wizardry into Siri that things started to take off among general consumers. Some see Siri as an entertaining foray into mainstream adoption of voice recognition in the mobile arena. We have started to see a wave of voice-enabled technologies coming into our cars, tempting us to use voice as a primary interface to find the nearest gas pump or to switch audio playlists.

This summer, Nuance introduced Nina,2 a collection of voice-enabled personal assistant technologies that brings voice biometrics, speech recognition, and natural language recognition to the masses. Already, we are starting to see these technologies take us beyond simple speech commands to more natural ‘conversations’ with contextual dialogue. It seems clear that voice recognition is now being led by a new wave of consumer-driven acceptance.

Healthcare penetration

We are starting to see a healthy penetration of voice recognition technology into our healthcare workflow, and we seem to be just at the beginning of a revolution in clinical document capture and enhanced clinical efficiency enabled by speech. Many physicians have been toying with out-of-the-box voice recognition software, such as Nuance's Dragon software and its more expensive medical edition, to voice-enable a more convenient documentation process on top of their regular electronic medical record (EMR) systems. We are seeing a positive move toward tighter integration with EMRs, beyond traditional speech-to-text dictation for SOAP (subjective, objective, assessment, plan) notes and basic voice commands that drive specific functions and templates in the EMR.

Montrue Technologies (Ashland, OR) recently used Nuance’s mobile medical software development kit (SDK) and developed an iPad application that lets physicians dictate notes in a remarkably accurate manner.3

Radiology has been an early adopter of voice recognition in healthcare, albeit sometimes reluctantly. We had started to see the potential of speech-to-text for our reporting needs as far back as the 1990s.4

The initial impetus was to reduce report turnaround time (TAT), and some sites reported dramatic improvements, from days or hours down to minutes.

Continued implementation and greater acceptance of voice recognition technology in radiology were then driven more by cost equations. These technologies eliminated or dramatically decreased transcription costs. But clearly, this in itself was not (and is not) a sustainable reason to switch from regular transcription services to voice.

We started seeing academic institutions adopt voice recognition in their reporting workflow, while private practice radiology groups either completely embraced voice or chose not to touch it at all. Many initially saw the push for voice recognition in radiology reporting as a cost-shifting exercise that did not make sense, with expensive radiologists playing the role of transcriptionists.5

The focus driving further adoption soon became the quality of the reports being generated. Initial voice recognition accuracy rates were low compared to today's. The need to redictate words, dates, phrases, or entire paragraphs was frustrating. Many radiologists gave up on voice recognition software and opted instead to send all dictations to traditional transcriptionists. This hybrid workflow is still in widespread use.

However, the accuracy of voice recognition technologies has continued to improve over the last decade. In some instances we are seeing substantial year-over-year gains, and this is most evident when the entire speech-enablement workflow is taken into consideration: dramatic improvements in the quality of speech microphones, background noise-reduction algorithms, faster processing enabled through cloud-based delivery, and natural language processing and related technologies that better comprehend the incredible variety of sounds and medical jargon spoken in widely varying accents and contexts.

Beyond mere typing with your tongue

While the initial focus of voice recognition applications was primarily around speech-to-text enablement, the current wave of adoption is being driven primarily by intelligence built around the speech driven input. Whether driven by natural language processing technologies or more rudimentary logic around templating and dictation macros, the drive is to enable the clinician to be more efficient, and improve the overall quality of the documents being generated.

Voice-enabled structured reporting6 addresses variation in report content and allows for more comprehensible clinical communication. Mammography and cardiology have been at the forefront of structured reporting for a number of years. Use of Breast Imaging Reporting and Data System (BI-RADS) categories in mammography reporting has reduced variability and improved the clarity of communication between radiologists and clinicians. Medical vocabulary and semantics can make or break the natural language processing behind the radiology report, especially in structured reporting. With the maturity of RadLex, a comprehensive lexicon of radiology terms, the process of creating meaningful structured reports can now be put on steroids. RadLex unifies and supplements other lexicons and standards, such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) and Digital Imaging and Communications in Medicine (DICOM).
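To make the idea concrete, here is a minimal Python sketch of mapping dictated free text onto a structured BI-RADS assessment field. The phrase table and function name are hypothetical simplifications; a production system would use a RadLex-aware natural language processing engine rather than keyword matching.

```python
import re

# Simplified phrase -> BI-RADS assessment table (illustrative only).
BIRADS_PHRASES = {
    r"\bnegative\b": "BI-RADS 1: Negative",
    r"\bbenign finding\b": "BI-RADS 2: Benign finding",
    r"\bprobably benign\b": "BI-RADS 3: Probably benign",
    r"\bsuspicious\b": "BI-RADS 4: Suspicious abnormality",
    r"\bhighly suggestive of malignancy\b": "BI-RADS 5: Highly suggestive of malignancy",
}

def birads_assessment(dictated_text: str) -> str:
    """Return the first BI-RADS assessment phrase matched in the dictated
    text, or category 0 when no assessment phrase is recognized."""
    text = dictated_text.lower()
    for pattern, category in BIRADS_PHRASES.items():
        if re.search(pattern, text):
            return category
    return "BI-RADS 0: Incomplete - needs additional imaging evaluation"

print(birads_assessment("Impression: probably benign nodule; short-interval follow-up."))
# prints "BI-RADS 3: Probably benign"
```

Even this toy version shows the payoff of a controlled vocabulary: the dictated impression lands in a discrete, queryable field rather than remaining free text.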

Aiding radiologists are organized initiatives promoting best practices in radiology reporting. The Radiological Society of North America (RSNA) has established a Radiology Reporting Committee (the RSNA Radiology Reporting Initiative), which sponsors a forum of radiologists, imaging informatics experts, and industry executives promoting the improvement and adoption of standardized report templates; over 100 best-practice templates are currently freely available.
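A standardized template reduces reporting to filling in well-defined fields, which is exactly where voice input shines. The sketch below, a hypothetical and heavily simplified chest CT template in the spirit of the RSNA best-practice templates, shows the mechanics; real templates carry many more sections, and the field values would normally arrive from the voice recognition engine as the radiologist dictates into each section.

```python
from string import Template

# Hypothetical, simplified chest CT report template (illustrative only).
CHEST_CT_TEMPLATE = Template(
    "EXAM: CT chest\n"
    "INDICATION: $indication\n"
    "FINDINGS:\n"
    "  Lungs: $lungs\n"
    "  Pleura: $pleura\n"
    "IMPRESSION: $impression"
)

# Each value stands in for a dictated phrase captured into its field.
report = CHEST_CT_TEMPLATE.substitute(
    indication="Shortness of breath.",
    lungs="No focal consolidation.",
    pleura="No pneumothorax or pleural effusion.",
    impression="No acute cardiopulmonary abnormality.",
)
print(report)
```

Because every report built this way shares the same section structure, downstream consumers, from referring clinicians to analytics engines, always know where to look for the impression.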

Back to front and center

There is tremendous promise in further defining and leveraging the synergies between voice recognition and natural language processing (NLP) technologies. Radiology and hospital information management (HIM) divisions in provider organizations have been using NLP-driven back-end applications to automate analysis of radiology reports, reviewing missed billing opportunities and report quality. However, enhanced NLP technologies are now coming to the front of the workflow, aiding clinicians in real time as they create the clinical document. This has the capability to dramatically improve the quality and clinical accuracy of the note or report being generated, while streamlining the clinical workflow. These technologies allow for automated intelligent processes, such as correlation of recorded structured data items with histopathologic findings.
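A minimal sketch of what "front-end" NLP can look like: scanning the dictation as it accumulates and flagging critical findings before the report is signed. The finding list and function name here are hypothetical; a real system would draw on a curated ontology (e.g., RadLex-coded terms) rather than literal strings.

```python
# Hypothetical critical-finding vocabulary (illustrative only).
CRITICAL_FINDINGS = (
    "pneumothorax",
    "pulmonary embolism",
    "free intraperitoneal air",
    "intracranial hemorrhage",
    "aortic dissection",
)

def flag_critical_findings(dictation_so_far: str) -> list:
    """Return critical findings mentioned in the dictation so far, so the
    reporting client can prompt the radiologist to initiate a critical
    results communication workflow before the report is signed."""
    text = dictation_so_far.lower()
    return [finding for finding in CRITICAL_FINDINGS if finding in text]

print(flag_critical_findings("Large right pneumothorax with mediastinal shift."))
# prints "['pneumothorax']"
```

The point is the timing: the same check that a back-end batch process runs hours later can instead fire while the finding is still on the radiologist's mind.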

Enabling better workflow

The end result of every case a radiologist interprets is, quite simply, a report. Voice recognition and related technologies need to save radiologists time where possible and aid the workflow. Many radiologists, bearing scars from earlier iterations of voice recognition technologies that were poorly optimized for their workflow, are highly sensitized to the introduction of any technology that could distract them from their core mission of caring for patients and interpreting imaging studies to the best of their abilities.7 The radiologist's workflow, or readflow, is hence a critical consideration in the development or implementation of any voice recognition and related technologies.

Too many healthcare-related applications are designed without consideration for important parameters, such as user-centered design guidelines, usability, automation, hand-eye coordination, and the radiologist's flow in reading studies and capturing data within the report. Any time spent looking at dropdowns, menu options, and onscreen streaming text transcribed from voice is time away from the core interpretation process.

Also essential is related workflow such as Critical Test Results Management (CTRM), which entails both the clinical needs around reporting significant findings and the regulatory needs around ensuring closed-loop, complete communication of those findings.

The sum of all of its parts

Radiologists have to allow for occasional (or constant!) interruptions to their reading workflow for a variety of tasks, such as quick consults, conversations with ordering physicians, and discussions with technologists and residents.

A radiologist’s workspace consists not just of the voice recognition reporting system, but also the picture archiving and communication system (PACS), the radiology information system (RIS), and other systems that may be used for 3-dimensional (3D) imaging and advanced visualization, as well as perhaps computer-aided detection (CAD).8 Providers purchase each of these systems separately, and then have to deal with the challenge of getting the integration between the applications right.

One of the biggest things that could happen in our imaging industry would be for the key PACS and 3D advanced visualization vendors to work directly with the key voice recognition vendors to streamline the workflow processes and integration challenges through and through, rather than leaving the headaches to busy radiologists and PACS administrators. This should be a defined process that happens with every product version upgrade, on either side.


As voice recognition and related technologies continue to make the strides we are seeing across the industry, it would be prudent to address the needs of the clinical workflow as a unified imaging workspace. Loose interfaces between critical radiology applications should give way to tighter integration and streamlined bidirectional coordination among these traditionally disparate applications. As much value as any one of these systems may provide for its own specific set of needs, the reality of today’s radiology environment calls for a patient-centric workflow around the imaging study being interpreted. The coordination among the many systems that contribute, in one way or another, to the creation of the radiology report will then get the focus it deserves.

We have to allow the tremendous innovations we are seeing today in voice recognition and natural language processing to enhance the workflow of radiologists and, consequently, improve the quality of both the radiology reports and the services provided back to the ordering clinicians.


  1. Apple. http://www.apple.com/iphone/features/siri.html. Accessed September 2, 2012.
  2. Nuance. http://www.nuance.com/for-business/by-solution/customer-service-solutions/solutions-services/mobile-customer-service/nina/index.htm. Accessed September 2, 2012.
  3. Phelps B. HIStalk Interviews Brian Phelps, CEO, Montrue Technologies. HIStalk. http://histalk2.com/2012/03/14/histalk-interviews-brian-phelps-ceo-montrue-technologies/. Accessed September 12, 2012.
  4. Mathie AG, Strickland NH. Interpretation of CT scans with PACS image display in stack mode. Radiology. 1997;203:207-209.
  5. Pezzullo JA, Tung GA, Rogg JM. Voice recognition dictation: radiologist as transcriptionist. J Digit Imaging. 2008;21:384-389.
  6. Weiss DL, Langlotz CP. Structured reporting: Patient care enhancement or productivity nightmare? Radiology. 2008;249:739-747.
  7. Langer SG. Radiology speech recognition: Workflow, integration and productivity issues. Curr Probl Diagn Radiol. 2002:95-104.
  8. Pavlicek W, Muhm JR, Collins JM, et al. Quality of service improvements from coupling the digital chest unit with integrated speech recognition, information, and PACS. J Digit Imaging. 1999;12:191-197.

Voice recognition: Onward and upward.  Appl Radiol. 

September 28, 2012

Copyright © Anderson Publishing 2020