Speech recognition and the creation of radiology reports

By Stephen J. Herman, MD, FRCPC

Dr. Herman is an Associate Professor, Toronto General Hospital, University Health Network, Toronto, Ontario, Canada. Dr. Herman is also the Chief Medical Officer of Merge eFilm, Milwaukee, WI.

Portions of the material in this article were presented by Dr. Herman in "Introduction to Speech Recognition" at the 2003 SCAR meeting, Boston, MA.

In recent years, speech recognition (SR) systems have advanced to the point that they are now a practical method of creating radiology reports. More and more departments are beginning to use this technology, and it is increasingly prominent at trade shows, both as a seminar topic and as a marketed product.

This article will compare two report creation methods: the traditional method and the SR method. It will also discuss the benefits and problems associated with the use of SR and provide recommendations for using an SR system. The use of SR addressed in this article is the conversion of speech into text, as opposed to the use of SR to control computer applications.

Creation of reports

Traditional method

Traditionally, the process of report creation begins when the radiologist dictates the case, creating an audio report (Figure 1). This is passed on to the transcriptionist, who types the dictated material, creating a preliminary report. Next, the preliminary report is reviewed by the radiologist, who may or may not edit it, and who then accepts the report, which produces the final report that is available for clinicians to review. It is well known that there are often delays, sometimes of several days, between the time the radiologist dictates the case and the time it is transcribed. To compensate for this delay, some departments make the audio report available to the clinician. Often, there is also a significant delay between the time that the transcriptionist types the report and the time that the radiologist reviews and accepts it. To compensate for this, many departments allow clinicians to review the preliminary report.

The standard method of creating reports is therefore associated with several problems. In addition to the delays in getting the final report, the radiologist may not remember the details of the case when reviewing the preliminary report. He or she may then need to re-review the images, which may mean pulling them from the film library or recalling them on the picture archiving and communication system (PACS). Also, many radiologists review the report only to check for grammatical and spelling errors. In addition, in some instances, the radiologist actually reports the case twice: the first time creating a quick "verbal" report and later creating what will become the final report (Figure 1).

Finally, the traditional method of report creation requires that the radiology department use a typing pool composed of either employees of the department or an outsourced service.

Report creation with speech recognition

With SR, the radiologist can dictate the case, edit it (if necessary), and accept it all at once, which makes the final report available almost immediately (Figure 2). Therefore, the clinician can review this report sooner than would have been possible in the traditional reporting method. This can potentially lead to better patient care, since the patient can also move on to the next step in their workup or begin treatment sooner. Also, it can lead to a more accurate report because it is completed while the radiologist is reviewing the images, before the details of the case might be forgotten. In addition, there is less chance of the report getting lost in the system. Finally, this process can lead to a more satisfied referring physician, since the report is available more quickly, as well as a more satisfied radiologist, since there is a sense of completion in knowing that the report will not have to be reviewed again.

Benefits of speech recognition

Some of the many benefits of SR have been mentioned above. The two primary benefits are faster completion of reports and a reduction in the number of transcriptionists required by the department. These benefits will be described in more detail below. Another benefit is the virtual elimination of calls for preliminary reports. 1 In addition, radiology staff should spend less time looking for film, since the referring physician will have the report more quickly. Finally, SR allows the radiologist more control over the dictation process; for example, the radiologist no longer relies on a transcriptionist and does not need to create a separate "verbal" report before completing a "final" report later.

Rapid report creation

A number of studies have been performed to assess how much more quickly reports are completed using SR (Table 1). One study found that the mean report turnaround time (ie, from examination completion to report transcription) improved from 87.8 to 43.6 hours. 1 The researchers noted that report availability at 24 hours increased from 10.5% to 62.5%. In another study, report creation time fell from approximately 2 to 4 hours with transcriptionists to <5 minutes with SR. 2 The researchers noted that with time, as the users adapted to the system and vice versa, this time actually fell to <3.5 minutes. They also stated that rapid reporting became more important with the use of PACS: images were now available to clinicians in 5 to 15 minutes, and since the department wanted a report to accompany each study, rapid report availability was mandatory. A third study reported a 10-fold improvement in report turnaround times. 3

Another study investigated the use of these systems in a teaching hospital, where cases are first dictated by residents and then passed to the staff radiologist for final acceptance. 4 The researchers noted that when attending physicians dictated studies themselves, 65% of reports were completed in <15 minutes and 90% in <1 hour. With the traditional use of transcriptionists, the mean report turnaround time was 30 hours. However, when residents reported studies, the final report was available in <1 hour only 15% of the time; 90% of reports were available within 5 hours. Clearly, the delay between the time that the resident completed the report and the time that the attending radiologist signed off on it was significant.

It should be noted that the rapid report completion with SR has also been documented outside of radiology. For example, a study of reporting in the emergency department noted a report creation time improvement from a mean of 39.6 minutes with transcriptionists to 3.65 minutes with SR. 5

Fewer transcriptionists

Speech recognition systems appear to reduce the number of transcriptionists a department needs. Two studies have documented that departments do save money after implementing these systems. The first study describes savings of $100,000. 6 In the second study, the department saved $1.7 million in the first 5 years. 7

Problems with speech recognition

There are two main problems associated with the use of SR systems: accuracy is lower than that of transcriptionists, and radiologists spend more time creating reports. These two issues will be discussed below.

Accuracy

The accuracy of SR systems has been addressed in many studies. These reveal, in general, accuracies in the 90% to 100% range. Analyzing all dictated words, three studies found accuracies of 93% to 97%, 1 95% to 100%, 4 and approximately 90%, 8 respectively.

In a study involving multiple speakers of different nationalities, it was noted that the accuracy rate of native English speakers (90.3%) was slightly higher than that of non-native English speakers (88.4%). 8 There were no gender differences in accuracy rates, nor were there any differences among the various imaging modalities.

The authors further analyzed the errors that the system made according to different criteria. They separated errors that were clinically significant from those that were not. For example, if the speaker dictated "femur" but the system typed "finger," this was clinically significant. If the speaker dictated "a" but the system typed "an," this was not. Also, they specifically looked for clinically significant errors that tended to be difficult to detect. For example, if the speaker dictated "There was no evidence of a pneumothorax," but the system typed "There was evidence of a pneumothorax," this was called a significant subtle error. They noted an overall error rate of 10.3% (ie, accuracy of approximately 90%). The clinically significant error rate was 7.8%, and the significant subtle error rate was 1.2%.
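To make the idea of word-level accuracy concrete, the following is a minimal sketch of one way to score recognized text against what was dictated. The article does not describe how the cited studies scored their transcripts, so the function name word_accuracy and the use of Python's difflib alignment are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def word_accuracy(dictated: str, recognized: str) -> float:
    """Fraction of dictated words that are aligned to identical words
    in the recognized text (hypothetical scoring rule, not the studies')."""
    spoken = dictated.lower().split()
    typed = recognized.lower().split()
    if not spoken:
        raise ValueError("dictated text is empty")
    # Align the two word sequences; autojunk is disabled so that common
    # words are never discarded from the comparison.
    matcher = SequenceMatcher(a=spoken, b=typed, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(spoken)


# Example sentences modelled on the error types described in the text.
examples = [
    ("There was no evidence of a pneumothorax",
     "There was evidence of a pneumothorax"),
    ("fracture of the femur", "fracture of the finger"),
]

for dictated, recognized in examples:
    score = word_accuracy(dictated, recognized)
    print(f"{score:.1%}  {dictated!r} -> {recognized!r}")
```

In this sketch, the dropped "no" costs only one word out of seven, which illustrates why a significant subtle error can hide inside a report whose word-level accuracy still looks high.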

Again, similar findings have been noted outside of radiology. For example, in an emergency department study, the accuracy rate of SR reporting was found to be 98.5%, compared with 99.7% using transcriptionists. 5 The authors noted that they were making 2.5 corrections per chart with SR reporting, as opposed to 1.2 corrections with the use of transcriptionists.

Radiologist time

A major negative aspect of SR systems is the fact that some work traditionally done by transcriptionists has been shifted to the radiologist. There are a number of reasons why this is considered a negative aspect. Radiologists don't want to be editors. Since SR systems are generally less accurate than transcriptionists, more editing needs to be done using SR systems. Also, radiologists need to read their reports more meticulously than usual, especially considering the subtle mistakes that occur (as noted above). Traditionally, many radiologists read only certain sections carefully (eg, the impression), but when using SR, they must read the entire report. Finally, when dictating, radiologists need to be careful about so-called dysfluencies (for example, stammering or slurring, or speaking such sounds as "um" or "uh"). 1 Some of the newer SR systems can be trained to ignore the latter sounds, however.

Some authors noted that the above points are actually more significant than they first appear. 9 For example, they imply that since the radiologist will now spend more time editing reports than previously, they will spend relatively less time looking at images. In addition, there is a subtle change in focus in the radiologist's mind from image interpretation to thinking about how the SR system performed. 9

Two studies have investigated the specific increase in the time the radiologist spends creating reports. In one study, the average report creation time was 74 seconds using transcriptionists but increased to 162 seconds with SR. 6 It was remarked that this increase in time led to a loss of staff morale. However, it appeared that the SR system used in this study had many problems and, therefore, these results are likely not predictive of what can be expected from current systems. For example, the authors described many system crashes that required the user to reboot the computer, long delays while the system saved files, and many words that required individual training. These delays were factored into the calculated SR time.

In another study, dictation time increased from 180 seconds to 203 seconds using SR (Table 2). 10 Editing time increased from 146 seconds to 176 seconds. Therefore, total report creation time increased from 326 seconds using transcriptionists to 379 seconds using SR (Figure 2).
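The arithmetic behind these totals can be checked with a short sketch. The figures are the mean times quoted above; the names TIMES, total_seconds, and percent_increase are hypothetical and chosen only for this illustration.

```python
# Mean time per report in seconds, as quoted in the text for the second study.
TIMES = {
    "transcription": {"dictation": 180, "editing": 146},
    "speech_recognition": {"dictation": 203, "editing": 176},
}


def total_seconds(method: str) -> int:
    """Total radiologist time per report: dictation plus editing."""
    return sum(TIMES[method].values())


def percent_increase(before: float, after: float) -> float:
    """Relative change from before to after, expressed as a percentage."""
    return 100.0 * (after - before) / before


baseline = total_seconds("transcription")
with_sr = total_seconds("speech_recognition")

print(f"Transcription total: {baseline} s")
print(f"SR total:            {with_sr} s")
print(f"Extra time per report: {with_sr - baseline} s "
      f"({percent_increase(baseline, with_sr):.1f}% increase)")
```

Under these figures, the radiologist spends 53 seconds more per report with SR, an increase of roughly 16%, which is far smaller than the increase reported in the first study.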

Presumably to compensate for this increased time, some authors have noted that radiologists tended to shorten their reports. 4 In one study, the mean report length was noted to decrease from 95 to 60 words when using SR. 1

It is important to keep in mind the perspective of others with respect to this point. Clearly, radiologists are concerned about this extra work. However, radiology administrators may not see the problem in the same way. For example, they may believe that this shifting of the editing function to earlier in the process is "an efficient reallocation of total work rather than additional work." 11

Recommendations

Based on the experience of the many departments using SR, these systems are definitely usable now and worthy of consideration by almost any department. Although this article will not address costs of the systems, anyone considering a purchase should perform a cost/benefit analysis for a specific department.

In the planning stages, it is important to include all of the pertinent stakeholders, including radiology business managers, information technology personnel, and representative radiologists. It is important that the department has a strong chief and that he or she is a clear believer in the benefits of using SR. The technical aspects of the system must be optimized. For example, the network bandwidth must be adequate and the PCs must meet the requirements of the system vendor. There must be integration with the department radiology information system and ideally with the PACS as well.

The SR implementation will be much more likely to be successful if the users (ie, the radiologists) are motivated to make it a success. In this regard, it will help if they perceive that they are being rewarded in some way for using it. For example, radiologists may feel rewarded if their business saves money (depending, of course, on their financial arrangement with the department). They may feel a sense of satisfaction if they are clearly shown that they are providing better service to their referring physicians. The radiologists need to be reminded of the new sense of completion they now have after dictating and accepting the report in one step, knowing they will not need to see the report again.

It would be helpful if implementation could begin in an area of the department where specific individuals with positive attitudes work. This would increase the chances of a successful rollout. Also, these radiologists would become "champions" for the rest of the department. Each user needs to be provided with adequate training in the use of the SR system. Even more important, adequate support must be available at a moment's notice. In addition, each radiologist should enroll in the system. Enrollment refers to the user training the system to his or her specific voice, prior to using the system in clinical practice. This will improve recognition accuracy, sometimes very significantly, and will improve overall user satisfaction. Users should be trained to speak approximately 10% more slowly than they normally do, as this can improve the recognition rate. Also, each user should be shown how to use macros and templates efficiently to improve report creation time.

When starting to use the system, it is important that radiologists not be in a stressful situation so they can take their time getting familiar with it. Therefore, it is highly recommended that they be relieved of their normal clinical duties during their first day or two of use. For example, if they normally would be expected to dictate 20 CT reports in the morning, they should give 15 of these to colleagues and have only 5 to report themselves. This would allow them to take their time and not feel the pressure to complete their studies.

Finally, when the system is rolled out, it must be made clear to the radiologists that there is no going back to transcriptionists. Otherwise, they will not dedicate themselves to making the system work and will never become familiar enough with it to gain the required sense of comfort.

Conclusion

Current speech recognition systems are now viable options for the majority of users, in terms of both accuracy and user-friendliness. The benefits of improved report turnaround time and cost savings must be weighed against the increased time radiologists must spend editing reports.

Speech recognition and the creation of radiology reports.  Appl Radiol. 

May 06, 2004

Copyright © Anderson Publishing 2022