A medical transcription tool powered by OpenAI’s Whisper model is widely used in hospitals, but recent findings raise concerns about its tendency to fabricate content, particularly during pauses and silences in recordings.
Nabla’s Whisper-based tool has reportedly transcribed over 7 million medical conversations and is used by more than 30,000 clinicians across 40 health systems. Despite that reach, researchers have found that Whisper “hallucinates,” inserting fabricated text into its transcriptions.
The hallucinations, identified by researchers from Cornell University and the University of Washington, take the form of entire phrases or sentences invented during pauses in recorded conversations. In a study analyzing audio samples from TalkBank’s AphasiaBank, a repository of speech from people with aphasia in which silences are common, the researchers found that Whisper hallucinated in about 1% of transcriptions. Some of the fabricated text was random or nonsensical, such as “Thank you for watching!”, while other passages carried violent implications.
Nabla, the company behind the AI-powered transcription tool, acknowledges Whisper’s limitations and says it is actively addressing the issue, as noted by ABC News. OpenAI, which developed Whisper, responded that it continually works to reduce hallucinations and does not recommend Whisper for high-risk decision-making without human oversight.
The Verge contributed to this report.