AI Struggles to Accurately Extract Medical Data, Study Finds
A recent study by Columbia University's Mailman School of Public Health has revealed significant limitations in the ability of advanced artificial intelligence to reliably extract information from clinical notes in medical records.
The research focused on using ChatGPT-4, a large language model (LLM), to determine helmet usage among injured bicycle and scooter riders from emergency department admission notes. The study analyzed 54,569 emergency department visits between 2019 and 2022.
Dr. Andrew Rundle, professor of epidemiology at Columbia Mailman School and senior author of the study, said, "While we see potential efficiency gains in using the generative AI LLM for information extraction tasks, issues of reliability and hallucinations currently limit its utility."
"When we used highly detailed prompts that included all of the text strings related to helmets, on some days ChatGPT-4 could extract accurate data from the clinical notes. But the time required to define and test all of the text that had to be included in the prompt and ChatGPT-4's inability to replicate its work, day after day, indicates to us that ChatGPT-4 was not yet up to this task."
The AI model struggled to consistently replicate results across multiple trials and had particular difficulty with negated phrases such as "w/o helmet" or "unhelmeted," often misinterpreting them as indications of helmet use.
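To illustrate why negated phrases are easy to get wrong, here is a minimal Python sketch of a rule-based text-string search that checks for negation before it checks for helmet mentions. It is not the study's method: the patterns, the function name classify_helmet_use, and the three labels are assumptions made up for this example, and real clinical notes would need a far longer, validated list of strings.

```python
import re

# Hypothetical patterns for illustration only; the article does not list
# the text strings the researchers actually used.
NEGATED = re.compile(
    r"\b(?:w/o|without|no|not wearing|denies)\s+(?:a\s+)?helmet\b"
    r"|\bunhelmeted\b"
    r"|\bnon-?helmeted\b",
    re.IGNORECASE,
)
AFFIRMED = re.compile(r"\bhelmet(?:ed)?\b", re.IGNORECASE)


def classify_helmet_use(note: str) -> str:
    """Return 'no helmet', 'helmet', or 'unknown' for one clinical note."""
    # Negation is checked first, so "w/o helmet" is never read as helmet use.
    if NEGATED.search(note):
        return "no helmet"
    if AFFIRMED.search(note):
        return "helmet"
    return "unknown"


if __name__ == "__main__":
    for text in ("Pt w/o helmet, fell from bike",
                 "Unhelmeted scooter rider",
                 "Pt was helmeted at time of crash"):
        print(text, "->", classify_helmet_use(text))
```

Checking the negated patterns first is the key design choice here: a search for the bare word "helmet" alone would count "w/o helmet" as helmet use, which is the kind of misreading described above.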
The researchers found that the model performed well only when given highly detailed prompts that listed every relevant text string. Building and testing those prompts took considerable time, and the model still could not reproduce its own results from one day to the next, leading the researchers to conclude that the technology is not yet suited to this task.
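One simple way to quantify this kind of day-to-day inconsistency is to label the same notes on two occasions and measure how often the labels agree. The Python sketch below computes raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. It is an illustration under assumptions, not the analysis reported in the study: the function names and the toy labels are hypothetical.

```python
from collections import Counter


def percent_agreement(run_a, run_b):
    """Share of notes that received the same label in both runs."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and of equal length")
    return sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)


def cohens_kappa(run_a, run_b):
    """Agreement between two runs, corrected for chance agreement."""
    n = len(run_a)
    observed = percent_agreement(run_a, run_b)
    counts_a, counts_b = Counter(run_a), Counter(run_b)
    labels = counts_a.keys() | counts_b.keys()
    # Chance agreement: probability both runs pick the same label at random,
    # given each run's own label frequencies.
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both runs used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels for the same four notes on two different days (made up).
    day_1 = ["helmet", "no helmet", "unknown", "helmet"]
    day_2 = ["helmet", "helmet", "unknown", "helmet"]
    print(percent_agreement(day_1, day_2))
    print(cohens_kappa(day_1, day_2))
```

A model that replicated its work perfectly would score 1.0 on both measures; values well below that on identical input would signal the reliability problem the researchers describe.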
The study highlights the ongoing challenges in automating the extraction of relevant information from unstructured clinical notes, a process that could greatly benefit medical research and patient care if perfected.
Kathryn Burford, the study's lead author and a postdoctoral fellow at Columbia, emphasized the importance of accessing helmet use information, which is often buried in clinical notes. "Helmet use is a key factor in injury severity," Burford noted, underscoring the need for reliable data extraction methods in injury prevention research.