AI may help create scientific abstracts
Large language models may be helpful in generating scientific abstracts, according to research presented at the American Society of Retina Specialists annual meeting.
Christopher J. Brady, MD, MHS, of the University of Vermont, said there is a lot of hype around the use of large language models (LLMs), especially in medicine.
“At one extreme, there is a hope that one day LLMs may be able to process an entire medical chart and generate an unconsidered hidden diagnosis or a novel therapeutic strategy that the team had not considered yet,” he said. “That being said, the capacity of these systems to hallucinate or generate completely preposterous information is well documented.”
Brady and colleagues conducted a study to determine whether an LLM could generate an accurate abstract when given the full text of a scientific research article. Brady said this allowed the accuracy of LLM-generated abstracts to be compared with that of author-written abstracts.
A published paper on the OAKS and DERBY trials, with its abstract removed, was input into Google Bard, a free version of ChatGPT 3.5 and the paid version of ChatGPT 4. Because Bard and ChatGPT 3.5 generated inaccurate abstracts, ChatGPT 4 was chosen for the rest of the study.
The researchers then used ChatGPT 4 to generate abstracts for five articles from each of four ophthalmology journals, for a total of 20 abstracts.
Brady said each abstract was generated in less than 1 minute.
The researchers initially thought they had identified two instances of hallucination by the LLM. However, they later discovered that the discrepancies were related to errors in the author-written abstracts.
The LLM-generated abstracts contained some trivial typographical errors, but these were consistent with the full text rather than true hallucinations, Brady said. Compared with the author-written abstracts, the LLM-generated abstracts had a lower character count and provided fewer numerical results and comparative statistics, but they were sometimes better at correctly describing the study design.
The biggest limitation is that AI systems keep changing. Since the study was conducted in January, Bard has been renamed Gemini, and while ChatGPT 4 still exists, ChatGPT 4o is now the most advanced model, “so these results can change with each additional enhancement,” Brady said.
“We were impressed that ChatGPT 4 was able to process these manuscripts and generate a uniform scientific abstract that looked normal and did not have mistakes,” he said. “We feel that these could immediately prove to be useful tools for authors, peer reviewers and editors to make articles more consistent and more correct.”