May 22, 2023
2 min read

ChatGPT fails ACG tests, ‘should be validated’ before use in gastroenterology

Fact checked by Heather Biele

Key takeaways:

  • ChatGPT-3 and ChatGPT-4 did not pass the 2021 and 2022 ACG self-assessment tests, scoring 65.1% and 62.4%, respectively.
  • Researchers do not recommend its use for medical education in gastroenterology.

The artificial intelligence tool ChatGPT failed the ACG multiple-choice self-assessment test, prompting researchers at the Feinstein Institutes for Medical Research to warn against its use for medical education in gastroenterology.

“ChatGPT does not have an intrinsic understanding of an issue,” Arvind Trindade, MD, regional director of endoscopy at Northwell Health System and associate professor of medicine at the Feinstein Institutes, told Healio. “Its basic function is to predict the next word in a string of text based on available information to produce an expected response, regardless of whether such a response is factually correct or not. Therefore, it can be dangerous regarding medical advice or education, as we have shown in our study.”

With the emerging use and popularity of AI tools like ChatGPT (OpenAI), a natural language processing model that generates human-like responses based on user prompts, Trindade and colleagues sought to assess its performance on a gastroenterology assessment test.

They used ChatGPT versions 3 and 4 to answer the 2021 and 2022 ACG self-assessment tests, each of which includes 300 questions with real-time feedback. Researchers copied and pasted the exact questions and multiple-choice answers into the two versions of ChatGPT, which generated responses. A score of 70% is needed to pass the assessment.

According to results published in the American Journal of Gastroenterology, ChatGPT answered a total of 455 questions. ChatGPT-3 answered 296 of those questions correctly, with an overall score of 65.1% across the two exams, while ChatGPT-4 answered 284 questions correctly for a score of 62.4%.
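
For reference, the reported percentages line up with the raw counts, with each version scored on the same 455 questions:

\[
\frac{296}{455} \approx 65.1\% \qquad\text{and}\qquad \frac{284}{455} \approx 62.4\%,
\]

both short of the 70% needed to pass (at least 319 of 455 correct).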

Because the tool generates its responses from available information, Trindade noted that it would need access to updated resources, such as medical journals, databases and gastroenterology guidelines, to be used reliably.

“With directed medical training in gastroenterology, it may be a future tool for education or patient use in gastroenterology, but not currently as it is now,” he said. “Before it can be used in gastroenterology, it should be validated.”

He continued, “ChatGPT was never intended for medical use but more as a general-purpose tool. If the large language model can be trained using medical resources, it may become a powerful education tool in the future for trainees and patients.”