February 26, 2024
1 min read

ChatGPT fairly reflects anaphylaxis guidelines, struggles with fabricated sources


Key takeaways:

  • Information on five domains of anaphylaxis guidelines from ChatGPT-3.5 resulted in an average DISCERN score of 3.44 out of 5.
  • However, 58% of the citations were dead links, fabricated or miscited.

WASHINGTON — ChatGPT provided decent quality information on current anaphylaxis guidelines, but also had errors in more than half of its sources, research here showed.

“So many people are using ChatGPT to help them answer questions about different topics in all scopes, so we were curious what information it would provide in terms of anaphylaxis itself, how to treat it and how prevalent it is,” Natalie Trotto, DO, an allergy and immunology fellow at Nicklaus Children’s Hospital, told Healio.

Information on five domains of anaphylaxis guidelines from ChatGPT-3.5 resulted in an average DISCERN score of 3.44 out of 5. Image: Adobe Stock

In the study — presented at the American Academy of Allergy, Asthma & Immunology (AAAAI) Annual Meeting — Trotto and colleagues assessed the reliability and quality of medical information given by ChatGPT-3.5 on five domains of the current AAAAI/American College of Allergy, Asthma & Immunology (ACAAI) anaphylaxis guidelines, which include:

  • prevalence;
  • symptoms;
  • diagnostic tests;
  • management; and
  • prevention.

The quality of the medical information was measured with the DISCERN questionnaire instrument, while agreement between the guidelines and the artificial intelligence (AI) answers was assessed independently by five reviewers “to weigh the internal consistency of ChatGPT,” the researchers wrote.

They found that the information ChatGPT provided across the five domains produced an average DISCERN score of 3.44 out of 5, “indicating fair-good quality information, with 83% agreement with guideline recommendations.”

The Fleiss kappa score, meanwhile, was 0.6, indicating moderate agreement among the reviewers.
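The Fleiss kappa generalizes inter-rater agreement statistics to more than two raters. As a purely illustrative sketch (the study’s rating data are not public, so the codes below are hypothetical), the statistic can be computed in Python with the statsmodels library:

    # Purely illustrative: the study's rating data are not public, so these
    # hypothetical codes stand in for the five reviewers' judgments.
    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # Rows are rated items (e.g., ChatGPT statements), columns are the five
    # reviewers; 1 = the answer agrees with the guidelines, 0 = it does not.
    ratings = np.array([
        [1, 1, 1, 1, 1],
        [1, 1, 0, 1, 1],
        [0, 0, 0, 1, 0],
        [1, 1, 1, 1, 0],
        [1, 0, 1, 1, 1],
        [0, 0, 0, 0, 0],
    ])

    # aggregate_raters converts item-by-rater codes into the item-by-category
    # count table that fleiss_kappa expects.
    table, _ = aggregate_raters(ratings)
    print(f"Fleiss kappa: {fleiss_kappa(table, method='fleiss'):.2f}")
    # By the common Landis-Koch benchmarks, 0.41-0.60 counts as moderate agreement.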

However, just 42% of the sources provided by ChatGPT were accurate, with the remaining citations made up of:

  • miscited sources (27%);
  • dead webpages (15%); and
  • fabricated links (15%), which the researchers called the most concerning finding.
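Of the three failure modes, only dead links lend themselves to automated detection; miscited and fabricated sources still require manual comparison against the publication record. A minimal sketch of such a link check, using the Python requests library (the URL shown is a placeholder, not one of the study’s citations):

    import requests

    def is_dead_link(url: str, timeout: float = 10.0) -> bool:
        """Return True if the URL fails to resolve to a successful response."""
        try:
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            # Some servers reject HEAD requests, so retry once with GET
            # before declaring the link dead.
            if resp.status_code >= 400:
                resp = requests.get(url, allow_redirects=True, timeout=timeout)
            return resp.status_code >= 400
        except requests.RequestException:
            return True

    citations = ["https://example.org/anaphylaxis-guideline"]  # placeholder URL
    dead = [url for url in citations if is_dead_link(url)]
    print(f"{len(dead)} of {len(citations)} citations failed to resolve")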

“So, while [ChatGPT] might have generally given the right information, it’s questionable as to where it came from,” Trotto explained.

As for future research, Trotto said, “we did this towards the beginning of when ChatGPT and these AI sources came out, so it’ll be interesting to see if this changes over time, if AI becomes more sophisticated in ascertaining the strength of the source.”
