August 24, 2023

Q&A: GPT-4 may improve diagnostic accuracy, confidence in complex cases

Key takeaways:

  • GPT-4 had a diagnostic accuracy of 66.7% and 83.3% for primary and differential diagnoses, respectively.
  • However, GPT-4 required comprehensive data for effective diagnoses.

GPT-4 diagnosed complex cases more accurately than clinicians and a decision-support system, a recent study in JAMA Network Open found.

Yat-Fung Shea, MBBS, from the department of medicine at the University of Hong Kong, and colleagues hypothesized that artificial intelligence could help improve clinician diagnosis “by supplying the most probable diagnosis or suggesting differential diagnoses in complex cases.”

Data derived from: Shea Y, et al. JAMA Netw Open. 2023;doi:10.1001/jamanetworkopen.2023.25000.

To test their theory, the researchers compared the diagnostic accuracy of GPT-4, clinicians and Isabel DDx Companion, a medical diagnostic decision support system, for six patients aged 65 years and older who had a delayed diagnosis lasting more than 1 month in 2022.

Shea and colleagues found that the diagnostic accuracy for primary diagnoses was:

  • 66.7% for GPT-4;
  • 33.3% for clinicians; and
  • 0% for Isabel DDx Companion.

When differential diagnoses were included, the diagnostic accuracy was:

  • 83.3% for GPT-4;
  • 50% for clinicians; and
  • 33.3% for Isabel DDx Companion.
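
Because the study included only six patients, each percentage above corresponds to a small whole number of cases. The sketch below works out those counts. It assumes every percentage is a share of the same six patients; the counts are inferred from the reported figures rather than quoted from the paper, and the names `reported` and `implied_count` are illustrative only.

```python
# Assumption: every reported percentage is a fraction of the six study patients.
# The case counts are inferred from the percentages, not quoted from the paper.
N_PATIENTS = 6

reported = {
    "primary": {
        "GPT-4": 66.7,
        "clinicians": 33.3,
        "Isabel DDx Companion": 0.0,
    },
    "with differentials": {
        "GPT-4": 83.3,
        "clinicians": 50.0,
        "Isabel DDx Companion": 33.3,
    },
}


def implied_count(percent, n=N_PATIENTS):
    """Return the whole number of cases that a percentage of n represents."""
    return round(percent / 100 * n)


# Convert each reported percentage into an implied number of correct cases.
implied = {
    setting: {who: implied_count(pct) for who, pct in row.items()}
    for setting, row in reported.items()
}

# With six patients, one case moves the accuracy by this many percentage points.
points_per_case = round(100 / N_PATIENTS, 1)

for setting, row in implied.items():
    for who, count in row.items():
        print(f"{setting:>18} | {who:<21} {count} of {N_PATIENTS}")
print(f"One case = {points_per_case} percentage points")
```

Read this way, the gap between GPT-4 and clinicians amounts to two cases in each comparison, which is one reason the sample size matters when interpreting the results.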

The researchers concluded that GPT-4 has the potential to increase confidence in making diagnoses, initiate earlier treatment and alert clinicians to potentially missed diagnoses among older patients; however, the tool required comprehensive clinical and demographic information.

Healio spoke with Shea to learn more about GPT-4’s performance, the clinical implications for primary care physicians and more.

Healio: Were you surprised that GPT-4 diagnosed more accurately than clinicians and Isabel DDx, or was that expected?

Shea: We were surprised. Before the current study, we had explored whether GPT-4 performed better in terms of diagnostic accuracy among patients with various types of cognitive disorders in a memory clinic. We found that GPT-4 is not good at analyzing medical notes related to cognitive impairment.

We would like to test histories related to other subspecialties. To our surprise, GPT-4 could pick up subtle clinical features that may have been missed by clinicians, eg, previous imaging findings of lymphadenopathy or metronidazole side effects.

We are not surprised, though, that GPT-4 performs better than Isabel DDx. It’s a program from my days as a medical student (more than 10 years ago) that depends on your entry of certain clinical features; it was not able to analyze the medical histories or investigations by itself.

Healio: Do you think the results would have been different with a larger sample size?

Shea: We still need to test the diagnostic accuracy of GPT-4 in various subspecialties with larger sample sizes in order to fully understand its power.

Healio: What are the clinical implications for PCPs?

Shea: GPT-4 may potentially be used to assist PCPs, ie, as a consultant for opinions in various subspecialties, especially when certain subspecialties are not readily available in resource-limited areas.

Healio: Where does research go from here?

Shea: Because it is difficult to get a “bank” of diagnostically challenging patients to test GPT-4, we are trying to use published case reports of various subspecialties. We are particularly interested in the following areas: infectious disease, rheumatology and adverse drug reactions. We also would like to confirm the lower accuracy of GPT-4 in analyzing medical histories related to cognitive impairment by using published case reports.

At the moment, prospective studies or randomized controlled trials for GPT-4 may have some ethical concerns because suggestions by GPT-4 may be inappropriate or invasive. But at some point in the future, we may be able to do these to analyze whether GPT-4 may shorten the length of stay or time to correct diagnosis.

Healio: Anything else to add?

Shea: There are limitations to GPT-4. There is an inability to detect multifocal infection. The accuracy of GPT-4 still depends on your entry, which should be comprehensive according to the current study.
