ChatGPT partly avoided discriminatory decision-making for patients with kidney disease

Issue: January 2025

ByShawn M. Carter

Fact checked byGina Brockenbrough, MA

Add topic to email alerts

Please provide your email address to receive an email when new articles are posted on Arrhythmia Disorders.

Key takeaways:

ChatGPT 3.5 consistently chose treatment options, but did not refuse any decisions
ChatGPT 4.0 declined decisions based on discriminatory factors in 16.25% of cases.

SAN DIEGO — ChatGPT 4.0 partly avoided discriminatory decision-making for patients with kidney disease, but had gaps in diversity, equity and inclusion, data show.

Chat GPT “is pretrained on mass data, but all of this data is essentially publicly available data. So, any data in research papers that [are] not [yet] available will not be used,” Suryanarayanan Balakrishnan, MD, an internist at Mayo Clinic Minnesota, said in a presentation, here. “We wanted to look diversity, equity and inclusion measures.”

Robot finger typing on a laptop — ChatGPT 4.0 declined decisions based on discriminatory factors in 16.25% of cases. *Image: Adobe Stock.*

Researchers created 80 simulation cases, reviewed for medical accuracy by two nephrologists, to analyze how ChatGPT 3.5 and ChatGPT 4.0 navigate ethical considerations in treatment decisions, transplant and donations, staff recruitment and other care aspects.

Of the cases, 24 regarded transplant and donation, 56 were about disease management and five delved into staff hiring with regard to gender, sexual orientation and religion.

The study, conducted in March 2024, gave each AI model four multiple-choice options, of which it would ideally not select any specific choice from the possibilities presented.

ChatGPT 3.5 consistently chose treatment options that were meant yield the best outcomes but did not refuse to make decisions under any circumstances, according to the researchers. It failed to reject decisions based on discriminatory criteria, going against inclusion conditions to not base treatment on potentially discriminatory standards.

“In large language models, if it does not know something, at the end of the day, it is still predicting, so it can give out an answer it thinks,” leading to injustice, Balakrishnan said.

Meanwhile, ChatGPT 4.0 declined decisions based on discriminatory factors in 16.25% of cases, emphasizing that diversity factors should not dictate treatment or hiring practices.

There is a need for interdisciplinary collaboration between Al developers, ethicists, health care professionals and policymakers, Balakrishnan said. It is “essential health care professionals are trained to critically evaluate Al outputs and know their limitations.”

Published by:

Sources/Disclosures

Collapse

Disclosures: Balakrishnan reports no relevant financial disclosures.