June 27, 2024

AI comparable to professional translation for patient discharges in some languages

Key takeaways:

  • Researchers translated discharge instructions through ChatGPT and Google Translate.
  • Patients preferred professional translations of Portuguese and Haitian Creole.
Perspective from Hansa Bhargava, MD

When translating discharge instructions for pediatric conditions, machine translation platforms performed comparably to professional translations for Spanish and Portuguese but not Haitian Creole, a study found.

One of the researchers told Healio that her interest in the topic came from firsthand experience: she was born in the United States but grew up in Mexico and feels “more comfortable speaking Spanish” with patients.

Data derived from Brewster R, et al. Pediatrics. 2024;doi:10.1542/peds.2023-065573.

“I was seeing my family and the patients that I care for encounter all of these barriers or others trying to access care,” Priscilla Gonzalez, MD, rising chief resident at Boston Children’s Hospital, told Healio. “A pinpointed discharge process is a very crucial part of a patient's interaction with a health care system, and so crucial in terms of making sure that when they go home, they understand why they were in the hospital, what they need to do afterward and the follow-up process.”

Another researcher said he was interested in exploring ways technology can be used to expand access to health care and improve outcomes in historically marginalized populations.

“I found the idea of machine translation to be a really compelling model to better serve our families who speak languages other than English,” Ryan C.L. Brewster, MD, a resident in pediatrics at Boston Children's Hospital and Boston Medical Center, told Healio. “Translation, I think, is a really direct use case where the performance of — in this case, ChatGPT and Google Translate — would have a direct impact on clinical outcomes, and so we wanted to really provide a benchmark for where we stand with respect to these technologies.”

The researchers translated 20 standardized discharge instructions for pediatric conditions into Spanish, Brazilian Portuguese and Haitian Creole through Google Translate and ChatGPT-4.0. They compared the translations with ones written by professional translation services and evaluated all versions for adequacy (preserved information), fluency (grammatical correctness), meaning (preserved connotation) and severity (clinical harm). They also assessed patients’ preferences.

Ultimately, the researchers found that ChatGPT and Google Translate showed a low rate of clinically significant errors for Spanish and Portuguese, similar to professional translations. For the translations into Haitian Creole, however, ChatGPT (33.3%, P < .001) and Google Translate (23.3%, P = .024) contained more potentially clinically significant errors than professional translations (8.3%).

Gonzalez said they hypothesized that this “would be true to any non-European language” with less of a digital footprint.

Patients also tended to prefer professional Haitian Creole (48.3%) and Portuguese (43.3%) translations but not Spanish (15%).

“While there are some limitations to these machine translation models, we can at least continue to think about them as a way of providing a including and quick translation, especially in smaller clinical settings, or for clinical settings that do not have access to readily available in person interpreters,” Gonzalez said.

Brewster said the findings call for caution when using these machine translation systems in practice.

“We only looked at one narrow dimension of translation performance and are actually working on a follow-up study to be more comprehensive in understanding the accuracy and quality of these systems,” Brewster said. “At this point, I think it's still too early to make an endorsement that they be used in the clinical setting while we're still trying to understand the guardrail for when it's safe to use them.”