As ChatGPT evolves, future models may be beneficial for public, clinical use in alopecia
Key takeaways:
- Dermatologists rated ChatGPT’s responses a 4.41 out of 5 for accuracy.
- For general questions, more responses were considered appropriate for patient interface than for an EHR response draft (100% vs. 96%).
When asked common questions about alopecia areata, ChatGPT demonstrated high response accuracy for patient interface and moderate to high response accuracy for electronic health record response drafts, according to a study.
“Patients with alopecia areata (AA) turn to many sources to learn about their condition,” Ross O’Hagan, MD, of the department of dermatology at Icahn School of Medicine at Mount Sinai, and colleagues wrote. “Previous work has shown that there are large pools of medical misinformation on social media and obtained via traditional search engines.”
With the advent of ChatGPT, the medical community is faced with even more concerns about the spread of misinformation to patients, especially with newer iterations of this platform currently being released. In their study, O’Hagan and colleagues investigated the quality of AA information generated by ChatGPT.
The researchers entered 25 questions about common patient concerns with alopecia areata into ChatGPT 3.5 and ChatGPT 4.0. Multiple dermatologists at an academic center evaluated the responses for appropriateness and accuracy.
Dermatologists rated ChatGPT’s responses a 4.41 out of 5 for accuracy, with ChatGPT 4.0 outperforming ChatGPT 3.5 (mean score, 4.53 vs. 4.29). For general questions, all responses were considered appropriate for patient interface but only 96% were considered appropriate for an EHR response draft.
For diagnosis, management and psychosocial questions, 100%, 91% and 89% of the ChatGPT responses were considered appropriate for patient interface, respectively, whereas 94%, 79% and 89% were considered appropriate for EHR response drafts.
Ultimately, the authors determined that both ChatGPT models may be useful in answering patient questions.
“While not all answers are accurate and appropriate, and therefore not ready to be employed as a reliable patient resource, it seems likely that there will be progressive adoption of the models in public and clinical use as the models continue to improve,” the authors concluded.