AI has ‘promising utility’ to give high-quality, empathetic answers to patient questions
Key takeaways:
- Multiple chatbots had higher quality and empathy scores for responses to cancer questions compared with clinicians.
- Patients and clinicians can contribute to the fine-tuning of responses for future clinical use.
AI could answer some patient questions about cancer in place of clinicians, reducing clinician burden and improving access to care, according to study results published in JAMA Oncology.
In a trial of 200 patient questions, multiple chatbots produced responses with higher empathy and quality scores than physician responses, and one chatbot scored higher on readability as well.
“We believe that chatbots pose the promising potential to draft template responses for clinician review to patient questions,” David Chen, BMSc, medical student at University of Toronto, told Healio. “However, we remain cautious about the need for clinician oversight to ensure medical accuracy and alignment with humanistic elements of physician-patient relationships, such as building trust and rapport.”
Background and methodology
Digital solutions have shown potential to decrease costs and clinician burnout while improving workflow, patient outcomes and quality of life, according to background information provided by researchers.
AI, for example, has demonstrated the capacity to “educate patients with cancer about various aspects of clinical diagnostics and treatment approaches,” they wrote.
Chatbots produced more empathetic responses than physicians to general medicine questions posted in online forums, according to results published in 2023 by Ayers and colleagues in JAMA Internal Medicine.
“Given the widespread popularity of AI chatbots and emergent applications of these chatbots in clinical environments, we felt that it was important to evaluate the competency of chatbots in a more realistic clinical scenario where patients present with a question about their condition,” Chen said.
Researchers collected 200 random cancer-related questions posted on Reddit r/AskDocs between Jan. 1, 2018, and May 31, 2023, and had three chatbots generate answers.
Study investigators specified that responses should be limited to 125 words, the mean length of the physician answers.
They used multiple indices to measure readability, and attending physicians rated overall quality, empathy and readability.
Scores ranged from 1 to 5, with 1 representing a “very poor” response and 5 signifying a “very good” one.
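For readers curious how the readability portion of such an evaluation might be automated, below is a minimal Python sketch using the open-source textstat package. The study does not name its tooling or the specific indices it applied, so the indices and library here are illustrative assumptions, not the authors’ method.

```python
# Minimal sketch: scoring a response with several common readability
# indices, assuming the textstat package (pip install textstat).
import textstat


def readability_profile(response: str) -> dict:
    """Return a few standard readability metrics for one text response."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(response),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(response),
        "gunning_fog": textstat.gunning_fog(response),
        "word_count": textstat.lexicon_count(response),
    }


# Hypothetical example texts, standing in for a chatbot draft and a
# physician reply; the study's actual data came from Reddit r/AskDocs.
chatbot_reply = (
    "Most thyroid nodules are benign, but an ultrasound is a safe, "
    "painless way to check. Your doctor can walk you through the results."
)
physician_reply = "Likely benign. An ultrasound would confirm."

for label, text in [("chatbot", chatbot_reply), ("physician", physician_reply)]:
    print(label, readability_profile(text))
```

In a setup like this, each response would be scored automatically for readability, while the subjective dimensions (quality and empathy) would still require the physician raters described above.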
Results and next steps
All three chatbots had higher mean scores than physicians for response quality (based on medical accuracy, completeness and focus), as well as for overall quality and empathy.
The top performing chatbot (chatbot 3) had higher mean scores than clinicians in all three variables measured: quality (chatbot 3 = 3.56; 95% CI, 3.48-3.63; clinicians = 3.00; 95% CI, 2.91-3.09), empathy (chatbot 3 = 3.62; 95% CI, 3.53-3.70; clinicians = 2.43; 95% CI, 2.32-2.53) and readability (chatbot 3 = 3.79; 95% CI, 3.72-3.87; clinicians = 3.07; 95% CI, 3.00-3.15).
Clinicians had better readability than the other two chatbots.
Chatbot 3 had a higher mean word count (198.16) than the other two chatbots (135.87 and 140.15) and clinicians.
“We were initially surprised at the positive performance of the tested chatbots given their lack of purpose-built design for medical question-answer scenarios, suggesting that these general-purpose, foundational AI chatbots harbor promising utility in specialized medical scenarios,” Chen said.
Chatbot success does not mean chatbots should be implemented without supervision, he added.
“Doctors remain in charge of chatbot oversight to ensure that chatbot responses are medically accurate,” Chen said. “The possible future implementation of chatbots to draft template responses to patient questions about cancer can help reduce physician burnout, so that physicians spend more quality, face-to-face time with patients rather than administrative clinical work such as drafting responses to patients.”
He added that both clinicians and patients could play a role in developing a better chatbot for this setting.
“The role of the physician in designing clinical chatbots can involve, one, designing heuristic rules that can be implemented to safeguard large language model development, and two, labeling data or generating example data necessary to fine-tune language models for specialized tasks,” Chen said. “The role of the patient in model development is to provide user feedback to better align conversational chatbots with competencies such as building empathy and rapport.”
Study limitations included the use of online forum exchanges to model clinician-patient communication and the lack of patient ratings of empathy.
Chen believes future research should investigate chatbot utility in realistic clinical scenarios and their performance in conversational settings, as well as conduct randomized trials to better compare their value with clinician responses.
“We highlight the potential pitfalls of modern chatbots that demand clinical oversight by a supervisory physician,” Chen said. “For instance, common pitfalls that require further research may include addressing medical hallucinations, lack of transparency in research design of chatbot studies necessary for reproducibility, poor empathy, poor engagement by necessary stakeholders including clinicians and patients, training of clinicians to leverage chatbot technologies in their clinical practice, and risks [for] systematic bias of chatbots due to lack of diverse training representation and human alignment.
“We are excited about the advent of clinical chatbot technologies and remain hopeful that future research will address these limitations.”
References:
- Ayers JW, et al. JAMA Intern Med. 2023;doi:10.1001/jamainternmed.2023.1838.
- Chen D, et al. JAMA Oncol. 2024;doi:10.1001/jamaoncol.2024.0836.
For more information:
David Chen, BMSc, can be reached at davidc.chen@mail.utoronto.ca.