ChatGPT shows promise for answering questions related to retinal diseases
ChatGPT 4.0 generated largely accurate responses to questions covering diagnostic criteria, treatment guidelines and management strategies for retinal diseases.
While it cannot replace a retina specialist, ChatGPT 4.0 is “very promising and interesting as an adjunct to education as well as practice monitoring and for continuing medical education,” Parnian Arjmand, MD, MSc, FRCSC, of Mississauga Retina Institute in Ontario, said at the American Society of Retina Specialists annual meeting.
The cross-sectional survey study evaluated the ability of ChatGPT 4.0 to provide responses to 130 questions covering medical retina and surgical retina topics in accordance with the American Academy of Ophthalmology’s Preferred Practice Pattern (PPP) guidelines. Three vitreoretinal specialists evaluated the responses based on their accuracy, relevance and adherence to the PPP guidelines using a Likert scale score of 1 to 5, while response readability was evaluated using Flesch readability ease and Flesch-Kincaid grade level scores.
“Before typing the prompt for the question, we would say, ‘I want you to act as an experienced ophthalmologist. Answer the following questions using the most up-to-date medical guidelines for retina specialists,’” Arjmand said.
Overall, ChatGPT 4.0’s responses demonstrated a “very high” level of alignment with PPP guidelines, with a mean score of 4.91 and a median score of 5. The AI responses to questions focusing specifically on nonproliferative and proliferative diabetic retinopathy, posterior vitreous detachment and retinal vein occlusion were determined to be highly relevant and accurate, Arjmand said.
The AI-generated answers had a mean length of 252 words, and the researchers determined that a high level of education would be required to understand them. Ten percent of responses lacked important information relevant to clinical decision-making, and 3.8% of responses contained information determined to be incorrect or outdated.
A post hoc analysis revealed that the AI platform performed "significantly worse" in the domain of retinal tear than in the other conditions studied.
“Overall, it had a very average high score for general practice preferred guidelines by the American Academy of Ophthalmology and the retinal domains,” Arjmand said. “It did have a lower average score in surgical retinal conditions compared to medical retinal conditions, perhaps suggesting that surgical retinal disease often requires more elaboration and might differ from case to case.”