ChatGPT version 3.5 correctly answered 54.3% of questions on the Orthopaedic In-Training Examination, a score that corresponds with the average for a postgraduate year-1 orthopedic resident, according to published results.
Researchers at Prisma Health-Midlands/University of South Carolina School of Medicine analyzed the performance of ChatGPT versions 3.5 and 4 on the 2020, 2021 and 2022 Orthopaedic In-Training Examinations, presenting each question zero-shot with no additional prompting. They compared the percentage of correct responses from ChatGPT with the national average of orthopedic surgery residents at each postgraduate year (PGY) level. Additionally, ChatGPT was required to cite a journal article, book or website as a verifiable source for each answer.
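The study does not publish its tooling, but a minimal sketch of this kind of zero-shot evaluation loop, assuming the OpenAI chat API, is shown below; the model name, question format and letter-grading logic are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a zero-shot exam-evaluation loop like the one
# described above. The model name, question structure and grading rule
# are assumptions; the study does not describe its implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    {
        "stem": "A 24-year-old presents with ... What is the next best step?",
        "choices": {"A": "...", "B": "...", "C": "...", "D": "..."},
        "answer": "B",
    },
    # ... one entry per exam question
]

correct = 0
for q in questions:
    choices = "\n".join(f"{k}. {v}" for k, v in q["choices"].items())
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            # Zero-shot: the question alone, plus the study's requirement
            # that the model name a verifiable source for its answer.
            {
                "role": "user",
                "content": (
                    f"{q['stem']}\n{choices}\n\n"
                    "Answer with a single letter, then cite a journal "
                    "article, book or website as a verifiable source."
                ),
            }
        ],
    )
    reply = response.choices[0].message.content.strip()
    if reply and reply[0].upper() == q["answer"]:
        correct += 1

print(f"Correct: {correct}/{len(questions)} ({correct / len(questions):.1%})")
```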
ChatGPT 3.5 answered 196 of 360 questions (54.3%) correctly, a score that corresponds with the average for PGY-1 residents, and cited a verifiable source on 47.2% of questions, with a median journal impact factor of 5.4.
ChatGPT 4 answered 265 of 360 questions (73.6%) correctly, a score that corresponds with the average for PGY-5 residents and exceeds the 67% passing threshold for the American Board of Orthopaedic Surgery part I examination. ChatGPT 4 cited a verifiable source on 87.9% of questions, with a median journal impact factor of 5.2.
“Owing to the rapidly changing standard set by [artificial intelligence], it is important for orthopedic surgeons to be involved in the integration of [artificial intelligence] into this field and to guide it to a position where it can be used in providing excellent patient care,” the researchers wrote in the study.
“ChatGPT demonstrated comparable knowledge with that of orthopedic residents, and with further advancement, may possibly be used in orthopedic medical education, patient education and clinical decision-making,” they concluded.