ChatGPT version 3.5 correctly answered 54.3% of questions on the Orthopaedic In-Training Examination, a score that corresponds with the average for a postgraduate year-1 orthopedic resident, according to published results.
Researchers at Prisma Health-Midlands/University of South Carolina School of Medicine analyzed the performance of ChatGPT versions 3.5 and 4 on the 2020, 2021 and 2022 Orthopaedic In-Training Examinations, presenting each question zero-shot with no additional prompting. They compared the percentage of correct responses from ChatGPT with the national average of orthopedic surgery residents at each postgraduate year (PGY) level. Additionally, ChatGPT was required to cite a journal article, book or website as a verifiable source for each answer.
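The study does not publish its tooling, but a minimal sketch of this kind of zero-shot evaluation loop, assuming the OpenAI chat API, is shown below; the model name, question format and letter-grading logic are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a zero-shot exam-evaluation loop like the one
# described above. The model name, question structure and grading rule
# are assumptions; the study does not describe its implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    {
        "stem": "A 24-year-old presents with ... What is the next best step?",
        "choices": {"A": "...", "B": "...", "C": "...", "D": "..."},
        "answer": "B",
    },
    # ... one entry per exam question
]

correct = 0
for q in questions:
    choices = "\n".join(f"{k}. {v}" for k, v in q["choices"].items())
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            # Zero-shot: the question alone, plus the study's requirement
            # that the model name a verifiable source for its answer.
            {
                "role": "user",
                "content": (
                    f"{q['stem']}\n{choices}\n\n"
                    "Answer with a single letter, then cite a journal "
                    "article, book or website as a verifiable source."
                ),
            }
        ],
    )
    reply = response.choices[0].message.content.strip()
    if reply and reply[0].upper() == q["answer"]:
        correct += 1

print(f"Correct: {correct}/{len(questions)} ({correct / len(questions):.1%})")
```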
ChatGPT 3.5 answered 196 of 360 questions (54.3%) correctly, a score that corresponds with the average for PGY-1 residents, and cited a verifiable source on 47.2% of questions, with a median journal impact factor of 5.4.
ChatGPT 4 answered 265 of 360 questions (73.6%) correctly, a score that corresponds with the average for PGY-5 residents and exceeds the 67% passing threshold for the American Board of Orthopaedic Surgery part I examination. ChatGPT 4 cited a verifiable source on 87.9% of questions, with a median journal impact factor of 5.2.
“Owing to the rapidly changing standard set by [artificial intelligence], it is important for orthopedic surgeons to be involved in the integration of [artificial intelligence] into this field and to guide it to a position where it can be used in providing excellent patient care,” the researchers wrote in the study.
“ChatGPT demonstrated comparable knowledge with that of orthopedic residents, and with further advancement, may possibly be used in orthopedic medical education, patient education and clinical decision-making,” they concluded.