July 11, 2024
2 min read

AI model highly accurate in determining malignancy of thyroid nodule images

Fact checked by Richard Smith

Key takeaways:

  • An AI model achieved accuracy of 99.71% when assessing malignancy in thyroid fine-needle aspiration image patches.
  • Cytopathologists improved their diagnostic accuracy when referring to the AI model.

An AI model achieved higher accuracy than three cytopathologists for determining malignancy in a large dataset of image patches of thyroid fine-needle aspiration whole-slide images, according to a study published in Thyroid.

In a retrospective diagnostic accuracy study, researchers pretested six AI models to determine which was best at diagnosing a dataset of thyroid nodule images. The top-performing model in pretesting, Inception ResNet v2, went on to achieve 99.72% accuracy on the training dataset, 97.7% accuracy on the validation dataset and 94.87% accuracy on the test dataset.

AI outperforms cytopathologists for diagnosing malignant thyroid nodules.
Data were derived from Lee Y, et al. Thyroid. 2024;doi:10.1089/thy.2023.0384.

“We have successfully developed an AI model that distinguishes malignant papillary thyroid carcinoma from benign lesions using image patches from fine-needle aspiration cytology slides of thyroid nodules,” Hyun Joo Choi, MD, PhD, and Yosep Chong, MD, PhD, both from The Catholic University of Korea School of Medicine, and colleagues wrote. “This study is significant in that it used datasets collected from multiple nationwide institutions, utilized images including multiple z-stacks, showed a high accuracy rate of 99.7%, and helped improve the diagnostic accuracy of expert pathologists.”

Researchers collected whole-slide images of 306 thyroid fine-needle aspirations from 86 institutions in South Korea. AI models were developed using 7,994 malignant image patches and 7,891 benign image patches from the whole-slide images. Six models were pretested using random patches from 78 malignant cases and 88 benign cases. The top-performing model in pretesting was further assessed using 3,797 image patches for training, 565 for validation and 507 for testing. An additional 1,031 image patches were randomly selected to test the diagnostic accuracy of the AI model vs. three experienced cytopathologists. The cytopathologists were tested a second time after being able to refer to the AI model results.
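As a rough illustration of this kind of workflow, and not the authors' actual code, the sketch below fine-tunes an ImageNet-pretrained Inception ResNet v2 for binary benign-vs-malignant patch classification with Keras. The directory layout, patch size and training settings are assumptions made for the example, not details reported in the study.

```python
# Hedged sketch: fine-tuning Inception ResNet v2 on labeled image patches.
# Folder names, patch size and hyperparameters are illustrative assumptions.
import tensorflow as tf

IMG_SIZE = (299, 299)   # assumed patch size; Inception ResNet v2's default input
BATCH = 32

def load_split(path):
    """Load a folder of patches organized as path/benign and path/malignant."""
    return tf.keras.utils.image_dataset_from_directory(
        path, image_size=IMG_SIZE, batch_size=BATCH, label_mode="binary")

train_ds = load_split("patches/train")       # e.g., 3,797 patches in the study
val_ds = load_split("patches/validation")    # 565 patches
test_ds = load_split("patches/test")         # 507 patches

# ImageNet-pretrained backbone with a binary classification head.
base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet",
    input_shape=IMG_SIZE + (3,), pooling="avg")
inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = tf.keras.applications.inception_resnet_v2.preprocess_input(inputs)
x = base(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # malignant probability
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)
model.evaluate(test_ds)
```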

During pretesting, the Inception ResNet v2 was the top-performing model with 97.04% accuracy. The model had 99.72% accuracy, 99.87% sensitivity and 99.58% specificity on the training dataset. On the validation dataset, the model achieved 97.7% accuracy, 99.57% sensitivity and 96.37% specificity. On the test dataset, the model had 94.87% accuracy, 100% sensitivity and 90.41% specificity.

In a comparison test with three cytopathologists, Inception ResNet v2 had an accuracy of 99.71%, sensitivity of 99.81% and specificity of 99.61%. It outperformed the cytopathologists, who had an average accuracy of 88.91%, sensitivity of 87.26% and specificity of 90.58%. When the cytopathologists were able to refer to the diagnostic results from the AI model, their accuracy increased to 95.76%, sensitivity increased to 95.24% and specificity increased to 96.3%.
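For context on the figures above, the short sketch below shows how accuracy, sensitivity and specificity are computed from a binary confusion matrix; the labels here are made up for illustration and are not study data.

```python
# Hedged sketch of the reported metrics, using made-up labels.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = malignant, 0 = benign (illustrative)
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]   # model or reader calls (illustrative)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true-positive rate: malignant patches called malignant
specificity = tn / (tn + fp)   # true-negative rate: benign patches called benign
print(f"accuracy={accuracy:.2%} sensitivity={sensitivity:.2%} specificity={specificity:.2%}")
```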

“AI can help calibrate pathologists’ diagnoses to increase sensitivity or specificity and reduce inter-observer variation owing to individual pathologist tendencies,” the researchers wrote.

The researchers discussed several limitations of the study, including the lack of a classification for indeterminate nodules and the use of image patches instead of whole-slide images for determining a diagnosis.

“The performance of this AI cannot be guaranteed in situations where more comprehensive judgment is required, and external validation using external datasets, preferably whole-slide images, unrelated to training, is required to prove its usefulness as a diagnostic tool,” the researchers wrote.