High sensitivity observed for most molecular tests assessing malignancy of thyroid nodules
Four molecular tests commonly used to assess malignancy in thyroid nodules showed high sensitivity in clinical validation studies, although those studies had several limitations, according to findings from a systematic review and meta-analysis.
“Meta-analyses of four commonly evaluated and used tests — Afirma gene expression classifiers, Afirma genomic sequencing classifier, ThyroSeq version 1 and version 2, and ThyroSeq version 3 — show high sensitivities and area under the curve measures, which seemingly underscore the suitability and utility of their current use guidelines as a part of thyroid nodule management practices,” Carrie Cunningham Lubitz, MD, MPH, associate professor of surgery at Harvard Medical School and section head of the endocrine surgery unit at Massachusetts General Hospital, and colleagues wrote in a study published in Thyroid. “However, these results must be interpreted in light of high levels of diagnostic review bias and verification bias, in addition to study design limitations.”
Researchers conducted a systematic review and meta-analysis of studies that aimed to clinically validate the accuracy of commercially available molecular tests for thyroid nodules. They searched the PubMed, Embase and Web of Science databases through July 29, 2021, and included original articles that reported diagnostic results, including counts of true-positive, true-negative, false-positive and false-negative results. Sensitivity and specificity were extracted for each study. Risk of bias was assessed in seven categories: patient selection, inconsistent comparison bias, partial verification bias, diagnostic review bias, observer variability, reporting of indeterminate results and institutional malignancy prevalence.
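For readers less familiar with the two measures, the short sketch below shows how sensitivity and specificity follow from the four counts each included study had to report, using surgical histopathology as the reference standard. The class and function names are illustrative, and the counts are hypothetical; they are not taken from any study in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one diagnostic accuracy study (hypothetical values)."""
    tp: int  # test positive, malignant on histopathology
    fp: int  # test positive, benign on histopathology
    fn: int  # test negative, malignant on histopathology
    tn: int  # test negative, benign on histopathology


def sensitivity(t: TwoByTwo) -> float:
    """Share of malignant nodules that the test calls positive."""
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    """Share of benign nodules that the test calls negative."""
    return t.tn / (t.tn + t.fp)


# Hypothetical counts for illustration only; not from any reviewed study.
study = TwoByTwo(tp=45, fp=70, fn=5, tn=30)
print(f"sensitivity = {sensitivity(study):.2f}")  # 45 / 50 = 0.90
print(f"specificity = {specificity(study):.2f}")  # 30 / 100 = 0.30
```

In this made-up example the test misses few cancers (5 of 50) but also labels 70 of 100 benign nodules as positive, the kind of trade-off a high-sensitivity, lower-specificity test makes.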
The systematic review and meta-analysis included 49 studies. Afirma gene expression classifiers were analyzed in 35 studies, the Afirma genomic sequencing classifier in nine studies, and both Afirma tests in one study. Nine studies analyzed ThyroSeq version 1 or 2, five analyzed ThyroSeq version 3, and one analyzed both versions 2 and 3. Of the included studies, 39 were retrospective, nine were prospective and one combined retrospective and prospective cohorts. All studies enrolled patients consecutively, and 84% enrolled participants regardless of whether they eventually underwent surgery.
In the meta-analysis, studies of Afirma gene expression classifiers showed significant heterogeneity (P = .006 for sensitivity; P < .001 for specificity). In a bivariate random-effects model, these studies had a pooled sensitivity of 92% and a pooled specificity of 26%.
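A bivariate random-effects model estimates sensitivity and specificity jointly and allows for their correlation across studies; it is usually fitted with dedicated statistical packages. As a much simpler stand-in, the sketch below pools sensitivity alone with a univariate DerSimonian-Laird random-effects model on the logit scale and reports Cochran's Q and I² as heterogeneity measures. It is not the researchers' method, the function names are illustrative, and the study counts are hypothetical.

```python
import math
from typing import Sequence


def logit_with_variance(events: int, non_events: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    Adds 0.5 to both cells when either is zero (a common continuity correction).
    """
    a, b = float(events), float(non_events)
    if a == 0 or b == 0:
        a, b = a + 0.5, b + 0.5
    return math.log(a / b), 1 / a + 1 / b


def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]):
    """Univariate DerSimonian-Laird random-effects pooling.

    Returns pooled effect, its standard error, Cochran's Q, degrees of
    freedom and I^2 (share of variability attributed to heterogeneity).
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, q, df, i2


def expit(x: float) -> float:
    """Inverse of the logit transform."""
    return 1 / (1 + math.exp(-x))


# Hypothetical (true positives, false negatives) from four studies.
studies = [(45, 5), (30, 2), (60, 8), (22, 0)]
pairs = [logit_with_variance(tp, fn) for tp, fn in studies]
y = [p[0] for p in pairs]
v = [p[1] for p in pairs]

pooled, se, q, df, i2 = dersimonian_laird(y, v)
low, high = expit(pooled - 1.96 * se), expit(pooled + 1.96 * se)
print(f"pooled sensitivity = {expit(pooled):.3f} (95% CI {low:.3f}-{high:.3f})")
# A heterogeneity P value would compare Q with a chi-squared distribution on
# df degrees of freedom; omitted here to stay within the standard library.
print(f"Cochran's Q = {q:.2f} on {df} df, I^2 = {i2:.0%}")
```

Pooling on the logit scale keeps the back-transformed estimate and its confidence limits between 0 and 1; the bivariate model used in the review additionally links each study's sensitivity to its specificity.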
Specificity varied significantly across studies of the Afirma genomic sequencing classifier, ThyroSeq versions 1 and 2, and ThyroSeq version 3 (P < .001 for all), but sensitivity showed no significant heterogeneity. Pooled sensitivity and specificity were 94% and 38% for the Afirma genomic sequencing classifier, 86% and 74% for ThyroSeq versions 1 and 2, and 92% and 41% for ThyroSeq version 3.
The researchers identified several limitations in the included studies: no correction for partial verification bias to account for possible false-negative results, no masked histopathologic review in most studies, and inconsistency in whether noninvasive follicular thyroid neoplasm with papillary-like nuclear features was classified as benign or malignant.
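Partial verification bias arises because histopathology, the reference standard, is available only for nodules that go to surgery, and nodules with a negative molecular result are typically operated on less often. One established correction, the Begg-Greenes method, reweights the verified results by the share of all tested nodules with each test result. The sketch below applies it to a hypothetical cohort; it assumes the decision to operate depends only on the test result, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedCounts:
    """Hypothetical cohort: every nodule tested, only some verified by surgery."""
    total_pos: int      # all nodules with a positive molecular test
    total_neg: int      # all nodules with a negative molecular test
    pos_malignant: int  # operated test-positive nodules, malignant
    pos_benign: int     # operated test-positive nodules, benign
    neg_malignant: int  # operated test-negative nodules, malignant
    neg_benign: int     # operated test-negative nodules, benign


def naive_accuracy(c: VerifiedCounts) -> tuple[float, float]:
    """Sensitivity and specificity using only surgically verified nodules."""
    sens = c.pos_malignant / (c.pos_malignant + c.neg_malignant)
    spec = c.neg_benign / (c.neg_benign + c.pos_benign)
    return sens, spec


def begg_greenes(c: VerifiedCounts) -> tuple[float, float]:
    """Begg-Greenes correction; assumes surgery depends only on the test result."""
    n = c.total_pos + c.total_neg
    p_pos, p_neg = c.total_pos / n, c.total_neg / n
    # Malignancy rate among verified nodules within each test-result group
    m_pos = c.pos_malignant / (c.pos_malignant + c.pos_benign)
    m_neg = c.neg_malignant / (c.neg_malignant + c.neg_benign)
    sens = p_pos * m_pos / (p_pos * m_pos + p_neg * m_neg)
    spec = p_neg * (1 - m_neg) / (p_neg * (1 - m_neg) + p_pos * (1 - m_pos))
    return sens, spec


# 200 tested nodules: all 120 positives operated, only 10 of 80 negatives.
cohort = VerifiedCounts(total_pos=120, total_neg=80,
                        pos_malignant=40, pos_benign=80,
                        neg_malignant=1, neg_benign=9)
print("naive:     sens %.3f, spec %.3f" % naive_accuracy(cohort))
print("corrected: sens %.3f, spec %.3f" % begg_greenes(cohort))
```

In this made-up cohort the verified-only estimate puts sensitivity near 98%. Once the 70 unoperated test-negative nodules are assumed to be malignant at the same rate as the operated ones (about 7 missed cancers), sensitivity falls to roughly 83%, which shows how skipping the correction can overstate sensitivity.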
“Our results, in addition to prior systematic reviews of molecular tests for thyroid cancer, bolster the conclusions of the majority of studies reviewed, including the industry-sponsored studies, which are that molecular tests for indeterminate thyroid nodules have the potential to aid in clinical decision-making; however, solidifying this finding warrants further investigations,” the researchers wrote. “For now, given the high level of biases and limitations in the studies evaluated, these results must be interpreted with caution. Future clinical validations of molecular tests must avoid common pitfalls enumerated in this review to evaluate diagnostic molecular tests in a minimally biased manner.”