Inappropriate nuclear stress tests difficult to identify
Raters at different levels of training varied considerably in their ability to identify nuclear stress tests with inappropriate indications, according to a new study.
The researchers observed modest inter-rater reliability for the 2009 appropriate use criteria (AUC) for radionuclide imaging and questioned whether the AUC classification system is too complex.
Siqin Ye, MD, MS, from the department of medicine at Columbia University Medical Center, and colleagues conducted the CONCORD study, which investigated the extent to which classification disagreements occur and their impact on the identification of appropriate and inappropriate nuclear stress tests.
They randomly selected 400 patients (mean age, 61.5 years; 54% women) undergoing nuclear stress testing at Columbia University Medical Center. Raters with different levels of training, including cardiology attending physicians, cardiology fellows, internal medicine hospitalists and internal medicine interns, then classified the stress tests for those patients according to the 2009 AUC for radionuclide imaging.
Consensus classification by two cardiologists was identified as the operational gold standard, and the researchers calculated the sensitivity and specificity of individual raters. They evaluated inter-rater reliability of the AUC using Cohen’s kappa statistics for pairs of raters.
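For readers unfamiliar with these measures, the following is a minimal sketch of how unweighted Cohen's kappa for a pair of raters, and one rater's sensitivity and specificity against a gold standard, are typically computed. It is not the researchers' analysis code. The function names, the category labels and the 10-test toy data are hypothetical and chosen only for illustration; they are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(rater, gold, positive="inappropriate"):
    """Sensitivity and specificity of one rater for flagging `positive` tests."""
    pairs = list(zip(rater, gold))
    tp = sum(r == positive and g == positive for r, g in pairs)
    fn = sum(r != positive and g == positive for r, g in pairs)
    tn = sum(r != positive and g != positive for r, g in pairs)
    fp = sum(r == positive and g != positive for r, g in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("gold standard needs positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (NOT from the study): 10 tests, three categories.
A, U, I = "appropriate", "uncertain", "inappropriate"
gold = [A, A, A, A, U, U, I, I, I, I]   # stand-in for the consensus rating
rater = [A, A, A, U, U, I, I, I, I, A]  # stand-in for one individual rater

kappa = cohen_kappa(rater, gold)
sens, spec = sensitivity_specificity(rater, gold)
print(f"kappa={kappa:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy example the two ratings agree on 7 of 10 tests, but because some of that agreement would be expected by chance, kappa comes out lower than the raw 70% agreement.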
Ye and colleagues wrote that the cardiologists rated 64% of the nuclear stress tests as appropriate, 18% as uncertain, 14% as inappropriate and 5% as unable to be classified.
Raters differed
They found that inter-rater reliability for noncardiologist raters was modest (unweighted Cohen’s kappa, 0.51; 95% CI, 0.45-0.55). The sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, and specificity ranged from 85% to 97%.
“The substantial disagreements between AUC classifications of different raters despite standardized training highlight potential challenges for using the AUC at point-of-care to guide appropriate test ordering, especially as there is considerable disagreement and variable sensitivity for different raters applying the AUC to identify tests with inappropriate indications,” Ye and colleagues wrote. “Future efforts will need to address the complexity of the AUC classification system, through steps such as consolidation of overlapping indications and further streamlining of the classification process, or through improved decision support.”
Clinical practice implications
In a related editorial, Grace A. Lin, MD, MAS, and Ian S. Harris, MD, wrote that the study illustrates the difficulty of applying AUC to clinical practice. “The challenge is in writing criteria that are simultaneously general enough to account for the complexities of modern medical practice and specific enough to be meaningful,” they wrote. “The results from this article suggest that the AUC are failing in at least one of these regards, resulting in variation of interpretation of the criteria. This presents a significant barrier to widespread, consistent implementation of the AUC in practice.”
Lin and Harris, both from the University of California, San Francisco, suggested that AUC “be part of a broader effort to improve the entire clinical decision-making process and be focused on providing feedback to physicians to ensure that they are providing evidence-based, clinically appropriate care,” citing the American College of Cardiology’s Formation of Optimal Cardiovascular Utilization Strategies program.
For more information:
Lin GA. Circ Cardiovasc Qual Outcomes. 2015;doi:10.1161/CIRCOUTCOMES.114.001585.
Ye S. Circ Cardiovasc Qual Outcomes. 2015;doi:10.1161/CIRCOUTCOMES.114.001067.
Disclosure: One researcher reports receiving research grants from GE Healthcare, Philips Healthcare and Spectrum Dynamics. Lin and Harris report no relevant financial disclosures.