EHR-based machine learning model yields novel marker for CAD risk, prognosis
An artificial intelligence-derived marker built from electronic health records noninvasively quantified plaque burden and mortality risk in adults from two large biobank cohorts, offering an option for more targeted diagnosis of coronary artery disease (CAD), data show.
The study, the first known effort to map characteristics of CAD on a spectrum, revealed distinct gradations of disease risk, atherosclerosis and survival that would otherwise be missed with binary case vs. control schemas, researchers wrote in The Lancet.
“CAD and other diseases exist on a spectrum, and each individual will have a mix of risk factors, pathogenic processes and biology changes that determine where they fall on a range of ‘disease-ness,’” Iain S. Forrest, PhD, a postdoctoral fellow and student in the medical scientist training program at the Icahn School of Medicine at Mount Sinai, told Healio. “Yet, the paradigm for most clinicians today is that we break this spectrum into inflexible categories of having disease or not having disease. This results in missed diagnoses, inappropriate management and potentially poor clinical outcomes. We want to see if there is a better way to capture this spectrum of disease, and that is what turned our attention to machine learning.”
In a retrospective, observational study, Forrest and colleagues developed and validated a CAD-predictive machine learning model using 95,935 EHRs and assessed its predicted probabilities as in-silico scores for CAD (ISCAD), ranging from 0 (lowest probability) to 1 (highest probability), in participants from two large-scale, longitudinal biobank cohorts. Of these participants, 35,749 were from the BioMe Biobank, used for model training and validation (median age, 61 years; 41% men; 14% with diagnosed CAD), and 60,186 were from the UK Biobank, used for external testing (median age, 62 years; 42% men; 14% with diagnosed CAD).
“In both (biobank) cases, we were able to look at their deidentified electronic health records, including medications they take, what diagnoses and codes they have, as well as lab measurements and vital signs,” Forrest said in an interview.
Researchers measured the association of ISCAD with clinical outcomes, including coronary artery stenosis, obstructive CAD, multivessel CAD, all-cause death and CAD sequelae.
In the validation data set, the model predicted CAD with an area under the receiver operating characteristic curve (AUROC) of 0.95 (95% CI, 0.94-0.95), a sensitivity of 0.94 (95% CI, 0.94-0.95) and a specificity of 0.82 (95% CI, 0.81-0.83). The prevalence of CAD was 13% in the validation data set, with a negative predictive value (NPV) of 0.93 (95% CI, 0.93-0.93) and a positive predictive value (PPV) of 0.84 (95% CI, 0.83-0.95).
In the holdout data set, the model predicted CAD with an AUROC of 0.93 (95% CI, 0.92-0.93), a sensitivity of 0.9 (95% CI, 0.89-0.9) and a specificity of 0.88 (95% CI, 0.87-0.88). CAD prevalence was 16% in the holdout data set, with an NPV of 0.89 (95% CI, 0.89-0.89) and a PPV of 0.88 (95% CI, 0.88-0.88).
For the external test data set using UK Biobank data, the model predicted CAD with an AUROC of 0.91 (95% CI, 0.91-0.91), a sensitivity of 0.84 (95% CI, 0.83-0.84) and a specificity of 0.83 (95% CI, 0.82-0.83). CAD prevalence was 14% in the external test data set, with an NPV of 0.84 (95% CI, 0.83-0.84) and a PPV of 0.83 (95% CI, 0.82-0.83).
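All three data sets are summarized with the same metrics. As a rough illustration of how these metrics are defined (not the study's code or its decision threshold), the minimal Python sketch below computes sensitivity, specificity, PPV, NPV and AUROC from hypothetical labels and scores. The 0.5 cutoff, the function names and the toy data are assumptions chosen for illustration. The `ppv_npv_from_rates` helper applies Bayes' rule, which shows why PPV and NPV depend on the prevalence of whichever data set they are reported for.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(labels: Sequence[int], preds: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels and predictions (1 = CAD, 0 = no CAD)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Dichotomize scores at a hypothetical cutoff and report threshold-based metrics."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(labels, preds)
    return {
        "sensitivity": tp / (tp + fn),  # share of true CAD cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly not flagged
        "ppv": tp / (tp + fp),          # share of flagged people who truly have CAD
        "npv": tn / (tn + fn),          # share of non-flagged people truly without CAD
        "prevalence": (tp + fn) / len(labels),
    }


def ppv_npv_from_rates(sens: float, spec: float, prev: float) -> Tuple[float, float]:
    """Bayes' rule: PPV and NPV implied by sensitivity, specificity and prevalence."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability a random case outscores a random control (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not study data.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
    m = threshold_metrics(labels, scores)
    print(m)
    print(ppv_npv_from_rates(m["sensitivity"], m["specificity"], m["prevalence"]))
    print(auroc(labels, scores))
```

With the toy data, the 0.5 cutoff gives a sensitivity of 2/3 and a specificity of 0.8, and the pairwise AUROC is 14/15 (about 0.93). The pairwise loop is quadratic in sample size, so for cohorts the size of BioMe or UK Biobank a sort-based or library implementation would be the practical choice.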
ISCAD captured CAD risk from known risk factors, pooled cohort equations and polygenic risk scores. Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles, as did the risk for obstructive CAD, multivessel CAD and stenosis of major coronary arteries, according to the researchers.
HRs and prevalence of all-cause death increased in a stepwise manner across ISCAD deciles. Compared with biobank participants in decile 1, those in decile 6 had an HR for all-cause death of 11 (95% CI, 3.9-31) and a death prevalence of 3.1%, whereas those in decile 10 had an HR of 56 (95% CI, 20-158) and a prevalence of 11%. Researchers observed a similar trend for recurrent MI.
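The quartile and decile comparisons rank participants by ISCAD and then summarize outcomes within each group. The minimal Python sketch below shows that grouping step, assuming simple rank-based binning with ties broken by input order; the study's exact binning rule is not described in this article, and the function names and toy data are hypothetical. It reports only the per-bin prevalence of an event. Hazard ratios such as those quoted above require a time-to-event model (for example, Cox regression), which this sketch does not attempt.

```python
from typing import Dict, List, Sequence


def assign_quantile_bins(scores: Sequence[float], n_bins: int) -> List[int]:
    """Rank-based bins: 1 = lowest scores, n_bins = highest (n_bins=10 gives deciles).

    Ties are broken by input order because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(scores) + 1
    return bins


def event_prevalence_by_bin(bins: Sequence[int], events: Sequence[int],
                            n_bins: int) -> Dict[int, float]:
    """Share of participants in each bin with the event (e.g. death); 0.0 for empty bins."""
    totals = {b: 0 for b in range(1, n_bins + 1)}
    hits = {b: 0 for b in range(1, n_bins + 1)}
    for b, e in zip(bins, events):
        totals[b] += 1
        hits[b] += e
    return {b: (hits[b] / totals[b] if totals[b] else 0.0) for b in totals}


def is_stepwise_increasing(values: Sequence[float]) -> bool:
    """True if each bin's value is at least the previous one (a non-decreasing trend)."""
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Hypothetical toy scores and outcomes, not study data.
    scores = [0.05, 0.1, 0.2, 0.3, 0.6, 0.7, 0.9, 0.95]
    events = [0, 0, 0, 0, 0, 1, 1, 1]
    bins = assign_quantile_bins(scores, 4)  # quartiles
    prev = event_prevalence_by_bin(bins, events, 4)
    print(bins, prev)
    print(is_stepwise_increasing([prev[b] for b in sorted(prev)]))
```

Here the four toy quartiles have event prevalences of 0.0, 0.0, 0.5 and 1.0, a non-decreasing trend. Passing `n_bins=10` gives deciles instead.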
Additionally, 46% of undiagnosed individuals with high ISCAD (≥ 0.99) had clinical evidence of CAD according to the 2014 American College of Cardiology/American Heart Association Task Force guidelines.
“What surprised us was how well this digital biomarker selected and captured a lot of the facets of disease, from the buildup of plaque in patients’ arteries to mortality and everything in between, including complications like MI and atrial fibrillation,” Forrest told Healio. “It was reassuring that the model could capture all these diverse facets of the disease.”
Forrest said prospective studies are needed to assess the association of in-silico markers with incident CAD events and death and to examine how well these markers perform in other populations.
“We focused on CAD in this study as a proof of concept, but we are working to apply this same approach to other common diseases,” Forrest said. “Going forward, we also want to better represent diverse populations, including women and underrepresented ethnicities.”
For more information:
Iain S. Forrest, PhD, can be reached at iain.forrest@icahn.mssm.edu; Twitter: @iainsforrest.