AI model analyzes ‘sequence of medical records’ to predict pancreatic cancer risk
An artificial intelligence model showed potential for pancreatic cancer prediction, using electronic medical records to identify individuals with a 25-fold greater risk for developing the disease within 3 to 36 months, according to study results.
The findings, presented at the American Association for Cancer Research Annual Meeting, could support the design of future screening trials of patients at high risk for pancreatic cancer, according to researchers.
“If we can find a way to identify patients with cancer earlier than their usual diagnosis time, it could potentially be very beneficial for the patients and the general public,” Bo Yuan, a PhD candidate at Harvard University, told Healio. “Artificial intelligence (AI) models have been shown powerful and successful in many real-world applications. We, therefore, argue the application of machine-learning methods to massive hospital data sets might be a way to design such predictive tools.”
Background
Yuan and colleagues used EMRs from the Danish National Patient Registry, which includes data on 6.1 million patients treated between 1977 and 2018, to train the AI models. About 24,000 of those patients developed pancreatic cancer.
A range of machine-learning methods underwent testing, from regression to time-series methods. Researchers specifically trained the models on the sequence of diseases in each patient’s clinical history so the models could learn diagnosis patterns most predictive of pancreatic cancer risk.
“We use the analogy that a sequence of medical records is similar to a sentence of English words,” Yuan said. “We all know that there have been many impressive AI models for natural language processing problems, such as translation, text to voice and text analysis. Why can’t we adapt such concept to a biomedical task?”
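To make the sentence analogy concrete, the sketch below shows one way a diagnosis history could be turned into an ordered list of tokens, much as words are mapped to ids before a language model sees them. It is an illustration only, not the researchers' pipeline: the patient history, the ICD-10-style codes, and the helper names `build_vocab` and `encode` are all hypothetical.

```python
from datetime import date

# Hypothetical patient history: (diagnosis date, diagnosis code) pairs.
# The codes are illustrative ICD-10-style labels, not data from the study.
history = [
    (date(2012, 3, 1), "K25"),   # gastric ulcer
    (date(2010, 6, 15), "E11"),  # type 2 diabetes
    (date(2014, 9, 30), "K86"),  # other diseases of pancreas
]

def build_vocab(histories):
    """Map each distinct diagnosis code to an integer id (0 is left free for padding)."""
    codes = sorted({code for h in histories for _, code in h})
    return {code: i + 1 for i, code in enumerate(codes)}

def encode(history, vocab, assessment_date):
    """Return token ids and days-before-assessment, in chronological order.

    Only events on or before the assessment date are kept, so the
    encoded sequence never includes information from the future.
    """
    events = sorted(e for e in history if e[0] <= assessment_date)
    tokens = [vocab[code] for _, code in events]
    ages = [(assessment_date - d).days for d, _ in events]
    return tokens, ages

vocab = build_vocab([history])
tokens, ages = encode(history, vocab, date(2015, 1, 1))
print(tokens, ages)
```

Keeping the time gap alongside each token is one simple way to preserve the timing information that, per the results reported here, separated the best model from one without it.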
The researchers tested the ability of the models to predict pancreatic cancer occurrence at time points of 3 to 60 months following risk assessment.
Results
For prediction of cancer development within 36 months, the best model (AUC = 0.88; OR = 47.5 for 20% recall, 159 for 10% recall) substantially outperformed a model without time information, even when training excluded disease events that occurred in the 3 months before diagnosis (AUC = 0.84). Individuals deemed at high risk were 25 times more likely to develop pancreatic cancer within the 3- to 36-month window than those below the risk threshold.
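For readers unfamiliar with an odds ratio quoted "at" a recall level, the sketch below shows the usual reading: flag the highest-scored patients until the stated share of true cases is caught, then compare flagged and unflagged groups in a 2x2 table. The article does not say exactly how the researchers computed their figures, so this uses the standard textbook definitions, and the scores and labels are made-up toy values.

```python
def confusion_at_recall(scores, labels, target_recall):
    """Flag top-scored patients until target_recall of true cases are caught.

    Returns (tp, fp, fn, tn) for the resulting high-risk group.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    needed = target_recall * total_pos
    tp = fp = 0
    for i in order:
        if tp >= needed:
            break
        if labels[i]:
            tp += 1
        else:
            fp += 1
    fn = total_pos - tp
    tn = len(labels) - total_pos - fp
    return tp, fp, fn, tn

def odds_ratio(tp, fp, fn, tn):
    """Odds of cancer among flagged patients divided by odds among the rest."""
    return (tp * tn) / (fp * fn)

def relative_risk(tp, fp, fn, tn):
    """Risk of cancer among flagged patients divided by risk among the rest."""
    return (tp / (tp + fp)) / (fn / (fn + tn))

# Toy data (hypothetical): 12 patients, 4 of whom later develop cancer.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.50, 0.40, 0.35, 0.30, 0.20, 0.10]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

table = confusion_at_recall(scores, labels, target_recall=0.5)
print(table, odds_ratio(*table), relative_risk(*table))
```

A lower recall target means a stricter threshold and a smaller, more concentrated high-risk group, which is consistent with the larger odds ratio reported at 10% recall than at 20%.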
The researchers used EMRs from Mass General Brigham Health Care System to further validate the results. The model showed similar accuracy; however, applying the methods in a diverse health care system presented unexpected challenges, Yuan told Healio.
“Blind transfer of a model trained in one country to another was not successful. But performance of models independently trained in each country, fortunately, had similarly high levels of performance, despite substantial differences in health care economics and billing and recording practices,” Yuan said. “This underscores the robustness of deep machine-learning methods on complex, large data sets.”
Yuan and colleagues had difficulty identifying precisely which diagnosis patterns predicted pancreatic cancer risk because of the complex nature of the neural network. They did find significant associations of some clinical characteristics — including diabetes, pancreatic and biliary tract diseases, and gastric ulcers — with increased risk.
“In more general terms, these results indicate the strong potential of advanced computational technologies such as AI and deep learning in contributing to solutions to real-world health problems by making increasingly accurate predictions based on each individual’s health and disease history,” he said.
Implications
The work addresses only the first step toward implementation of early pancreatic cancer diagnosis and treatment in clinical practice, according to Yuan. Other steps include detailed screening of high-risk patients and effective treatment after early detection.
“With a reasonably accurate method for predicting cancer risk, one can direct appropriate high-risk patients into clinical screening trials,” he said. “A sufficiently enriched pool of high-risk patients would make detailed screening tests more affordable — as such tests are likely to be prohibitively expensive at a population level — and enhance the positive predictive value of such tests.”
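The effect of enrichment on positive predictive value follows from Bayes' rule, and a short calculation shows its size. The numbers below are hypothetical: the baseline prevalence and the screening test's sensitivity and specificity are chosen only for illustration, and applying the 25-fold figure directly to prevalence is a simplifying assumption.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive test results that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical values, not taken from the study.
baseline_prevalence = 0.001                        # 1 case per 1,000 people screened
enriched_prevalence = 25 * baseline_prevalence     # assumed 25-fold enrichment
sensitivity, specificity = 0.90, 0.95              # assumed screening test accuracy

ppv_general = positive_predictive_value(baseline_prevalence, sensitivity, specificity)
ppv_enriched = positive_predictive_value(enriched_prevalence, sensitivity, specificity)
print(f"general population PPV: {ppv_general:.3f}")
print(f"enriched pool PPV:      {ppv_enriched:.3f}")
```

Under these assumptions the same test goes from a positive result being correct about 2% of the time to about 32% of the time, because far fewer false positives are generated per true case.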
Next steps
“We can now turn these results into a design for clinical screening trials, with software applied to health records of about 1 million patients, identification of those at highest risk, and recruitment into a clinical trial with detailed screening tests for about 200 high-risk patients,” Yuan told Healio. “The particular advantage of this two-step process is that computational screening is very inexpensive, and the more successful the prediction of high risk for cancer, the higher the efficiency and the lower the cost of sophisticated clinical screening and therapeutic intervention programs.”
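The computational first step Yuan describes amounts to scoring every record and keeping the highest-risk patients for clinical screening. A minimal sketch of that selection follows; the risk scores are random placeholders standing in for model output, the population is scaled down to 10,000 for speed, and `select_for_screening` is a hypothetical helper name.

```python
import heapq
import random

def select_for_screening(risk_scores, k):
    """Return the k patient ids with the highest predicted risk."""
    return heapq.nlargest(k, risk_scores, key=risk_scores.get)

# Placeholder scores: random numbers stand in for a trained model's output.
rng = random.Random(0)
risk_scores = {f"patient-{i}": rng.random() for i in range(10_000)}

cohort = select_for_screening(risk_scores, k=200)
print(len(cohort), "patients selected for detailed screening")
```

Because this step needs only a score per patient, the expensive clinical tests are reserved for the small selected cohort.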