April 07, 2022

Should machine learning be used to assess clinical trial data on older adults with cancer?

Click here to read the Cover Story, "Geriatric oncology field 'entering into adulthood'"

POINT

Yes.

Machine learning gives us a somewhat different lens for understanding data. It should not be an exclusive approach, but it could augment our understanding of this population of patients.

Erika E. Ramsdale, MD

We can use machine learning in an exploratory way to generate new hypotheses that can then be tested. It is very difficult to get causal information from machine learning. However, patterns and associations in the data may suggest things that we did not know about or see before that can be carried forward for future work. Specifically, we do not necessarily know what is important in terms of predicting how someone will do with cancer treatment. We often rely on our clinical judgement, but that only allows into the model what we think is important. As a result, we may miss some things that truly are important but were not suggested to us by our clinical expertise.

Machine learning can also be used to test study conclusions. It serves as another way to analyze data. We can look at many different models, not simply to choose which is best, but to see whether they all suggest the same thing and whether the results are robust and replicable. Agreement across many different models lends validity to the findings.

In machine learning, there is heavy emphasis on ensemble learning, where we look at a group of models and average their predictions, which are often more accurate than those of any one model. Machine learning gets at multimodel thinking that can encourage more robust conclusions.
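The averaging idea can be illustrated with a minimal sketch. Everything here is a hypothetical simulation, not data from any study: five stand-in "models" each predict a true outcome with independent random noise, and their predictions are averaged. For squared error, the averaged prediction is never worse than the average individual model, and when the models' errors are independent, as assumed here, it usually beats each one.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical setup: 200 true outcomes and 5 stand-in "models" whose
# predictions are the truth plus independent Gaussian noise.
truth = [random.uniform(0, 1) for _ in range(200)]
n_models = 5
predictions = [[y + random.gauss(0, 0.2) for y in truth] for _ in range(n_models)]


def mse(pred, actual):
    """Mean squared error between predictions and true outcomes."""
    return mean((p - a) ** 2 for p, a in zip(pred, actual))


# Ensemble prediction: average the models' predictions for each patient.
ensemble = [mean(per_patient) for per_patient in zip(*predictions)]

individual_mse = [mse(p, truth) for p in predictions]
ensemble_mse = mse(ensemble, truth)

print("Individual model errors:", [round(e, 4) for e in individual_mse])
print("Ensemble error:", round(ensemble_mse, 4))
```

The gain shrinks when the models make correlated mistakes, which is one reason ensembles tend to combine models that differ from one another.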

Erika E. Ramsdale, MD, is associate professor in the department of medicine at University of Rochester Medical Center. She can be reached at erika_ramsdale@urmc.rochester.edu.


COUNTER

It depends.

Machine learning is a branch of computer science that uses a variety of algorithms to automatically learn from data. However, it is but one of many ways to learn from the data. To move the conversation forward, another question may be: "Is machine learning the right tool in our analytic toolbox to evaluate (a specific research question)?"

Jennifer L. Lund, PhD

As described by Hernán and colleagues, we can generally organize scientific questions into three categories: description, which uses data to quantitatively summarize events in the world; prediction, which uses data to map events or inputs to other events or outputs in the world; and causal inference (counterfactual prediction), which uses data to predict events as if the world had been different (counter-to-fact). To date, most machine learning studies have addressed predictive questions, emphasized predictive accuracy and focused on large, high-dimensional data sets, such as those with more covariates/features than participants/observations. For example, a recent study of older adults with cancer used geriatric assessment data, along with machine learning algorithms, to predict early death.

Making the leap from prediction to causal inference requires expert knowledge that cannot be supplied by a computer algorithm. Instead, it requires in-depth understanding of the causal structure of the relationships of interest. For example, when studying the effect of a new cancer treatment vs. standard of care on cancer mortality among older adults, expert knowledge of temporal and confounding relationships between covariates, treatment and outcome is necessary. Increasingly, researchers are using machine learning algorithms for aspects of causal analyses, such as in propensity score estimation to control confounding. Other emerging machine learning approaches include "causal trees" that generate "decision trees" where groupings are based on principled estimates of treatment effects within population subgroups. These approaches may prove useful for geriatric oncology.
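To make the propensity score idea concrete, here is a minimal sketch on simulated data. All names and numbers are assumptions for illustration: a single binary confounder (`frail`), an assumed true treatment effect of a 10-point reduction in death risk, and a propensity score estimated as the proportion treated within each frailty stratum. In practice, a machine learning model could replace that estimation step when there are many covariates; choosing which covariates belong in the model still depends on expert knowledge.

```python
import random

random.seed(1)

TRUE_EFFECT = -0.10  # assumed: treatment lowers death risk by 10 points

# Simulate a hypothetical cohort where frailty affects both who gets
# treated and who dies, so it confounds the comparison.
rows = []
for _ in range(20000):
    frail = random.random() < 0.4
    treated = random.random() < (0.2 if frail else 0.7)
    risk = 0.30 + (0.30 if frail else 0.0) + (TRUE_EFFECT if treated else 0.0)
    died = random.random() < risk
    rows.append((frail, treated, died))


def naive_risk_difference(rows):
    """Unadjusted difference in death risk, treated minus untreated."""
    treated = [d for _, t, d in rows if t]
    untreated = [d for _, t, d in rows if not t]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def estimate_propensity(rows):
    """Propensity score P(treated | frail), one value per stratum."""
    scores = {}
    for level in (False, True):
        stratum = [t for f, t, _ in rows if f == level]
        scores[level] = sum(stratum) / len(stratum)
    return scores


def ipw_risk_difference(rows, scores):
    """Risk difference using normalized inverse probability weights."""
    num_t = den_t = num_u = den_u = 0.0
    for frail, treated, died in rows:
        if treated:
            w = 1.0 / scores[frail]
            num_t += w * died
            den_t += w
        else:
            w = 1.0 / (1.0 - scores[frail])
            num_u += w * died
            den_u += w
    return num_t / den_t - num_u / den_u


naive = naive_risk_difference(rows)
weighted = ipw_risk_difference(rows, estimate_propensity(rows))
print("Naive estimate:", round(naive, 3))
print("Weighted estimate:", round(weighted, 3))
print("Assumed true effect:", TRUE_EFFECT)
```

In this simulation the naive comparison overstates the benefit because frail patients are both less likely to be treated and more likely to die; weighting by the propensity score moves the estimate back toward the assumed effect.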

Once the research question is clearly defined, what are some considerations for determining whether machine learning should be used? First, many machine learning algorithms are optimized to deal with high-dimensional data, where standard statistical methods can fail. Second, machine learning algorithms can accommodate interactions and other nonlinearities between covariates that are more difficult to model using traditional methods. If researchers find themselves in these settings, machine learning may be particularly useful.

However, many machine learning algorithms do not perform well when data are imbalanced, that is, when the outcome of interest is rare. Furthermore, data quality (accuracy) and capture (of the target study population) are critical when using machine learning for clinical decision support. For example, a recent study noted stark racial bias resulting from a machine learning algorithm that allocated fewer resources to Black patients who were ill compared with similar white patients.
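One reason rare outcomes cause trouble can be shown with a small sketch. The cohort below is hypothetical (1,000 patients, 20 with the outcome), and the "model" is a deliberately trivial rule that predicts no events for anyone. Its overall accuracy looks excellent even though it identifies none of the patients who matter.

```python
# Hypothetical cohort: 1,000 patients, 20 of whom have the rare outcome.
labels = [1] * 20 + [0] * 980

# A trivial "model" that always predicts the common class (no event).
always_negative = [0] * len(labels)


def accuracy(pred, actual):
    """Share of all patients classified correctly."""
    return sum(p == a for p, a in zip(pred, actual)) / len(actual)


def recall(pred, actual):
    """Share of true events that the model identified."""
    events = sum(actual)
    caught = sum(1 for p, a in zip(pred, actual) if p == 1 and a == 1)
    return caught / events


acc = accuracy(always_negative, labels)
rec = recall(always_negative, labels)
print("Accuracy:", acc)
print("Recall:", rec)
```

An algorithm tuned only for overall accuracy can drift toward this kind of rule, so measures that focus on the rare outcome, such as recall, are worth reporting alongside accuracy.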

There is no doubt that advances in computing power, increased software accessibility and proliferation of large health care data have increased interest and capability in machine learning. Although applications in geriatric oncology are sparse, machine learning may lead to important advances in care. As with any analytic approach, clarifying the research question and addressing required assumptions for valid inference are necessary to ensure that knowledge generated using machine learning methods is ethical and informative.

Jennifer L. Lund, PhD, is associate professor in the department of epidemiology and director of data strategy and education within the cancer information and population health resource at UNC Lineberger Comprehensive Cancer Center. She can be reached at jennifer.lund@unc.edu.