Propensity matching and the GIGO principle

Issue: April 25, 2016

By Derek Raghavan, MD, PhD, FACP, FRACP, FASCO

It has become much harder to be a rational person with a modicum of biostatistical knowledge who can refrain from becoming irritated in the current “academic” oncology environment.

Some centers have an ever-present need to publish, with the product predicated on volume rather than quality or impact.

Having spent my life in mainstream academic medicine, I understand assistant professors need to do something to become associate professors, although I have never understood the concept that the weight of the CV or the number of unimpressive or conceptually dull publications should somehow correlate with promotion.


In the current socioeconomic environment, it seems it has become harder to recruit patients to mainstream clinical trials. The reasons are protean and include problems of access, reimbursement and health insurance, geographical and sociocultural barriers, poverty, time, the challenges from the NCI and review bodies for funding and implementation, complexity of design and eligibility criteria, etc.

As a result, it seems our young assistant professors are seeking other ways of making progress and building their CVs and have stumbled upon “big data.” As with most vogue topics, we are in the phase wherein big data rule, largely based on the concept that big is beautiful — and somehow adds statistical impact per se!

Our young (and old) colleagues are trying desperately to answer big questions using stores of information that are available from a bunch of sources — including SEER, National Cancer Data Base and many others — when they are unable to create or implement classical level-one trial designs. This is potentially meritorious work, but important caveats need to be identified.

Caveats of big data analyses

Firstly, and probably most importantly, one needs to be sure about the quality of data, and most particularly who enters the information, whether it is independently checked for accuracy and what potential case — or other — selection biases are embedded therein.

Even if one assumes that big datasets are sacrosanct — I don’t personally believe this — and reflect high-quality information that is fully accurate, there are a bunch of potentially important prognostic variables that may dramatically influence outcome, the interpretation of outcome or the ability to translate the findings to the population at large.

One of the recently introduced tools that is supposed to overcome the biases and inaccuracies of large databases is propensity matching, and we frequently see this recorded in the pages of HemOnc Today. This is a statistical ploy that is intended to “balance” the populations of cases under study.

Thus, for example, if one wishes to compare in a large database the outcomes of patients treated by surgery vs. radiotherapy, or observation vs. active treatment, and one is cognizant of the fact that fitter and more robust patients usually are selected for surgical approaches or active treatment, one can balance characteristics like age, sex, recorded performance status, socioeconomic status and comorbidity via propensity matching to try to homogenize two population groups and make them more comparable.

The problem is that propensity matching only manages to reduce the differences between the two groups, and only for the variables that were recorded. It does not create statistical or clinical parity. Randomization, although much harder to achieve in a population context, tends to balance the two groups on both measured and unmeasured variables, at least on the basis of statistical chance, and especially when larger sample sizes are built into the design.

Unfortunately, propensity matching, even with large studies, has no logical or statistical basis for overcoming clinician selection biases — which are almost always unstated in medical records or tumor registry information — educational and financial status, gene mutation differences, and a host of other potentially important prognostic variables.

For example, at the Society of Surgical Oncology Annual Cancer Symposium, a formal review of the state database in Tennessee showed routine misrecording of the nature and extent of thyroid surgery that could have dreadfully biased the post-hoc reporting of surgical outcomes in that state. Propensity matching simply could not have had any reliable impact on fixing that issue.
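The limitation described above, that matching can balance only what was recorded, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of any real registry: the cohort, the function names and every number in it are hypothetical. Age group is recorded, the clinician's impression of fitness is not, and surgery has no true effect on outcome. Because there is a single discrete recorded covariate, the estimated propensity score is simply the surgery rate in each age group, so matching on the score reduces to matching within age group.

```python
import random
from statistics import mean


def simulate_registry(n=20000, seed=0):
    """Hypothetical registry: age group is recorded, fitness is not."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        age_group = rng.randrange(3)   # 0 = youngest; recorded
        fit = rng.random() < 0.5       # clinician's impression; NOT recorded
        # Selection bias: younger and fitter patients get surgery more often.
        p_surgery = 0.2 + 0.15 * (2 - age_group) + (0.4 if fit else 0.0)
        surgery = rng.random() < p_surgery
        # Outcome (arbitrary units) depends on age and fitness only:
        # the true effect of surgery is zero by construction.
        survival = 5.0 - age_group + (2.0 if fit else 0.0) + rng.gauss(0, 1)
        patients.append({"age": age_group, "surgery": surgery, "survival": survival})
    return patients


def naive_difference(patients):
    """Unadjusted surgery-minus-control difference in mean outcome."""
    treated = [p["survival"] for p in patients if p["surgery"]]
    control = [p["survival"] for p in patients if not p["surgery"]]
    return mean(treated) - mean(control)


def matched_difference(patients, seed=1):
    """1:1 matching without replacement within each age group.

    The surgery rate differs between age groups here, so matching within
    a group is the same as matching on the estimated propensity score.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for group in range(3):
        t = [p["survival"] for p in patients if p["age"] == group and p["surgery"]]
        c = [p["survival"] for p in patients if p["age"] == group and not p["surgery"]]
        rng.shuffle(t)
        rng.shuffle(c)
        pairs = min(len(t), len(c))
        treated += t[:pairs]
        control += c[:pairs]
    return mean(treated) - mean(control)


def randomized_difference(patients, seed=2):
    """Re-assign surgery by coin flip, ignoring age and fitness."""
    rng = random.Random(seed)
    treated, control = [], []
    for p in patients:
        (treated if rng.random() < 0.5 else control).append(p["survival"])
    return mean(treated) - mean(control)


if __name__ == "__main__":
    cohort = simulate_registry()
    print(f"naive:      {naive_difference(cohort):.2f}")
    print(f"matched:    {matched_difference(cohort):.2f}")
    print(f"randomized: {randomized_difference(cohort):.2f}")
```

In this toy cohort, matching should shrink the apparent benefit of surgery by removing the age imbalance, but the estimate stays well above the true value of zero because the unrecorded fitness imbalance remains. Only the coin-flip assignment should bring the difference close to zero.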


Thus, it seems the study of large databases with propensity matching and other manipulations of data can usefully generate hypotheses but is certainly no replacement for level-one data.

Statistics 101

Gradually, in the current era, we seem to be building a list of data manipulations that may not elucidate but actually obfuscate. You can add your favorite items to my list, which includes:

The use of HRs absent any serious comment regarding the impact of the units measured. An HR of 0.8 is important when survival is measured in years, and trivial when it is measured in days to weeks;

Abuse of the forest plot, including data dredging and recycling in meta-analyses that are not limited to primary datasets, and the reporting of statistically significant P values without regard to the absolute magnitude of the difference (namely, days vs. months or years);

Abuse of swimmer plots. Playing with the X-axis and selecting only those cases that are “evaluable” can play havoc with interpretation;

Waterfall plots. Why is it that the numbers of upward and downward lines so rarely tally with the actual numbers of patients treated on study? This is mostly because researchers omit unevaluated or discontinued cases from their plots. Both are potentially failures of treatment until proven otherwise, so omitting them biases interpretation; and

Propensity matching in large databases as a replacement for randomized clinical trials.
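The first item on the list can be made concrete with simple arithmetic. The sketch below assumes exponential survival (constant hazards), in which median survival equals ln(2) divided by the hazard, so the treated median is the control median divided by the HR. That model, the function name median_gain and the 10-unit control medians are illustrative assumptions, not figures from any trial.

```python
def median_gain(control_median, hazard_ratio):
    """Gain in median survival, in the same units as control_median.

    Assumes exponential survival (constant hazards), where
    median = ln(2) / hazard, so treated_median = control_median / hazard_ratio.
    """
    if control_median <= 0 or hazard_ratio <= 0:
        raise ValueError("control_median and hazard_ratio must be positive")
    return control_median / hazard_ratio - control_median


if __name__ == "__main__":
    # Same HR, very different absolute benefit depending on the time scale.
    for median, unit in [(10.0, "years"), (10.0, "weeks")]:
        gain = median_gain(median, 0.8)
        print(f"HR 0.8, control median {median:g} {unit}: gain {gain:.1f} {unit}")
```

Under that assumption, an HR of 0.8 adds 2.5 years to a 10-year median but only 2.5 weeks to a 10-week median, which is why the HR alone says little without the units.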

These concepts seem to represent statistics 101. Why is it that so few selection committees for scientific meetings, journal editors and their reviewers seem to understand these very simple principles?

Reference:

Kiernan CM, et al. Abstract 5. Presented at: Society of Surgical Oncology Annual Cancer Symposium; March 2-5, 2016; Boston.

For more information:

Derek Raghavan, MD, PhD, FACP, FRACP, FASCO, is HemOnc Today’s Chief Medical Editor for Oncology. He also is president of Levine Cancer Institute at Carolinas HealthCare System. He can be reached at derek.raghavan@carolinashealthcare.org.

Disclosure: Raghavan reports no relevant financial disclosures.