Quest for clinically meaningful PFS benefit an ‘imperfect science’
Small improvements in study endpoints often serve as the foundation for drug approvals.
However, a small but vocal segment of the hematology and oncology research communities has begun to question whether survival improvements of a few months, weeks or even days — although statistically significant — are clinically meaningful for patients.
Earlier this year, the ASCO Clinically Meaningful Outcomes Working Groups called on researchers to “raise the bar” by establishing benchmarks for survival improvements that extend beyond statistical significance in the treatment of pancreatic, breast, lung and colon cancers.
The groups — which outlined their conclusions in April in Journal of Clinical Oncology — focused primarily on OS but also offered recommended targets for improvements in PFS, an increasingly common trial endpoint that appears popular with patients and researchers but does not always accurately predict long-term outcomes.

“We hope that by proposing benchmarks, we will actually stimulate a much larger discussion on clinical trial design among investigators, clinicians, regulators, drug companies and our patients,” Kathy D. Miller, MD, co-leader of the breast cancer program at Indiana University Simon Cancer Center and a member of the ASCO Breast Cancer Working Group, told HemOnc Today. “We particularly don’t hear a lot from patients in this regard. What do our patients really want from their therapy? What’s valuable to them? That ought to be what drives our clinical trial endpoints.”
HemOnc Today spoke with several clinical trial experts about the value of PFS for patients, the clinical utility of the endpoint for researchers, whether proposed benchmarks go far enough to ensure true progress, and the ways in which clinical trials can be redesigned to improve the scientific value of PFS.
A patient-centered endpoint
The use of PFS as an alternative endpoint has increased as researchers strive to conduct more efficient and focused trials in an era of limited research funding.
A paper by Booth and colleagues, published in April 2012 in Journal of Clinical Oncology, indicated PFS was utilized as the primary endpoint in 26% of randomized controlled trials of systemic therapies for breast, colorectal and non–small cell lung cancers between 2005 and 2009. By contrast, no trials conducted between 1975 and 1984 used PFS.
“The context of trial design is the largest reason why PFS is continually utilized,” Joseph Paul Eder, MD, a medical oncologist in the early drug development and developmental therapeutics programs at Yale Cancer Center, said in an interview. “PFS truly allows researchers to conduct trials more quickly and less expensively. So if it is a valuable therapy, it can be used in that patient population more quickly than waiting for longer endpoints like [overall] survival.”
Despite trial design advantages, some researchers question whether PFS alone offers much clinical value.
“Achieving PFS may be more meaningful to a patient than to investigators because they can settle into an established and effective regimen that allows them to put some order in their lives instead of having to switch therapy every few weeks,” Eder said.
A report by Havrilesky and colleagues, published earlier this year in Cancer, supported the argument that PFS offers considerable value to patients. The analysis included data from 95 women with advanced or recurrent ovarian cancer. Results indicated women were more likely to choose a chemotherapy regimen based on its association with PFS than mode of administration, visit frequency, peripheral neuropathy, nausea and vomiting, fatigue and abdominal discomfort (P<.0001 for all). However, results also showed women would accept shorter PFS if it corresponded with a reduction in severe adverse events.
The results suggest the advantages and disadvantages of each therapy — even those that extend PFS — should be considered on an individual basis.
“Different patients may value PFS differently, and its value is likely to differ depending on disease status and costs of toxicity,” Miller said. “Not just financial costs, but how many times are you driving back and forth to the clinic? How much does it disrupt your work and family life? If the therapy is simple, convenient, has few side effects and is not ridiculously expensive, then prolonging PFS can be quite valuable.”
Although OS may be the ideal endpoint, patients may prefer to live for a shorter time with a higher quality of life rather than living for an extended period while suffering from symptoms and treatment-related toxicities.
“Sometimes there is a tradeoff between how well you live and how long you live. This is where patients might face an element of uncertainty,” Lowell E. Schnipper, MD, chief of hematology/oncology at Beth Israel Deaconess Medical Center and a HemOnc Today Editorial Board member, said in an interview. “Quality of life and symptomatic relief — including toxicity of the treatments administered, as well as the clinical symptoms that the disease is causing — are very important for patients with advanced disease.”
Yet flawed methods of collecting quality-of-life data have impeded researchers’ ability to link prolonged PFS with improved quality of life, Miller said.
“It’s been quite hard to document that improvements in PFS translate into improvements in quality of life,” Miller said. “We think it does — and it should — but we don’t have really good clinical trial documentation.”
Clinical utility of PFS
Despite patient preferences, questions remain about the clinical utility of PFS as an endpoint. Whether the endpoint is associated with OS varies according to disease setting and trial design, researchers said.
Aboshi and colleagues analyzed outcomes from 14 trials of patients with NSCLC. The results, published in May in Journal of Cancer Research and Clinical Oncology, found prolonged PFS was more likely to be associated with an OS advantage when the trial size was less than 150 patients, the mean age of the trial cohort was less than 63 years and fewer than 30% of the population had squamous histology.
“PFS definitely has utility, but it depends on the disease, the setting and how it’s being used,” Susan Geyer, PhD, associate professor of hematology and senior biostatistical scientist at The Ohio State University Wexner Medical Center, told HemOnc Today.
For more aggressive disease populations like those with acute myeloid leukemia, EFS or PFS is more likely to be strongly correlated with OS in older patients, although that may not necessarily be the case for younger patients.
“Whether you’re evaluating a cytostatic or cytotoxic agent, whether the trial is phase 2 or phase 3 — all of these are factors that come in to play when deciding if PFS is the best endpoint to use,” Geyer said.
The FDA allows for accelerated approval of an investigational agent if it has demonstrated benefit through a true surrogate endpoint.
According to a research report published in 2013 by Gutman and colleagues, 11 oncology drugs — three for breast cancer, two for colorectal cancer, one for ovarian cancer, three for renal cell carcinoma, one for NSCLC and one for squamous cell carcinoma of the head and neck — received FDA approval between 2005 and 2010 due to an improvement in PFS. Only two of these drugs also demonstrated an OS improvement.
In 2011, the FDA revoked the approval of one of these 11 drugs — bevacizumab (Avastin, Genentech) — for treatment of metastatic breast cancer after the improvements in PFS did not correlate with prolonged OS.
“There are several examples of drugs that have been approved on the basis of an early improvement in PFS, and subsequent studies have shown no improvement in [overall] survival at all with added toxicity,” Ian F. Tannock, PhD, MD, FRCPC, a physician in the department of medical oncology and hematology at Princess Margaret Cancer Centre in Toronto, said in an interview. “My view is that PFS should rarely be the preferred endpoint, but obviously companies will use whatever the FDA and EMA let them get away with.”
Yet, there are settings in which PFS may be an appropriate OS surrogate because researchers are ethically obliged to have patients switch to the therapy that demonstrates an early large improvement in PFS, especially when there are no established alternative treatments, Tannock said.
He cited a trial for metastatic renal cell carcinoma in which a large gain in PFS allowed patients to cross over from interferon to sunitinib (Sutent, Pfizer). A similar circumstance occurred in a trial that involved patients with metastatic melanoma. Patients assigned dacarbazine — which confers low response rates — were able to receive the more effective investigational treatment vemurafenib (Zelboraf; Genentech, Daiichi Sankyo). The mature results of both these trials showed an improvement in OS.
Allowing control-arm patients to cross over to the investigational arm after progression can make it significantly harder to compare OS between the arms, but it can greatly increase the appeal of the trial. One example is a trial designed for patients with chronic lymphocytic leukemia in which control-arm patients could cross over to receive ibrutinib (Imbruvica; Pharmacyclics, Janssen), Geyer said.
“It can be more appealing for patients to know they can receive the agent that is felt to have promise after progression on the standard-of-care arm,” Geyer said. “Especially in the aggressive disease setting, there aren’t always a lot of treatment options after progression. When patients cross over, you still garner information about how the experimental regimen behaved in the salvage setting.”
Yet, a trial designed to see only small changes in PFS — although an assumed patient-centered endpoint — may primarily benefit drug development rather than patient interests.
“If we only try to achieve correlation with PFS and OS, then this says PFS is not valuable, and that the only reason to pay attention to PFS is as a surrogate for OS,” Miller said. “If we are saying the only thing we care about is OS, then why bother with a surrogate at all? Why not just design a study that is appropriately powered with appropriate follow-up to see if it improves OS?”
Clinically meaningful vs. statistically significant
Larger PFS differences should increase the likelihood that researchers will see more meaningful differences in OS, Tannock said.
ASCO’s proposed endpoint benchmarks for clinically meaningful outcomes — published in April in Journal of Clinical Oncology — therefore may unite trial goals of investigators and patients.

Table source: Adapted from Ellis LM. J Clin Oncol. 2014;32:1277-1280.
“What’s meaningful to doctors and scientists planning trials might be different than the individuals suffering from the illness,” said Schnipper, an author on the paper. “We’re hoping for a meeting point between those two tensions so, as we set the bar higher for each trial, we might get more rapid progress.”
The benchmarks propose ranges for PFS and OS improvements from current median survival in order to have the most impact on patient outcomes (see Table). Thus, the paper — according to its authors — creates a distinction between small, statistically significant improvements and those that have true clinical meaning for patients.
Although statistical significance often provides the basis for drug approval, it is influenced by trial design.
“If you have a null hypothesis of no correlation and a very large population, you could have a correlation that reaches statistical significance even though the actual correlation could be very low,” Geyer said. “There has to be a marriage of looking at clinically meaningful differences, as well as statistical significance. Only focusing on the P values, especially with a very large sample size, can be misleading.”
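Geyer's point about sample size can be illustrated with a short calculation. The sketch below is not drawn from any trial discussed here: the correlation of 0.05 and the sample sizes are hypothetical values chosen for illustration, and the P value relies on a normal approximation to the usual t statistic for a correlation coefficient, which is an assumption that holds reasonably well only for large samples.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided P value for the null hypothesis of no correlation.

    Uses the standard t statistic for a Pearson correlation and a normal
    approximation to its distribution (an assumption; reasonable for large n).
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # erfc(|t| / sqrt(2)) equals the two-sided tail area of a standard normal.
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    weak_r = 0.05  # hypothetical, very weak correlation
    for n in (100, 1_000, 10_000, 100_000):
        p = correlation_p_value(weak_r, n)
        print(f"n={n:>7}  r={weak_r}  approximate P={p:.2g}")
```

The correlation stays equally weak in every row. Only the sample size changes, and with it the P value falls from clearly nonsignificant to very small, which is why a P value alone says little about the size of an effect.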
There are several examples of trials that produced statistically significant improvements in outcomes without changing the standard of care.
Schnipper referenced a pancreatic cancer trial conducted by Moore and colleagues, results of which were published in 2007 in Journal of Clinical Oncology. The trial — which evaluated gemcitabine with or without erlotinib (Tarceva; Genentech, Astellas) — was deemed positive because the data met statistical significance. However, the combination extended median OS (6.24 months vs. 5.91 months) and PFS (3.75 months vs. 3.55 months) by less than 1 month each. The FDA approved erlotinib on the basis of this trial.
“Mathematically, the data were statistically significant,” Schnipper said. “But is it a clinically meaningful change? I think the resounding sense is no, so I don’t think the drug is being used for that purpose.”
Median survival among patients with metastatic pancreatic cancer who are eligible for gemcitabine is 8 to 9 months. For that group, a 3-month to 4-month improvement in PFS and OS would be clinically meaningful, according to the ASCO working group.
More recently, the SQUIRE trial, conducted by Thatcher and colleagues and presented at the ASCO Annual Meeting, evaluated the addition of necitumumab (IMC-11F8, Eli Lilly) to gemcitabine and cisplatin in patients with squamous NSCLC. The trial was deemed positive due to small yet statistically significant improvements in median OS (11.5 months vs. 9.9 months) and PFS (5.7 months vs. 5.5 months) among patients assigned the combination. However, the ASCO benchmarks suggest anything less than a 2.5- to 3-month improvement in OS and a 3-month improvement in PFS should not be deemed clinically meaningful.
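The comparison between these trial results and the benchmarks is simple arithmetic, sketched below. The medians are those quoted above for the Moore and SQUIRE trials. The benchmark column uses the lower bound of each range cited in this article, which is a simplifying assumption, and the class and field names are illustrative only.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30.44  # average month length, assumed for the day conversion


@dataclass
class Result:
    label: str
    experimental: float  # median in months, experimental arm
    control: float       # median in months, control arm
    benchmark: float     # minimum meaningful gain in months (lower bound, assumed)

    @property
    def gain(self) -> float:
        """Absolute improvement in the median, in months."""
        return round(self.experimental - self.control, 2)

    @property
    def meets_benchmark(self) -> bool:
        return self.gain >= self.benchmark


RESULTS = [
    Result("Moore 2007, pancreatic, OS", 6.24, 5.91, 3.0),
    Result("Moore 2007, pancreatic, PFS", 3.75, 3.55, 3.0),
    Result("SQUIRE, squamous NSCLC, OS", 11.5, 9.9, 2.5),
    Result("SQUIRE, squamous NSCLC, PFS", 5.7, 5.5, 3.0),
]

if __name__ == "__main__":
    for r in RESULTS:
        days = round(r.gain * DAYS_PER_MONTH)
        verdict = "meets" if r.meets_benchmark else "falls short of"
        print(f"{r.label}: gain {r.gain} months (about {days} days), "
              f"{verdict} the {r.benchmark}-month benchmark")
```

On these figures, none of the four gains reaches its benchmark, and the pancreatic OS gain of 0.33 months works out to about 10 days.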
“You could prove that a regimen meets statistical significance, but with such small changes, nobody is going to use it,” Eder said. “So, why are you wasting your time, money and patients’ experience on these agents? These kinds of endpoints are not going to change the way people practice.”
Constraints of care
Researchers faced substantial resistance before arriving at the proposed ASCO benchmarks, Schnipper said.
Proponents of lower benchmarks believe small changes can be additive over time and lead to faster drug development.
“Small improvements should tell us there is something beneficial about this drug,” Miller said. “It definitely has activity, but we need to understand better who is likely to benefit so that we can hone in on that population. Those small benefits ought to send us back to the drawing board.”
But small changes come at a cost.
“The counterargument against setting the bar too high is that cancer research has been improving incrementally, not by one big hit,” Schnipper said. “That is correct, but there is an opportunity cost for doing clinical trials that cost millions of dollars but only yield a very small benefit.”
Researchers also need to keep in mind the toxicities patients face with each step toward a small improvement, Tannock said.
The current median survival and available therapies in a specific setting should be taken into account when setting the benchmarks, Eder said.
“If there was an innovative therapy in a disease for which there is nothing else, my bar would be much lower,” Eder said. “But if this is just going to be a third or fourth agent in its class, I am much less likely to view small changes favorably, because I do not think it is going to be of tremendous value to patients. It is not going to push the field forward, or get us any closer to the idea of actually treating the cancer effectively, as we do so infrequently in certain tumor types.”
Some researchers had hoped for higher benchmarks than the ASCO paper proposes.
“We have this nonsense of approving drugs that give a 1- to 2-month improvement — or even in the case of a pancreatic drug, a 10-day improvement — in median survival,” Tannock said. “We cannot afford that because it is not clinically meaningful. We need to raise the bar.”
Schnipper said he wanted to establish benchmarks for 50% improvements in each of the four designated diseases.
“Those who wanted lower benchmarks said if we were too aggressive, then some of the drugs that we consider to be of value today might not have been approved,” Schnipper said. “It’s an imperfect science, because you don’t want to throw away a drug that might be helpful. On the other hand, women and men are dying of advanced cancer in very substantial numbers still. That was our tension.”
Trial redesign
Specific attention to the design of clinical trials and the interpretation of results are key to the implementation of the endpoint benchmarks.
“It should be indicated as much as possible in the discussion section of the paper how the patients did symptomatically,” Schnipper said. “‘Clinically meaningful’ means how much the patients were helped, not just by having an X-ray-detected change in tumor size and meeting some criteria for response.”
Rather than deeming all trials that meet statistical significance as positive, researchers should define during trial design the difference in the primary endpoint that would make the trial positive, Tannock said.
“The problem with the current FDA and EMA policy is that it has led to super-large trials,” Tannock said. “Companies are willing to do trials with more than a thousand patients to show a small difference that is statistically significant, and then they get drug registration. The advantage of the higher bar is that we would do trials that are somewhat smaller so we are not chasing tiny differences.”
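The link Tannock draws between small differences and very large trials can be made concrete with a standard approximation. The sketch below uses Schoenfeld's formula for the number of events a two-arm survival trial needs, and it rests on several assumptions that are not taken from any trial in this article: 1:1 randomization, a two-sided alpha of 0.05, 80% power, and exponential survival, so that the hazard ratio equals the ratio of the medians. The 10-month control median and the two target gains are hypothetical.

```python
import math
from statistics import NormalDist


def required_events(median_control: float,
                    median_experimental: float,
                    alpha: float = 0.05,
                    power: float = 0.80) -> int:
    """Approximate number of events needed (Schoenfeld formula, 1:1 allocation).

    Assumes exponential survival, so the hazard ratio is the ratio of medians.
    """
    hazard_ratio = median_control / median_experimental
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    events = 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2
    return math.ceil(events)


if __name__ == "__main__":
    control_median = 10.0  # months, hypothetical
    for gain in (1.6, 3.0):  # hypothetical target improvements, in months
        events = required_events(control_median, control_median + gain)
        print(f"Target gain of {gain} months: about {events} events needed")
```

Under these assumptions, powering a trial for the smaller gain requires roughly three times as many events as powering it for the larger one. The number of patients enrolled would be higher still, because not every patient has an event during follow-up.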
A study by Adrian G. Sacher, MD, of Princess Margaret Hospital in Toronto, and colleagues corroborates the notion that trial size is growing while clinically meaningful outcomes are diminishing.
The researchers evaluated trends in phase 3 NSCLC trials. The analysis, published in March in Journal of Clinical Oncology, showed the median sample size of trials increased from 152 patients in the 1980s to 413 patients in the 2000s. In addition, although the percentage of trials that yielded statistically significant improvements in endpoints remained consistent between 1980 to 1990 (29%) and 2001 to 2010 (31%), the percentage of trials that reported positive outcomes increased significantly during that time (30% to 53%; P<.001).
“Our findings clearly point to a disquieting trend in NSCLC trials. They are becoming larger, utilizing less clinically meaningful endpoints and becoming less effective in identifying clinically useful new drugs,” Sacher told HemOnc Today when the study was published. “The findings of this study call into question the rationale for designing large-scale clinical trials in unselected populations, especially when these trials are powered to detect clinically insignificant differences between treatments.”
Yet, researchers are hopeful trial redesign can help investigators focus on the benchmarks for clinically meaningful improvements.
Precision medicine’s focus on targets may lead to smaller population sizes.
“When patients are segmented based on specific indications within a disease, you would expect the two arms to be much more comparable, and PFS might be more meaningful there,” Eder said.
These smaller studies also may involve more steps to ensure an investigational agent will lead to clinically meaningful outcomes prior to the launch of a large, phase 3 trial.
“We are starting to conduct discovery-setting phase 2 trials focused not on a specific disease group but on a molecular target, that can then lead to other subsequent phase 2 trials where we are focusing on that target in a disease,” Geyer said. “Here we can ask, ‘What is our early evidence of activity? Do we meet the bar or do we not?’”
In these smaller trials, PFS might make more sense than OS as the primary endpoint.
“OS can take much, much longer to evaluate in some disease settings,” Geyer said. “It is not that you are trying to cut corners. It is a matter of identifying beneficial agents and answering your research question before it becomes obsolete.”
Value-based pricing is another option, Tannock said.
He co-authored a study published in 2011 in Journal of Clinical Oncology that reviewed randomized controlled trials of agents approved by the FDA since 2000. Results showed that specific targeted agents were associated with greater OS (HR=0.69) and PFS (HR=0.42) improvements than less-specific biologic targeted agents and chemotherapies. However, there was no difference in the median monthly prices for each of these drug categories (P=.87).
“If you had value-based pricing, companies would have to price the drug to be cost-effective according to their degree of improvement,” Tannock said. “That would stop the trend of super-large trials that yield trivial differences.”
Patient considerations also must play a role in clinical trial changes.
“If we had a discussion on how clinical trials are structured that involves our patients, I am not sure that the endpoints commonly used are the endpoints our patients would support,” Miller said. “My patients tell me very simply, they want to live longer if they can, and they want to live better.” — by Alexandra Todak
References:
Aboshi M. J Cancer Res Clin Oncol. 2014;140:839-848.
Amir E. J Clin Oncol. 2011;29:2543-2549.
Booth CM. J Clin Oncol. 2012;30:1030-1033.
Ellis LM. J Clin Oncol. 2014;32:1277-1280.
Gutman SI. Progression-free survival: What does it mean for psychological well-being or quality of life? Rockville, MD: Agency for Healthcare Research and Quality; 2013.
Havrilesky LJ. Cancer. 2014;doi:10.1002/cncr.28940.
Moore MJ. J Clin Oncol. 2007;25:1960-1966.
Sacher AG. J Clin Oncol. 2014;doi:10.1200/JCO.2013.52.7804.
Thatcher N. Abstract #8008. Presented at: ASCO Annual Meeting, May 30-June 3, 2014; Chicago.
For more information:
Joseph Paul Eder, MD, can be reached at Yale Cancer Center, P.O. Box 208028, New Haven, CT 06520-8028; email: joseph.eder@yale.edu.
Susan Geyer, PhD, can be reached at The Ohio State University Wexner Medical Center, Department of Internal Medicine, Starling Loving Hall, 320 W. 10th Ave., Columbus, OH 43210; email: susan.geyer@osumc.edu.
Kathy D. Miller, MD, can be reached at Indiana University Simon Cancer Center, 535 Barnhill Drive, RT 473, Indianapolis, IN 46202; email: kathmill@iu.edu.
Lowell E. Schnipper, MD, can be reached at Beth Israel Deaconess Medical Center, 330 Brookline Ave., Boston, MA 02215; email: lschnipp@bidmc.harvard.edu.
Ian F. Tannock, PhD, MD, FRCPC, can be reached at Princess Margaret Cancer Centre, 5th Floor, Room 208, 610 University Ave., Toronto, Ontario, Canada M5G 2M9; email: ian.tannock@uhn.ca.
Disclosure: Eder, Geyer, Miller, Schnipper and Tannock report no relevant financial disclosures.
Should OS be the primary endpoint in all phase 3 clinical trials?
OS is the ideal endpoint.

Alan P. Venook
When we set out on the war on cancer, we were not looking for a negotiated settlement.
The goal was to eradicate the disease and — if not to prevent it — to enable people to live longer lives. In the 1960s — before modern technology — the only endpoint was OS (with some attention to quality of life).
The challenge is to determine which therapies maximize patient survival without having to follow patients until their deaths. However, the so-called surrogate endpoints of phase 3 clinical trials are of value and valid only if they predict the ultimate outcome for patients. Do they measure up?
On the assumption that less cancer burden is better, and armed with new imaging techniques, objective response rate (ORR) evolved as an endpoint, primarily to identify therapies of promise. To this day, an occasional phase 3 study may be designed around differences in ORR (eg, Heinemann V. Lancet Oncol. 2014;doi:10.1016/S1470-2045(14)70330-4), yet such studies will only generate heat if there is an OS difference at the end. Unfortunately, as a secondary endpoint in such circumstances, the certainty of the OS measure is subject to debate.
PFS as an endpoint was clearly relevant and important when it represented failing treatment leading to symptoms and clinical deterioration. But these days, PFS is almost always determined by the radiologist, whose nuanced reading of a scan leads to a change to another therapy long before symptoms develop. With such lead time, more and more treatments may be introduced, often minimizing the impact of the first line of therapy.
Other, more ornate surrogate endpoints are now being studied, but only by following patients until their deaths will we know if they predict for that critical endpoint. Many other factors matter but it is still about lengthening lives, and so far, there is no substitute for OS.
Alan P. Venook, MD, is Madden family distinguished professor of medical oncology and translational research, as well as professor of clinical medicine in the division of medical oncology at University of California, San Francisco. He can be reached at 1600 Divisadero St., Box 1770, San Francisco, CA 94115; email: venook@cc.ucsf.edu. Disclosure: Venook reports research funding from Bristol-Myers Squibb and Genentech/Roche.
OS absolutely should not be used as the primary endpoint in all phase 3 trials.

Daniel J. Sargent
In phase 3 trials in the adjuvant setting in diseases such as breast and colon cancers, DFS is an established and well-accepted primary endpoint. In advanced disease, in settings in which survival is lengthy — such as follicular lymphoma — PFS also is an accepted endpoint.
In addition, PFS is a viable and likely preferred endpoint in settings in which multiple effective lines of therapy are available, such as first-line colon or breast cancers. Insistence on an OS endpoint in these settings would require extremely large and lengthy trials that would have a high risk of negative results, even if there is a treatment benefit, due to post-trial confounding.
PFS adds value to all parties: researchers, regulators and, most importantly, patients. In some diseases, progression is associated with increasing problematic symptoms, so delaying progression is of direct patient benefit. More generally, the use of the PFS endpoint allows trials to be completed more quickly and with fewer patients, which allows expedited regulatory consideration and, thus, new treatments to be available to patients more quickly.
Critically, PFS also provides a clearer, more direct signal of an agent’s efficacy than OS that is not confounded by crossover or subsequent treatments, which could potentially obscure a true treatment benefit from a new therapy.
PFS does have limitations, however, including potential for measurement error and bias, and differences must be of a magnitude that is convincing and clinically meaningful in a setting where the new treatment’s adverse-event profile results in a positive overall risk–benefit ratio for patients.
Finally, the use of the OS endpoint effectively precludes crossover to the experimental agent upon disease progression, so in settings in which there is strong evidence of efficacy from initial studies, the PFS endpoint allows a clear efficacy determination while still providing all enrolling patients the ability to receive the new agent, either initially or upon failure of initial treatment on the control arm. Requiring OS as the endpoint for all phase 3 trials would be a significant step backward for oncology drug development and for patients.