Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients
Author: Ben Goldacre
An operation to remove a cancer, for example, has immediate short-term risks – you might die on the table in the operating theatre, or from an infection in the following week – but you hope that this short-term risk is offset by long-term benefits. If you do a trial to compare patients who have the operation with patients who don’t, but only measure outcomes for one week, you might find that those having the operation die sooner than those who don’t. This is because it takes months or years for people to die of the cancer you’re cutting out, so the benefits of that operation take months and years to emerge, whereas the risks, the small number of people who die on the operating table, appear immediately.
The same problem presents itself with drug trials. There might be a sudden, immediate, short-term benefit from a weight-loss drug, for example, which deteriorates over time to nothing. Or there might be short-term benefits and long-term side effects, which only become apparent in longer trials. The weight-loss treatment fen-phen, for example, caused weight loss in the positive short-term trials, but when patients receiving it were observed over longer periods, it turned out that they also developed heart valve defects.[11]
Benzodiazepine drugs like Valium are very good for alleviating anxiety in the short term, and a trial lasting six weeks would show huge benefits; but over the months and years that follow, their benefits decrease, and patients become addicted. These adverse long-term outcomes would only be captured in a longer trial.
Longer trials are not, however, automatically always better: it’s a question of the clinical question you are trying to answer, or perhaps trying to avoid. With an expensive cancer drug like Herceptin, for example, you might be interested in whether giving it for short periods is just as effective as giving it for long periods, in order to avoid paying for larger quantities of the drug unnecessarily (and exposing patients to a longer duration of side effects). For this you’d need short trials, or at the very least trials that reported outcomes over a long period, but after a short period of treatment. Roche applied for twelve-month treatment licences with Herceptin, presenting data from twelve-month-long trials. In Finland a trial was done with only a nine-week course of treatment, finding significant benefit, and the New Zealand government decided to approve nine-week treatment. Roche rubbished this brief study, and commissioned new trials for a two-year period of treatment. As you can imagine, if we want to find out whether nine weeks of Herceptin are as good as twelve months of Herceptin, we need to run some trials comparing those two treatment regimes: funding trials like these is often a challenge.
Trials that stop early
If you stop a trial early, or late, because you were peeking at the results as it went along, you increase the chances of getting a favourable result. This is because you are exploiting the random variation that exists in the data. It is a sophisticated version of the way someone can increase their chances of winning in a coin toss by using this strategy: ‘Damn! OK, best of three…Damn! Best of five?…Damn! OK, best of seven…’
Time and again in this book we have come back to the same principle: if you give yourself multiple chances of finding a positive result, but use statistical tests that assume you only had one go, you hugely increase your chances of getting a misleading false positive. This is the problem with people hiding negative results. But it also creeps into the way people analyse studies which haven’t been hidden.
For example, if you flip a coin for long enough, then fairly soon you’ll get four heads in a row. That’s not the same as saying ‘I’m going to throw four heads in a row right now,’ and then doing so. We know that the time frame you put around some data can allow you to pick out a clump of findings which please you; and we know that this can be a source of mischief.
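To see how much difference this makes, here is a small simulation. The arm sizes, the underlying event rate and the number of interim ‘looks’ are all invented for illustration, and the two arms are identical by construction, so any ‘significant’ difference it finds is a false positive:

```python
# Illustrative simulation: how repeatedly peeking at interim results inflates
# the false-positive rate when there is NO real difference between two arms.
import random
from statistics import NormalDist

def p_value_two_proportions(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = abs(p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(z))

def run_trial(n_looks=10, patients_per_look=100, event_rate=0.3, peek=True):
    """Both arms share the same true event rate; 'peek' tests after every batch
    and stops the moment p < 0.05, mimicking 'best of three... best of five'."""
    a_events = b_events = total = 0
    for _ in range(n_looks):
        a_events += sum(random.random() < event_rate for _ in range(patients_per_look))
        b_events += sum(random.random() < event_rate for _ in range(patients_per_look))
        total += patients_per_look
        if peek and p_value_two_proportions(a_events, total, b_events, total) < 0.05:
            return True   # stopped early, declaring a (spurious) difference
    # otherwise: the single pre-planned analysis at the end
    return p_value_two_proportions(a_events, total, b_events, total) < 0.05

random.seed(1)
sims = 2000
peeking = sum(run_trial(peek=True) for _ in range(sims)) / sims
planned = sum(run_trial(peek=False) for _ in range(sims)) / sims
print(f"False positives with repeated peeking: {peeking:.0%}")  # typically far above 5%
print(f"False positives with one planned test: {planned:.0%}")  # close to 5%
```

The peeking strategy declares a spurious benefit several times more often than the single pre-planned analysis, for exactly the same reason that ‘best of three…best of five…best of seven’ eventually rescues the coin-tosser.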
The CLASS trial compared a new painkiller called celecoxib against two older pills over a six-month period. The new drug showed fewer gastrointestinal complications, so lots more doctors prescribed it. A year later it emerged that the original intention of the trial had been to follow up for over a year. The trial had shown no benefit for celecoxib over that longer period, but when only the results over six months were included, the drug shone. That became the published paper.
At this stage we should pause a moment, to recognise that it can sometimes be legitimate to stop a trial early: for example, if there is a massive, jaw-dropping difference in benefit between the two treatment groups; and specifically, a difference so great, so unambiguous and so informative that even when you factor in the risk of side effects, no physician of sound mind would continue to prescribe the losing treatment, and none will, ever again.
But you have to be very cautious here, and some terrible wrong results have been let through by people generously accepting this notion. For example, a trial of the drug bisoprolol during blood-vessel surgery stopped early, when two patients on the drug had a major cardiac event, while eighteen on placebo did. It seemed that the drug was a massive life-saver, and the treatment recommendations were changed. But when it began to seem that this trial might have overstated the benefits, two larger ones were conducted, which found that bisoprolol actually conferred no benefit.[12] The original finding had been incorrect, caused by researchers stopping the trial early after a fluke clump of deaths.
Peeking at your data during a trial can raise a genuinely troubling ethical question. If you seem to have found evidence of harm for one or other treatment before the end of the study period, should you continue to expose the patients in your trial to what might be a genuine risk, in the interests of getting to the bottom of whether it’s simply a chance finding? Or should you shut up shop and close the trial, potentially allowing that chance finding to pollute the medical literature, misinforming treatment decisions for larger numbers of patients in the future? This is particularly worrying when you consider that after a truncated trial, a larger one often has to be done anyway, exposing more people to risk, just to discover if your finding was an anomaly.
One way to restrict the harm that can come from early stopping is to set up ‘stopping rules’, specified before the trial begins, and carefully calculated to be extreme enough that they are unlikely to be triggered by the chance variation you’d expect to see, over time, in any trial. Such rules are useful because they restrict the intrusion of human judgement, which can introduce systematic bias.
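As a sketch of what such a rule can look like in practice, here is one in the spirit of a Haybittle–Peto boundary, where interim analyses are only allowed to stop the trial for a very extreme result; the particular thresholds and look schedule below are illustrative rather than prescriptive:

```python
# A pre-specified stopping rule, in the spirit of a Haybittle-Peto boundary:
# interim looks may stop the trial only for an extreme result (p < 0.001),
# leaving the final, planned analysis to be judged at the usual p < 0.05.
# The exact thresholds here are illustrative assumptions.

INTERIM_THRESHOLD = 0.001  # deliberately extreme, fixed before the trial starts
FINAL_THRESHOLD = 0.05     # the conventional level, used only at the planned end

def should_stop(look_number: int, total_looks: int, p_value: float) -> bool:
    """Apply the pre-specified rule at a given interim or final analysis."""
    is_final_look = look_number == total_looks
    threshold = FINAL_THRESHOLD if is_final_look else INTERIM_THRESHOLD
    return p_value < threshold

# A tantalising hint of benefit (p = 0.03) at look 2 of 5 does NOT stop the
# trial, because mid-trial the rule demands something far more extreme.
print(should_stop(2, 5, 0.03))    # False - keep recruiting
print(should_stop(2, 5, 0.0004))  # True  - a jaw-dropping difference
print(should_stop(5, 5, 0.03))    # True  - the planned final analysis
```

The point of demanding something so extreme mid-trial is precisely that chance variation alone will very rarely produce it, so a ‘promising’ interim wobble never gets to end the study on its own.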
But whatever we do about early stopping in medicine, it will probably pollute the data. A review from 2010 took around a hundred truncated trials, and four hundred matched trials that ran their natural course to the end: the truncated trials reported much bigger benefits, overstating the usefulness of the treatments they were testing by about a quarter.[13]
Another recent review found that the number of trials stopped early has doubled since 1990,[14] which is probably not good news. At the very least, results from trials that stop early should be regarded with a large dose of scepticism. Particularly since these same systematic reviews show that trials which stop early often don’t properly report their reasons for doing so.
And all of this, finally, becomes even more concerning when you look at which trials are being truncated early, who they’re run by, and what they’re being used for.
In 2008, four Italian academics pulled together all the randomised trials on cancer treatments that had been published in the preceding eleven years, and that were stopped early for benefit.[15] More than half had been published in the previous three years, suggesting once again that this issue has become more prevalent. Cancer is a fast-moving, high-visibility field in medicine, where time is money and new drugs can make big profits quickly. Eighty-six per cent of the trials that stopped early were being used to support an application to bring a new drug onto the market.
Trials that stop late
It would be a mistake to think that any of these issues illustrate transgressions of simple rules that should be followed thoughtlessly: a trial can be stopped too early, in ways that are foolish, but it can also be stopped early for sensible reasons. Similarly, the opposite can happen: sometimes a trial can be prolonged for entirely valid reasons, but sometimes, prolonging a trial – or including the results from a follow-up period after it – can dilute important findings, and make them harder to see.
Salmeterol is an inhaler drug used to treat asthma and emphysema. What follows[16] is – if you can follow the technical details to the end – pretty frightening, so, as always, remember that this is not a self-help book, and it contains no advice whatsoever about whether any one drug is good, or bad, overall. We are looking at flawed methods, and they crop up in trials of all kinds of drugs.
Salmeterol is a ‘bronchodilator’ drug, which means it works by opening up the airways in your lungs, making it easier for you to breathe. In 1996, occasional reports began to emerge of ‘paradoxical bronchospasm’ with salmeterol, where the opposite would happen, causing patients to become very unwell indeed. Amateur critics often like to dismiss anecdotes as ‘unscientific’, but this is wrong: anecdotes are weaker evidence than trials, but they are not without value, and are often the first sign of a problem.
Salmeterol’s manufacturer, GSK, wisely decided to investigate these early reports by setting up a randomised trial. This compared patients on salmeterol inhalers against patients with dummy placebo inhalers, which contained no active medicine. The main outcome to be measured was carefully pre-specified as ‘respiratory deaths and life-threatening experiences’, combined together. The secondary outcomes were things like asthma-related deaths (which is a subset of all respiratory deaths), all-cause deaths, and ‘asthma-related deaths or life-threatening experiences’, again bundled up.
The trial was supposed to recruit 60,000 people, and follow them up intensively for twenty-eight weeks, with researchers seeing them every four weeks to find out about progress and problems. For the six months after this twenty-eight-week period, investigators were asked to report any serious adverse events they knew of – but they weren’t actively seeking them out.
What happened next is a dismal tale, told in detail in a Lancet paper some years later by Peter Lurie and Sidney Wolfe, working from the FDA documents. In September 2002 the trial’s own monitoring board met, and looked at the 26,000 patients who had been through the trial so far. Judging by the main outcome – ‘respiratory deaths and life-threatening experiences’ – salmeterol was worse than placebo, although the difference wasn’t quite statistically significant. The same was true for ‘asthma-related deaths’. The trial board said to GSK: you can either run another 10,000 patients through to confirm this worrying hint, or terminate the trial, ‘with dissemination of findings as quickly as possible’. GSK went for the latter, and presented this interim analysis at a conference (saying it was ‘inconclusive’). The FDA got worried, and changed the drug’s label to mention ‘a small but significant increase in asthma-related deaths’.
Here is where it gets interesting. GSK sent its statistics dossier on the trial to the FDA, but the figures it sent weren’t calculated using the method specified in the protocol laid down before the study began, which stipulated that the outcome figures for these adverse events should come from the twenty-eight-week period of the trial, as you’d imagine, when such events were being carefully monitored. Instead, GSK sent the figures for the full twelve-month period: the twenty-eight weeks when the adverse events were closely monitored, and also the six months after the trial finished, when adverse events weren’t being actively sought out, so were less likely to be reported. This means that the high rate of adverse events from the first twenty-eight weeks of the trial was diluted by the later period, and the problem became much less prominent.
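The arithmetic of that dilution is simple to sketch. The counts below are invented purely to illustrate the principle, and are not the trial’s actual figures:

```python
# Invented counts, for illustration only: if harms are fully captured during
# the closely monitored period but under-reported afterwards (and roughly
# equally so in both arms), pooling the two periods drags the relative risk
# back towards 1.0, making the drug look safer than the monitored data suggest.

def relative_risk(events_drug, n_drug, events_placebo, n_placebo):
    return (events_drug / n_drug) / (events_placebo / n_placebo)

n_per_arm = 13_000  # hypothetical number of patients in each arm

# Closely monitored 28-week period: a clear excess of events on the drug.
monitored_only = relative_risk(50, n_per_arm, 25, n_per_arm)

# Add the loosely monitored six months: only some events get reported, and at
# a similar rate in both arms, so the extra events carry a ratio near 1.0.
full_period = relative_risk(50 + 30, n_per_arm, 25 + 28, n_per_arm)

print(f"Relative risk, monitored period only: {monitored_only:.2f}")  # 2.00
print(f"Relative risk, diluted twelve months: {full_period:.2f}")     # about 1.5
```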
If you look at the following table, from the Lancet paper, you can see what a difference that made. Don’t worry if you don’t understand everything, but here is one easy bit of background, and one hard bit. ‘Relative risk’ describes how much more likely you were to have an event (like death) if you were in the salmeterol group, compared with the placebo group: so a relative risk of 1.31 means you were 31 per cent more likely to have that event (let’s say, ‘death’).
The numbers in brackets after that, the ‘95 per cent CI’, are the ‘95 per cent confidence interval’. While the single figure of the relative risk is our ‘point estimate’ for the difference in risk between the two groups (salmeterol and placebo), the 95 per cent CI tells us how certain we can be about this finding. Statisticians will be queuing up to torpedo me if I oversimplify the issue, but essentially, if you ran this same experiment, in patients from the same population, a hundred times, then you’d get slightly different results every time, simply through the play of chance. But ninety-five times out of a hundred the true relative risk would lie somewhere between the two extremes of the 95 per cent confidence interval. If you have a better way of explaining that in fifty-four words, my email address is at the back of this book.
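If you want to see where such numbers come from, here is the standard calculation of a relative risk and its 95 per cent confidence interval from raw event counts, using the usual log-relative-risk approximation; the counts themselves are invented for illustration:

```python
# Worked example with invented counts: relative risk and 95% confidence
# interval for two groups, via the standard log(RR) approximation.
from math import exp, log, sqrt

def relative_risk_with_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A versus group B, with a 95% confidence interval."""
    rr = (events_a / n_a) / (events_b / n_b)
    # standard error of log(RR) for two independent proportions
    se_log_rr = sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical: 50 events among 13,000 on the drug, 38 among 13,000 on placebo.
rr, lo, hi = relative_risk_with_ci(50, 13_000, 38, 13_000)
print(f"Relative risk {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
# About 1.32: roughly 32 per cent more likely to have the event - but the
# interval stretches from below 1.0 to about 2.0, so chance alone could
# still explain the difference.
```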
GSK didn’t tell the FDA which set of results it had handed over. In fact, it was only in 2004, when the FDA specifically asked, that it was told it was the twelve-month data. The FDA wasn’t impressed, though this is expressed in a bland sentence: ‘The Division presumed the data represented [only] the twenty-eight-week period as the twenty-eight-week period is clinically the period of interest.’ It demanded the twenty-eight-week data, and said it was going to base all its labelling information on that. This data, as you can see, painted a much more worrying picture of the drug.