The Half-Life of Facts
Authors: Samuel Arbesman
The studies that initially reported stronger effects ranged across many areas of study. From HIV treatment to angioplasty and strokes, none of these areas was immune to the decline effect. And, of course, a similar range of areas was affected by contradictions: coronary artery disease, vitamin E research, nitric oxide, and more. As the saying among doctors goes, “Hurry up and use a new drug while it still works.”
What was the cause of the decline effect here? Did Ioannidis ascribe this to anything new? Far from being the result of anything spectacular or confusing, the decline effect often comes down to a matter of replication and importance. The more something is tested, the better we understand it. Often, more important areas are those that are tested more frequently. It is likely that there are a good deal more incorrect effects out there in the medical literature than we are even aware of, just waiting to be tested.
Of course, it’s not always this clear. As Ioannidis noted:
Whenever new research fails to replicate early claims for efficacy or suggests that efficacy is more limited than previously thought, it is not necessary that the original studies were totally wrong and the newer ones are correct simply because they are larger or better controlled. Alternative explanations for these discrepancies may include differences in disease spectrum, eligibility criteria, or the use of concomitant interventions.
We should be wary of jumping to conclusions.
Nevertheless, in consonance with the idea of increasing precision and p-values, Ioannidis wrote:
In the case of initially stronger effects, the differences in the effect sizes could often be within the range of what would be expected based on chance variability. This reinforces the notion that results from clinical studies, especially early ones, should be interpreted using not only the point estimates but also the uncertainty surrounding them.
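The point about chance variability can be illustrated with a small simulation of what is often called the winner’s curse. This is a sketch under assumed conditions, not Ioannidis’s analysis: every study estimates the same true effect (0.2 standard deviations, an arbitrary choice), each study’s estimate is drawn directly from its sampling distribution for a two-arm design with unit variance, and only positive results with z > 1.96 count as “published.” The function name `published_effects` and all parameter values are hypothetical.

```python
import math
import random
import statistics


def published_effects(true_effect, n_per_arm, trials=5000, seed=1):
    """Return the effect estimates that clear p < 0.05 in the positive direction.

    Assumes a two-arm study with unit-variance outcomes, so the estimated
    difference in means is normal with standard error sqrt(2 / n_per_arm).
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_arm)
    published = []
    for _ in range(trials):
        estimate = rng.gauss(true_effect, se)  # one simulated study
        if estimate / se > 1.96:  # only significant, positive results "get published"
            published.append(estimate)
    return published


TRUE_EFFECT = 0.2
small = published_effects(TRUE_EFFECT, n_per_arm=20)   # early, small studies
large = published_effects(TRUE_EFFECT, n_per_arm=500)  # later, larger studies

for label, results in (("small (n=20/arm)", small), ("large (n=500/arm)", large)):
    print(f"{label}: {len(results)} significant, "
          f"mean reported effect {statistics.fmean(results):.2f} (true {TRUE_EFFECT})")
```

Because a small study can only reach significance with a large estimate (here anything above about 0.62), its published results overstate the true effect several-fold, while the larger studies land close to 0.2. Under these assumptions, later and bigger studies look like a “decline” even though the underlying effect never changed.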
More recently, Ioannidis conducted the same test for various biomarkers and found that subsequent meta-analyses often showed diminished effects. We must always be aware that we are dwelling in uncertainty. Forgetting this can lead us to jump to unwarranted conclusions.
. . .
THESE contradicted effects are related to what is perhaps Ioannidis’s best-known paper, which has acted as a sort of broadside against many aspects of how science is done. His 2005 paper in the journal PLoS Biology was titled “Why Most Published Research Findings Are False.” As of late 2011, it had been viewed more than four hundred thousand times and cited more than eight hundred times.
In it, he lays out a clear mathematical argument for why many scientific claims are untrue. Building on several of the themes already discussed, he focuses on false positives: instances where a finding is “discovered” even though it’s not actually real.
In a wonderful bit from The Daily Show, correspondent John Oliver interviews Walter Wagner, a science teacher who filed a lawsuit to try to stop the Large Hadron Collider from being turned on. The Large Hadron Collider is a massive particle accelerator capable of generating huge amounts of energy, and Wagner was concerned that it could create a black hole capable of destroying the earth.
When Oliver presses Wagner on the chances that the world will be destroyed, Wagner says that “the best we can say right now is about a one in two chance.” His reasoning is that it will either happen or it won’t, so the odds must be 50-50.
But this is absurd. Before we test a hypothesis, we already have some expectation of what might happen. As another scientist interviewed by The Daily Show stated, there is a 0 percent chance of the earth being destroyed, based on what we already know about the fundamental laws of physics and how particle accelerators work.
This probability (what we expect before we test a hypothesis) is known as the prior probability. The prior probability is simply the probability that the hypothesis is true before testing. Once we’ve tested it, we get what is known as a posterior probability: the probability that the hypothesis is true after our test.
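In the simplest case, going from prior to posterior is an application of Bayes’ theorem. The sketch below assumes, for illustration, that a test has a known chance of coming out positive when the hypothesis is true (`power`) and when it is false (`alpha`); the function name and the numbers are hypothetical, not taken from the book.

```python
def posterior_after_positive(prior, power, alpha):
    """P(hypothesis true | positive test), via Bayes' theorem.

    prior -- probability the hypothesis is true before testing
    power -- P(positive result | hypothesis true)   (assumed known)
    alpha -- P(positive result | hypothesis false)  (assumed known)
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


for prior in (0.5, 0.1, 0.01, 0.0):
    post = posterior_after_positive(prior, power=0.8, alpha=0.05)
    print(f"prior {prior:<5} -> posterior {post:.3f}")
```

The same positive result is convincing when the prior is high and weak when it is low. A prior of zero, like the physicist’s estimate for the Large Hadron Collider, stays at zero whatever the test says, while Wagner’s 50-50 is a prior chosen by counting outcomes rather than by using what is already known.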
Ioannidis argues that in a given field there is a certain fraction of relationships between variables that are real but many more that are spurious. For each field, then, there is a ratio of the relationships that are real to those that aren’t. Think of it as the ratio between smoking-causes-cancer hypotheses and green-jelly-beans-cause-acne hypotheses.
Ioannidis then combines this ratio with our hypothetical experiment’s statistical power, a number that captures the ability of the experiment to yield a positive result when the relationship is real, to calculate how likely it is that a positive result is actually true.
Essentially, he shows quantitatively that in many situations, statistically significant and publishable results can arise even though they are not true: when the study is done in a field where the above ratio is low (so any relationship being tested is likely to be spurious), when the experiment uses very few subjects, or when the study is done in an area where results are not replicated.
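Ioannidis’s 2005 paper expresses this as the positive predictive value (PPV) of a finding. As we read the paper, with pre-study odds R, power 1 − β, and significance threshold α, PPV = (1 − β)R / (R − βR + α), so a finding is more likely true than false only when (1 − β)R > α. The sketch below transcribes that formula; the scenario labels and numbers are illustrative assumptions, not values from the book.

```python
def ppv(R, power, alpha=0.05):
    """Post-study probability that a statistically significant finding is true.

    R     -- pre-study odds: real relationships per spurious one in the field
    power -- 1 - beta, probability of detecting a relationship that is real
    alpha -- significance threshold (false-positive rate for spurious ones)

    Transcribes PPV = (1 - beta) R / (R - beta R + alpha).
    """
    beta = 1.0 - power
    return (1.0 - beta) * R / (R - beta * R + alpha)


# Hypothetical scenarios: (pre-study odds R, power)
scenarios = {
    "well-powered, 1:1 odds": (1.0, 0.80),
    "well-powered, 1:10 odds": (0.1, 0.80),
    "small study, 1:10 odds": (0.1, 0.20),
    "exploratory, 1:1000 odds": (0.001, 0.20),
}
for label, (R, power) in scenarios.items():
    print(f"{label:28s} PPV = {ppv(R, power):.3f}")
```

Under these assumptions, lowering either the odds or the power pushes PPV below one half, which is the quantitative core of “most published research findings are false.”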
Ioannidis helpfully provides a few corollaries of his analysis that grow out of common sense, and I’ve added my own annotations:
The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
In a small study, random chance plays a bigger role, so a positive result is more likely to be a fluke. This is like the classic clinical trial joke in which, after a new pharmaceutical was tested on a population of mice, it was reported that one-third responded positively to the treatment, one-third had no response, and the third mouse ran away.
The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.
If an effect is small, it could be like Planet X, and we are simply measuring noise.
The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
More experiments mean that some of them will come up positive purely by chance, and get published.
The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
If there’s a greater possibility of massaging the data to get a good result, then there’s a greater chance that someone will do so.
The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
Since scientists are people too, and are not perfect beings, the greater the possible bias, the greater the chance the findings aren’t true.
The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
With more teams involved, any positive result gets a great deal of hype quickly and is rushed out the door, only to be refuted just as easily and with an equal amount of hype. Ioannidis refers to this as a cause of the Proteus phenomenon, which he defined as “rapidly alternating extreme research claims and extremely opposite refutations.”
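Several of these corollaries can be read off extensions of the same PPV formula in Ioannidis’s paper. The sketch below rests on our transcription of two of them: one adds a bias term u (the share of analyses that would not have been findings but are reported as such anyway, standing in for flexible designs and conflicts of interest), and one covers n independent teams testing the same relationship (a hot field). The power helper is a rough normal approximation for a two-arm comparison, added here as an assumption to connect small studies and small effects to low power. All function names and numbers are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF using math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(effect_size, n_per_arm, z_crit=1.96):
    """Rough power of a two-arm z-test for a standardized effect (upper tail only)."""
    return normal_cdf(effect_size * math.sqrt(n_per_arm / 2.0) - z_crit)


def ppv_with_bias(R, power, alpha=0.05, u=0.0):
    """PPV when a fraction u of analyses is reported as positive because of bias."""
    beta = 1.0 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


def ppv_many_teams(R, power, alpha=0.05, n=1):
    """PPV when n independent teams test the same relationship and any positive counts."""
    beta = 1.0 - power
    return R * (1.0 - beta ** n) / (R + 1.0 - (1.0 - alpha) ** n - R * beta ** n)


R = 0.1  # assumed pre-study odds: one real relationship per ten spurious ones
for n_per_arm, d in ((100, 0.5), (20, 0.5), (20, 0.2)):
    power = approx_power(d, n_per_arm)
    print(f"n={n_per_arm:3d}, effect={d}: power {power:.2f}, "
          f"PPV {ppv_with_bias(R, power):.2f}")

print(f"bias u=0.3:  PPV {ppv_with_bias(R, 0.8, u=0.3):.2f}")
print(f"10 teams:    PPV {ppv_many_teams(R, 0.8, n=10):.2f}")
```

Shrinking the study or the effect lowers power and drags PPV down. Adding bias or more competing teams does the same even at 80 percent power, compared with roughly 0.62 for a single unbiased team at these odds.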
. . .
ONE simple way to minimize a lot of this trouble is through replication, measuring the same problem over and over. Too often it’s much more glamorous to try to discover something new than to simply do someone else’s experiment a second time. In addition, many scientists, even those who want to replicate findings, find it difficult to do so. And when they suspect a result is actually wrong, the disincentive is even stronger.
Why is this so? Writing about the kerfuffle over a paper published in Science that reported bacteria able to incorporate arsenic into their DNA backbone, Carl Zimmer explains:
But none of those critics had actually tried to replicate the initial results. That would take months of research: getting the bacteria from the original team of scientists, rearing them, setting up the experiment, gathering results and interpreting them. Many scientists are leery of spending so much time on what they consider a foregone conclusion, and graduate students are reluctant, because they want their first experiments to make a big splash, not confirm what everyone already suspects.
“I’ve got my own science to do,” John Helmann, a microbiologist at Cornell and a critic of the Science paper, told Nature.
Or, to put it more starkly, as Stephen Cole, a sociologist of science at the State University of New York, Stony Brook, quoted one scientist: “If it confirmed the first researcher’s findings, it would do nothing for them [the team performing the replication], but would win a Nobel Prize for him, while on the other hand, if it disconfirmed the results there would be nothing positive to show for their work.”
But only through replication can science be the truly error-correcting enterprise that it is supposed to be. Replication lets us overturn results as well as move closer to the truth, and that is what science is ultimately about. In a paper that followed up on Ioannidis’s somewhat pessimistic conclusion, researchers calculated that a small amount of replication can lead us to much more robust science. But how do we do this?
A number of scientists are trying to make it more acceptable, and easier, to publish negative results. Since science prioritizes the exciting and the surprising, it is nearly impossible to publish a paper that says that some hypothesis is false. In fact, unless the work overturns some well-known result or dogma, the publication will never receive a hearing. Many scientists are advocating for journals and databases devoted to publicizing negative results to fill this publishing void, and some have already started such journals. These could act as a check on the positive results so often seen in the literature and help provide a handle on the nature of the decline effect. In addition, they have the potential to act as a series of guideposts for other scientists, allowing them to see what hasn’t worked before so they can steer clear of unsuccessful research.
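The claims that a little replication goes a long way, and that negative results carry real information, can be illustrated with simple Bayesian bookkeeping. This is a simplified sketch, not the method of the follow-up paper: it assumes each study is independent with the same power (0.8) and false-positive rate (0.05), and starts from pre-study odds of 1:10. The `update` helper is hypothetical.

```python
def update(prior, positive, power=0.8, alpha=0.05):
    """One Bayesian update on an independent study's result (assumed power and alpha)."""
    if positive:
        hit, miss = power * prior, alpha * (1.0 - prior)
    else:
        hit, miss = (1.0 - power) * prior, (1.0 - alpha) * (1.0 - prior)
    return hit / (hit + miss)


R = 0.1                      # assumed pre-study odds
prior = R / (1.0 + R)        # convert odds to a probability
first = update(prior, positive=True)        # original positive finding
replicated = update(first, positive=True)   # independent successful replication
failed = update(first, positive=False)      # independent failed replication

print(f"prior {prior:.3f} -> first positive {first:.3f}")
print(f"  + successful replication {replicated:.3f}")
print(f"  + failed replication     {failed:.3f}")
```

Under these assumptions, one successful replication lifts the probability from about 0.62 to about 0.96, while a failed replication drops it to about 0.25. That second number is exactly the information lost when negative results go unpublished.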
. . .
SCIENCE is not broken. Lest the above worry the reader, science is far from a giant erroneous mass. But how do we return from the brink, where error and sloppy results might appear to be widespread?
Luckily, error and sloppiness are rare in science. While they do occur in some instances, science as a whole still moves forward.
As Lord Florey, a president of the Royal Society, stated:
Science is rarely advanced by what is known in current jargon as a “breakthrough,” rather does our increasing knowledge depend on the activity of thousands of our colleagues throughout the world who add small points to what will eventually become a splendid picture much in the same way the Pointillistes built up their extremely beautiful canvasses.
Science is not always cumulative, as the philosopher of science Thomas Kuhn noted. There are setbacks, mistakes, and wrong turns. Nonetheless, we have to distinguish the core of science from the frontier, terms used by SUNY Stony Brook’s Stephen Cole. The core is the relatively stable portion of what we know in a certain field, the facts we don’t expect to change. We will no doubt learn new things about how DNA works and how our genes are turned on and off, but it’s unlikely that the basic mechanism of encoding genes in DNA is some sort of mesofact. While this rule of how DNA contains the information for proteins, known as the central dogma of molecular biology, has become more complex over time, its basic principles are part of the core of our knowledge. This is what is generally considered true by consensus within the field, and it often makes its way into textbooks.
On the other hand, the frontier is where most of the upheaval of facts occurs, from the daily churn in what the newspapers tell us is healthy or unhealthy to the constant journal retractions, clarifications, and replications. That’s where scientists live, and in truth, that’s where the most exciting stuff happens. At the frontier, most scientists lack a clear idea of what will become settled truth.
As John Ziman, a theoretical physicist who thought deeply about the social aspects of science, noted:
The scientific literature is strewn with half-finished work, more or less correct but not completed with such care and generality as to settle the matter once and for all. The tidy comprehensiveness of undergraduate Science, marshalled by the brisk pens of the latest complacent generation of textbook writers, gives way to a nondescript land, of bits and pieces and yawning gaps, vast fruitless edifices and tiny elegant masterpieces, through which the graduate student is expected to find his way with only a muddled review article as a guide.
And pity the general public trying to make sense of this.
The errors at the frontier are many, from those due to measurement or false positives to everything else that this book has explored. But the frontier is also what makes science exciting. Science is an intensely human endeavor, with all the negative aspects of humanity. But we can view all of this uncertainty in a positive light as well, because science is most thrilling when it’s unsettled.
There is a sifting and filtering process that moves knowledge from the frontier to the relatively small core of knowledge. We should enjoy this process, rather than despair. One of the most fulfilling aspects is not the upheaval and churning of facts, but rather being able to grapple with concepts that explain our world. And these new facts are possible only because of measurement.
. . .
IN addition to exposing quantitative error or delimiting what’s around us (as in the case of Mount Everest’s height), measurement can also have the profound benefit of overturning simple ideas and creating new pieces of knowledge, things we never could have known before.