The Half-Life of Facts

Authors: Samuel Arbesman

But Green, the Shakespeare of mathematical physics, is not the only example of these sorts of people. There are many instances when knowledge is not recognized or not combined, because it’s created by people who are simply too far ahead of their time, or who come from backgrounds that are so different from what is traditionally expected for scientific insight. For example, Gregor Mendel, now recognized as the father of genetics, died without being known at all. It wasn’t until years after his death that the Augustinian monk’s work was rediscovered, due to other scientists doing similar experiments and stumbling upon his findings. Yet he laid the foundation for the concepts of genes and the mathematics of the heritability of discrete traits.

There is also Charles Babbage, who designed the first mechanism for a programmable computer but had the misfortune of living during the Industrial Revolution, when he was unable to construct his invention, mainly because it was far too expensive at the time. His Analytical Engine had parts that corresponded to the memory and processors found in modern computers.

So facts can remain hidden for a long time, whether because they are too advanced for their era or because they come from a different discipline. But is there a way to measure this? Specifically, how often is knowledge skipped over?

A recent study in the Annals of Internal Medicine examined this phenomenon in a quantitative and rigorous fashion. Karen Robinson and Steven Goodman, of Johns Hopkins University, wanted to see how often scientists were aware of previous research before they conducted a clinical trial. If science properly grows by accreting information, it should take into account everything that has come before it. But do scientists actually do that? Based on everything I’ve mentioned so far, the answer is likely no. But then, what fraction of the time do we ignore (or simply not know about) what has come before us?

Robinson and Goodman set out to see how often scientists who perform a clinical trial in a specific field cite the relevant literature when publishing their results. For example, if clinical trials related to heart attack treatment are performed, Robinson and Goodman wanted to see how many of these trials cite the trials in that area that had come before them. While a clinical trial needn’t cite every paper that preceded it, it should provide an overview of the relevant literature. But how to decide which papers are relevant and which ones aren’t? Rather than be accused of subjectivity, or have to gain expertise in countless specific areas, Robinson and Goodman sidestepped these problems by doing something clever: They looked at meta-analyses.

Meta-analysis is a well-known technique that can be used to extract more meaning from specific papers than could be gained from looking at each one alone. A meta-analysis combines the results of papers in a specific area in order to see if there is a consensus or if more precise results can be found. Meta-analyses are like the analyses of thermal conductivity for different elements mentioned in chapter 3, which use the results from lots of different articles to get a better picture of the shape of what we know about how these elements conduct heat.

Assuming the meta-analyses bring together all the relevant trials, Robinson and Goodman simply took the trials included in each meta-analysis and, for each trial, checked how many of the earlier trials in that same meta-analysis it cited.
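The bookkeeping behind this kind of measurement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trial labels and reference lists are invented, and the helper name citation_coverage is hypothetical, not something taken from Robinson and Goodman's study.

```python
from statistics import median


def citation_coverage(prior_trials, cited):
    """Fraction of the earlier trials that a new trial's paper cites."""
    prior = set(prior_trials)
    if not prior:
        raise ValueError("no prior trials to cite")
    return len(prior & set(cited)) / len(prior)


# Invented data: the trials pooled by one meta-analysis, in publication
# order, and the earlier trials that each paper's reference list mentions.
trial_order = ["T1", "T2", "T3", "T4", "T5"]
references = {
    "T2": {"T1"},
    "T3": {"T2"},
    "T4": {"T3"},
    "T5": {"T3", "T4"},
}

coverage = {}
cited_counts = []
for i, trial in enumerate(trial_order):
    if i == 0:
        continue  # the first trial has nothing earlier to cite
    prior = trial_order[:i]
    cited = references.get(trial, set())
    coverage[trial] = citation_coverage(prior, cited)
    cited_counts.append(len(set(prior) & cited))

for trial, fraction in coverage.items():
    print(f"{trial}: cites {fraction:.0%} of the earlier trials")
print("median number of earlier trials cited:", median(cited_counts))
```

Because the list of relevant trials comes from the meta-analysis itself, no judgment call about relevance is needed: the only inputs are publication order and reference lists.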

What they found shouldn’t be surprising. Scientists cite fewer than 25 percent of the relevant trials when writing about their own research. The more papers in the field, the smaller the fraction of previous papers that were cited in a new study. Astonishingly, no matter how many trials had been done before in that area, half the time only two or fewer studies were cited.

Not only is a small fraction of the relevant studies being cited, but there is also a systematic bias: The newer ones are far more likely to be mentioned. This shouldn’t be surprising after our discussion of citation decay and obsolescence in chapter 3. And it is hardly surprising that scientists might use the literature quite selectively, perhaps to bolster their own research. But when it comes to papers that are current, relevant, and necessary for the complete picture of the current state of a scientific question, this is unfortunate.

Imagine if we actually combined all the knowledge in a single field, and if scientists actually read all the analyses that their work was based on. What would happen to facts then? Would it make any difference?

Quite a bit, it turns out.

.   .   .

In 1992, a team of scientists from the hospitals and schools associated with Harvard University performed a new type of analysis. These researchers, Joseph Lau and his colleagues, examined all the previously published randomized clinical trials related to heart attacks. Specifically, they looked at all trials that were related to the use of a drug called streptokinase to treat these heart attacks. Combing through the literature, they found that there were thirty-three trials between the years 1959 and 1988 that used this treatment and examined its effectiveness.

Why did they stop at 1988 instead of going all the way up to 1992? Because 1988 was the year that a very large study was published, finally showing definitively that intravenous streptokinase helped to treat heart attacks. But Lau and his colleagues did something clever.

Lau lined up the trials chronologically and examined each of their findings, one after the other. The team discovered something intriguing. Imagine you have just completed a clinical trial with your drug treatment of choice. But instead of just analyzing the results of your own trial, you combine your data with that of all of the studies previously completed up until then, making the dataset larger and richer. Had researchers done that, Lau and his colleagues discovered, they would have known that intravenous streptokinase was an effective treatment years before this finding was actually published. According to their research, scientists could have found a statistically significant result in 1973, rather than in 1988, and after only eight trials, if they had combined the disparate facts.

This type of analysis is known as cumulative meta-analysis. What Lau and his colleagues realized was that meta-analyses can be viewed as a ratchet rather than simply an aggregation process, with each study moving scientific knowledge a little closer to the truth. This is ultimately what science should be: an accumulation of bits of knowledge, moving ever forward, or at least sweeping away error as best we can. Lau and his colleagues simply recognized that to be serious about this idea of cumulative knowledge, you have to truly combine all that we know and see what new facts we can learn.
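To make the ratchet concrete, here is a minimal Python sketch of a cumulative meta-analysis. It assumes a simple fixed-effect, inverse-variance pooling of log odds ratios, which is one common textbook approach and not necessarily the exact method Lau and his colleagues used. The trial counts are invented for illustration and are not the streptokinase data; the function names are hypothetical.

```python
import math


def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio and its variance for one two-arm trial.

    Adds 0.5 to every cell so that zero counts do not break the logarithm.
    """
    a = events_t + 0.5
    b = n_t - events_t + 0.5
    c = events_c + 0.5
    d = n_c - events_c + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def cumulative_meta_analysis(trials, z_crit=1.96):
    """Re-pool after each trial, in chronological order.

    Yields (year, pooled odds ratio, lower 95% bound, upper 95% bound).
    """
    sum_w = 0.0
    sum_wy = 0.0
    for year, et, nt, ec, nc in sorted(trials):
        y, var = log_odds_ratio(et, nt, ec, nc)
        w = 1.0 / var  # inverse-variance weight
        sum_w += w
        sum_wy += w * y
        pooled = sum_wy / sum_w
        se = math.sqrt(1.0 / sum_w)
        yield (year,
               math.exp(pooled),
               math.exp(pooled - z_crit * se),
               math.exp(pooled + z_crit * se))


# Invented trials: (year, deaths treated, n treated, deaths control, n control).
# Each one alone is too small to give a statistically significant result.
trials = [
    (2001, 8, 100, 12, 100),
    (2002, 8, 100, 12, 100),
    (2003, 8, 100, 12, 100),
    (2004, 8, 100, 12, 100),
    (2005, 8, 100, 12, 100),
    (2006, 8, 100, 12, 100),
]

for year, odds_ratio, low, high in cumulative_meta_analysis(trials):
    verdict = "significant" if high < 1.0 else "not yet significant"
    print(f"{year}: OR={odds_ratio:.2f} (95% CI {low:.2f} to {high:.2f}) {verdict}")
```

In this toy dataset the pooled estimate never changes, but the confidence interval narrows with every added trial until it no longer includes an odds ratio of 1. That narrowing is what allows a cumulative analysis to reach a conclusion earlier than any single small trial could.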

While Don Swanson combined papers from scientific areas that should have overlapped but didn’t, Lau and his colleagues combined papers from very similar areas that had never been combined, looking at them more carefully than they had been examined up until then. Had cumulative meta-analysis been used at the time, hidden knowledge could have been revealed fifteen years earlier than it actually was, improving the health of countless individuals.

Modern technology is beginning to aid cumulative meta-analysis and its development, and we can even now use computational techniques to employ Swanson’s methods on a grand scale.

.   .   .

We are not yet at the stage where we can loose computers upon the stores of human knowledge only to return a week later with discoveries that would supplant those of Einstein or Newton in our scientific pantheon. But computational methods are helpful. Working in concert with people (we are still needed to sort the wheat from the chaff), these programs can connect scientific areas that ought to be speaking to one another yet haven’t. These automatic techniques help to stitch together different fields until the interconnectivity between the different areas becomes clear.

In the fall of 2010, a team of scientists in the Netherlands published the first results of a project called CoPub Discovery. Their previous work had involved the creation of a massive database based on the co-occurrence of words in articles. If two papers both have the terms p53 and oncogenesis, for example, they would be linked more strongly than papers with no key terms in common. CoPub Discovery involved creating a new program that mines their database for unknown relationships between genes and diseases.

Essentially, CoPub Discovery automates the method that Don Swanson used to detect the relationship between fish oil and Raynaud’s syndrome but on a much larger scale. It can detect relationships between thousands of genes and thousands of diseases, gene pathways, and even the effectiveness of different drugs. Doing this automatically allows many possible discoveries to be detected. CoPub Discovery also has a careful system of checks designed to sift out false positives: instances where the program might say there is an association when there really isn’t.
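The underlying idea can be sketched in Python: count which terms appear together in the same paper, then look for pairs of terms that never co-occur directly but share intermediate terms. This is a toy illustration only. It is not CoPub Discovery's actual algorithm or database, the function names are hypothetical, the five "papers" are invented, and it omits the statistical checks a real system needs to filter false positives.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(papers):
    """Count how many papers mention each unordered pair of terms."""
    counts = defaultdict(int)
    for terms in papers:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def linked(counts, a, b):
    """Number of papers in which terms a and b appear together."""
    return counts.get(tuple(sorted((a, b))), 0)


def hidden_links(papers, source, min_count=1):
    """Terms never seen with `source` but connected to it via shared terms.

    Returns (target, [bridging terms]) pairs, most bridges first.
    """
    counts = cooccurrence_counts(papers)
    vocabulary = {term for terms in papers for term in terms}
    neighbours = {t for t in vocabulary
                  if t != source and linked(counts, source, t) >= min_count}
    candidates = {}
    for target in vocabulary - neighbours - {source}:
        bridges = sorted(b for b in neighbours
                         if linked(counts, b, target) >= min_count)
        if bridges:
            candidates[target] = bridges
    return sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Invented mini-literature: each set holds the key terms of one paper.
papers = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "Raynaud's syndrome"},
    {"platelet aggregation", "Raynaud's syndrome"},
    {"fish oil", "triglycerides"},
]

for target, bridges in hidden_links(papers, "fish oil"):
    print(f"fish oil may be related to {target} via {', '.join(bridges)}")
```

No paper in the toy set mentions fish oil and Raynaud's syndrome together, yet the two are tied through two intermediate terms, which is the pattern Swanson found by hand.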

And it works! The program was able to find a number of exciting new associations between genes and the diseases that they may cause, ones that had never before been written about in the literature.

For example, there is a condition known as Graves’ disease that normally causes hyperthyroidism, a condition in which the thyroid produces too much hormone. Symptoms include heat intolerance and eyes that stick out more prominently, yielding a somewhat bug-eyed appearance for sufferers. CoPub Discovery, when automatically plowing through the large database, found a number of genes that had never before been implicated in Graves’ that might be involved in causing the disease. Specifically, it found a large cluster of genes related to something known as programmed cell death.

Programmed cell death is not nearly as scary as it sounds. Our bodies often require the death of individual cells in order to perform correctly, and there is a set of genes in our cells tailored for this purpose. For example, during embryonic development, our hands initially have webbing between the fingers. But prior to birth the cells in the webbing are given the signal to die, causing us to not have webbed hands. Webbed hands and feet only occur when the signal is given incorrectly, or when these genes don’t work properly.

What CoPub Discovery computationally hypothesized is that when these programmed cell death genes don’t work properly in other ways, a cascade of effects might follow, eventually leading to the condition known as Graves’ disease. CoPub Discovery has also found relationships between drugs and diseases and determined other previously unknown effects of currently used drugs. For example, while a medicine might be used to treat a specific condition, not all of its effects might be known. Using the CoPub Discovery engine and the concept of undiscovered public knowledge, it becomes possible to actually see what the other effects of such a drug might be.

The researchers behind CoPub Discovery did something even more impressive. Rather than simply put forth a tool and a number of computationally generated hypotheses (although this is impressive by itself), they actually tested some of the discoveries in the laboratory. They wanted to see if these pieces of newly revealed knowledge are actually true. Specifically, CoPub Discovery predicted that two drugs, dephostatin and damnacanthal, could be used to slow the reproduction and proliferation of a group of cells. They found that the drugs actually worked: the larger the dose of these drugs, the more the cells’ growth was inhibited. This concept is known as drug repurposing, where hidden knowledge is used to determine that medicines are useful in treating conditions or diseases entirely different from their original purposes. One of the most celebrated examples of drug repurposing is Viagra, which was originally designed to treat angina. While Viagra did not prove promising for treating that condition, many of the participants in the clinical trials reported a certain intriguing side effect.

There are many other examples of computational discovery that combine multiple pieces of knowledge to reach novel conclusions. From software designed to find undiscovered patterns in the patent literature to the numerous computerized systems devoted to drug repurposing, this approach is growing rapidly. In fact, within mathematics, there is even a whole field of automated theorem proving. Armed with nothing but various axioms and theorems well-known to the mathematics community, as well as a set of rules for how to logically infer one thing from another, a computer simply goes about combining axioms and other theorems in order to prove new ones.
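The core loop of such a system can be sketched as forward chaining: keep applying inference rules to what is already known until nothing new appears. The Python below is a deliberately tiny illustration with placeholder statements "A" through "F" and a hypothetical forward_chain function. Real theorem provers use far richer logics and search strategies.

```python
def forward_chain(axioms, rules):
    """Derive every statement reachable from the axioms.

    `rules` is a list of (premises, conclusion) pairs: once all premises
    are known, the conclusion becomes known too.
    Returns the set of known statements and a record of how each new
    statement was derived.
    """
    known = set(axioms)
    derivations = {}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                derivations[conclusion] = tuple(premises)
                changed = True
    return known, derivations


# Placeholder statements; in a real system these would be formulas.
axioms = {"A", "B"}
rules = [
    (("A", "B"), "C"),  # from A and B, infer C
    (("C",), "D"),      # from C, infer D
    (("E",), "F"),      # never fires: E is not derivable
]

known, derivations = forward_chain(axioms, rules)
print(sorted(known))
for statement, premises in derivations.items():
    print(f"{statement} follows from {', '.join(premises)}")
```

The derivations record plays the role of a proof: for each new statement it stores the previously known statements that justify it.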

Given enough computational power, these systems can yield quite novel results. Of course, most of the output is rather simple and pedestrian, but they can generate new and interesting provably true mathematical statements as well. One of the earliest examples of these is Automated Mathematician, created by Doug Lenat in the 1970s. This program constructed regularities and equalities, with Lenat even claiming that the Automated Mathematician rediscovered a fundamental unsolved problem (though, sadly, did not solve it) in abstract number theory known as Goldbach’s Conjecture. Goldbach’s Conjecture is the elegant hypothesis that every even number greater than two can be expressed as the sum of two prime numbers. For example, 8 is 5 + 3 and 18 is 7 + 11. This type of program has provided a foundation for other automated proof systems, such as TheoryMine, briefly mentioned in chapter 2, which, for a small price, will name a novel, computationally created and proved theorem after you or a friend.
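Goldbach’s Conjecture, mentioned above, is easy to check for any particular even number even though no one has proved it in general. The short Python sketch below (the function names are illustrative) lists every way to write an even number as the sum of two primes. Checking examples like this is evidence, not a proof.

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def goldbach_pairs(n):
    """All pairs of primes (p, q) with p <= q and p + q == n."""
    if n <= 2 or n % 2 != 0:
        raise ValueError("n must be an even number greater than two")
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]


print(goldbach_pairs(8))   # [(3, 5)]
print(goldbach_pairs(18))  # [(5, 13), (7, 11)]

# Every even number from 4 to 1000 has at least one such pair.
print(all(goldbach_pairs(n) for n in range(4, 1001, 2)))  # True
```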

TheoryMine was created by a group of researchers in the School of Informatics at the University of Edinburgh. While some people might be excited to simply have something named after themselves and ignore the details, TheoryMine will give you not only the theorem but also a capsule summary of how the theorem was proven. The theorems are all related to the properties of functions and for most people are rather opaque. Nonetheless, it’s great that a mechanism to discover a piece of hidden knowledge is available for a consumer audience.
