More and more, the lexicon is in the network now—preserved, even as it changes; accessible and searchable. Likewise, human knowledge soaks into the network, into the cloud. The web sites, the blogs, the search engines and encyclopedias, the analysts of urban legends and the debunkers of the analysts. Everywhere, the true rubs shoulders with the false. No form of digital communication has earned more mockery than the service known as Twitter—banality shrink-wrapped, enforcing triviality by limiting all messages to 140 characters. The cartoonist Garry Trudeau twittered satirically in the guise of an imaginary newsman who could hardly look up from his twittering to gather any news. But then, eyewitness Twitter messages provided emergency information and comfort during terrorist attacks in Mumbai in 2008, and it was Twitter feeds from Tehran that made the Iranian protests visible to the world in 2009. The aphorism is a form with an honorable history. I barely twitter myself, but even this odd medium, microblogging so quirky and confined, has its uses and its enchantment. By 2010 Margaret Atwood, a master of a longer form, said she had been “sucked into the Twittersphere like Alice down the rabbit hole.”

Is it signaling, like telegraphs? Is it Zen poetry? Is it jokes scribbled on the washroom wall? Is it John Hearts Mary carved on a tree? Let’s just say it’s communication, and communication is something human beings like to do.

Shortly thereafter, the Library of Congress, having been founded to collect every book, decided to preserve every tweet, too. Possibly undignified, and probably redundant, but you never know. It is human communication.

And the network has learned a few things that no individual could ever know.

It identifies CDs of recorded music by looking at the lengths of their individual tracks and consulting a vast database, formed by accretion over years, by the shared contributions of millions of anonymous users. In 2007 this database revealed something that had eluded distinguished critics and listeners: that more than one hundred recordings released by the late English pianist Joyce Hatto—music by Chopin, Beethoven, Mozart, Liszt, and others—were actually stolen performances by other pianists.
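
The mechanism is simple enough to sketch. What follows is a toy illustration, not the actual algorithm of any real service: treat the sequence of track lengths as a fingerprint, and let anonymous submissions fill in the table.

    # A toy disc-identification scheme: hash the track lengths into a
    # short ID and look it up in a table accreted from anonymous users.
    # (Illustrative only; real services use more careful fingerprints.)
    import hashlib

    database = {}                              # the collective memory

    def disc_id(track_lengths_seconds):
        key = ",".join(str(s) for s in track_lengths_seconds)
        return hashlib.sha1(key.encode()).hexdigest()[:8]

    def submit(track_lengths, title):          # one anonymous contribution
        database[disc_id(track_lengths)] = title

    def identify(track_lengths):               # what the music player asks
        return database.get(disc_id(track_lengths), "unknown disc")

    submit([125, 210, 198, 344], "Nocturnes (hypothetical track list)")
    print(identify([125, 210, 198, 344]))

Two different performances almost never share their track lengths to the second, which is how a “new” disc that resolved to another pianist’s entry could give the game away.
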
MIT established a Center for Collective Intelligence, devoted to finding group wisdom and “harnessing” it. It remains difficult to know when and how much to trust the wisdom of crowds—the title of a 2004 book by James Surowiecki, to be distinguished from the madness of crowds as chronicled in 1841 by Charles Mackay, who declared that people “go mad in herds, while they recover their senses slowly, and one by one.”

Crowds turn all too quickly into mobs, with their time-honored manifestations: manias, bubbles, lynch mobs, flash mobs, crusades, mass hysteria, herd mentality, goose-stepping, conformity, groupthink—all potentially magnified by network effects and studied under the rubric of information cascades. Collective judgment has appealing possibilities; collective self-deception and collective evil have already left a cataclysmic record. But knowledge in the network is different from group decision making based on copying and parroting. It seems to develop by accretion; it can give full weight to quirks and exceptions; the challenge is to recognize it and gain access to it. In 2008, Google created an early warning system for regional flu trends based on data no firmer than the incidence of Web searches for the word flu; the system apparently discovered outbreaks a week sooner than the Centers for Disease Control and Prevention. This was Google’s way: it approached classic hard problems of artificial intelligence—machine translation and voice recognition—not with human experts, not with dictionaries and linguists, but with its voracious data mining of trillions of words in more than three hundred languages. For that matter, its initial approach to searching the Internet relied on the harnessing of collective knowledge.

Here is how the state of search looked in 1994. Nicholson Baker—in a later decade a Wikipedia obsessive; back then the world’s leading advocate for the preservation of card catalogues, old newspapers, and other apparently obsolete paper—sat at a terminal in a University of California library and typed,
BROWSE SU[BJECT] CENSORSHIP.

He received an error message,

LONG SEARCH: Your search consists of one or more very common words, which will retrieve over 800 headings and take a long time to complete,

and a knuckle rapping:

Long searches slow the system down for everyone on the catalog and often do not produce useful results. Please type HELP or see a reference librarian for assistance.

All too typical. Baker mastered the syntax needed for Boolean searches with complexes of ANDs and ORs and NOTs, to little avail. He cited research on screen fatigue and search failure and information overload and admired a theory that electronic catalogues were “in effect, conducting a program of ‘aversive operant conditioning’ ” against online search.

Here is how the state of search looked two years later, in 1996. The volume of Internet traffic had grown by a factor of ten each year, from 20 terabytes a month worldwide in 1994 to 200 terabytes a month in 1995, to 2 petabytes in 1996. Software engineers at the Digital Equipment Corporation’s research laboratory in Palo Alto, California, had just opened to the public a new kind of search engine, named AltaVista, continually building and revising an index to every page it could find on the Internet—at that point, tens of millions of them. A search for the phrase truth universally acknowledged and the name Darcy produced four thousand matches. Among them:

  • The complete if not reliable text of
    Pride and Prejudice
    , in several versions, stored on computers in Japan, Sweden, and elsewhere, downloadable free or, in one case, for a fee of $2.25.
  • More than one hundred answers to the question, “Why did the chicken cross the road?” including “Jane Austen: Because it is a truth universally acknowledged that a single chicken, being possessed of a good fortune and presented with a good road, must be desirous of crossing.”
  • The statement of purpose of the
    Princeton Pacific Asia Review:
    “The strategic importance of the Asia Pacific is a truth universally acknowledged …”
  • An article about barbecue from the Vegetarian Society UK: “It is a truth universally acknowledged among meat-eaters that …”
  • The home page of Kevin Darcy, Ireland. The home page of Darcy Cremer, Wisconsin. The home page and boating pictures of Darcy Morse. The vital statistics of Tim Darcy, Australian footballer. The résumé of Darcy Hughes, a fourteen-year-old yard worker and babysitter in British Columbia.

Trivia did not daunt the compilers of this ever-evolving index. They were acutely aware of the difference between making a library catalogue—its target fixed, known, and finite—and searching a world of information without boundaries or limits. They thought they were onto something grand. “We have a lexicon of the current language of the world,” said the project manager, Allan Jennings.
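
A search engine’s index is, at heart, an inverted index: a map from each word to the set of pages containing it, so that a query becomes a set operation. A minimal sketch, with hypothetical pages:

    # A minimal inverted index: map each word to the set of pages that
    # contain it; a multi-word query is then just a set intersection.
    from collections import defaultdict

    index = defaultdict(set)

    def add_page(url, text):
        for word in text.lower().split():
            index[word].add(url)

    def search(*words):                        # AND semantics
        sets = [index[w.lower()] for w in words]
        return set.intersection(*sets) if sets else set()

    add_page("page-1", "it is a truth universally acknowledged")
    add_page("page-2", "the home page of Kevin Darcy")
    print(search("truth", "universally", "acknowledged"))   # {'page-1'}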

Then came Google. Brin and Page moved their fledgling company from their Stanford dorm rooms into offices in 1998. Their idea was that cyberspace possessed a form of self-knowledge, inherent in the links from one page to another, and that a search engine could exploit this knowledge. As other scientists had done before, they visualized the Internet as a graph, with nodes and links: by early 1998, 150 million nodes joined by almost 2 billion links. They considered each link as an expression of value—a recommendation. And they recognized that all links are not equal. They invented a recursive way of reckoning value: the rank of a page depends on the value of its incoming links; the value of a link depends on the rank of its containing page. Not only did they invent it, they published it. Letting the Internet know how Google worked did not hurt Google’s ability to leverage the Internet’s knowledge.
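
The recursion is compact enough to write down. Here is a simplified power-iteration sketch of the published idea, using the customary damping factor of 0.85; the four-page web is hypothetical:

    # Simplified PageRank: each page divides its rank among its outgoing
    # links; iterate until the mutual recursion settles into fixed values.
    def pagerank(links, d=0.85, rounds=50):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(rounds):
            new = {p: (1.0 - d) / n for p in pages}
            for page, outgoing in links.items():
                targets = outgoing or pages    # a dead end shares with all
                for t in targets:
                    new[t] += d * rank[page] / len(targets)
            rank = new
        return rank

    # A hypothetical four-page web in which C collects the recommendations.
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))

Rank flows along links, so a page matters when pages that matter point to it; the apparent circularity is resolved by iterating until the values settle.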

At the same time, the rise of this network of all networks was inspiring new theoretical work on the topology of interconnectedness in very large systems. The science of networks had many origins and evolved along many paths, from pure mathematics to sociology, but it crystallized in the summer of 1998, with the publication of a letter to Nature from Duncan Watts and Steven Strogatz. The letter had three things that combined to make it a sensation: a vivid catchphrase, a nice result, and a surprising assortment of applications. It helped that one of the applications was All the World’s People. The catchphrase was small world. When two strangers discover that they have a mutual friend—an unexpected connection—they may say, “It’s a small world,” and it was in this sense that Watts and Strogatz talked about small-world networks.

The defining quality of a small-world network is the one unforgettably captured by John Guare in his 1990 play, Six Degrees of Separation. The canonical explanation is this:

I read somewhere that everybody on this planet is separated by only six other people. Six degrees of separation. Between us and everyone else on this planet. The President of the United States. A gondolier in Venice. Fill in the names.

The idea can be traced back to a 1967 social-networking experiment by the Harvard psychologist Stanley Milgram and, even further, to a 1929 short story by a Hungarian writer, Frigyes Karinthy, titled “Láncszemek”—Chains.

Watts and Strogatz took it seriously: it seems to be true, and it is counterintuitive, because in the kinds of networks they studied, nodes tended to be highly clustered. They are cliquish. You may know many people, but they tend to be your neighbors—in a social space, if not literally—and they tend to know mostly the same people. In the real world, clustering is ubiquitous in complex networks: neurons in the brain, epidemics of infectious disease, electric power grids, fractures and channels in oil-bearing rock. Clustering alone means fragmentation: the oil does not flow, the epidemics sputter out. Faraway strangers remain estranged.

But some nodes may have distant links, and some nodes may have an exceptional degree of connectivity. What Watts and Strogatz discovered in their mathematical models is that it takes astonishingly few of these exceptions—just a few distant links, even in a tightly clustered network—to collapse the average separation to almost nothing and create a small world.
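
The result is easy to reproduce in miniature. The sketch below uses a shortcut-adding variant of their model (they rewired existing links with probability p, which makes the same point): a clustered ring of two hundred nodes whose average separation collapses once ten random long-range links are added.

    # Start from a clustered ring lattice, add a handful of random
    # shortcuts, and measure how the average separation collapses.
    import random
    from collections import deque

    def ring_lattice(n, k):
        # each node linked to its k nearest neighbors on each side
        edges = {i: set() for i in range(n)}
        for i in range(n):
            for j in range(1, k + 1):
                edges[i].add((i + j) % n)
                edges[(i + j) % n].add(i)
        return edges

    def average_separation(edges):
        total, pairs = 0, 0
        for start in edges:                    # breadth-first from every node
            dist, queue = {start: 0}, deque([start])
            while queue:
                node = queue.popleft()
                for nbr in edges[node]:
                    if nbr not in dist:
                        dist[nbr] = dist[node] + 1
                        queue.append(nbr)
            total += sum(dist.values())
            pairs += len(dist) - 1
        return total / pairs

    g = ring_lattice(200, 3)
    print(round(average_separation(g), 1))     # long way around the ring
    for _ in range(10):                        # just ten random shortcuts...
        a, b = random.sample(range(200), 2)
        g[a].add(b)
        g[b].add(a)
    print(round(average_separation(g), 1))     # ...and the world is small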

One of their test cases was a global epidemic: “Infectious diseases are predicted to spread much more easily and quickly in a small world; the alarming and less obvious point is how few short cuts are needed to make the world small.”

A few sexually active flight attendants might be enough.

In cyberspace, almost everything lies in the shadows. Almost everything is connected, too, and the connectedness comes from a relatively few nodes, especially well linked or especially well trusted. However, it is one thing to prove that every node is close to every other node; that does not provide a way of finding the path between them. If the gondolier in Venice cannot find his way to the president of the United States, the mathematical existence of their connection may be small comfort. John Guare understood this, too; the next part of his Six Degrees of Separation explanation is less often quoted:

I find that A) tremendously comforting that we’re so close, and B) like Chinese water torture that we’re so close. Because you have to find the right six people to make the connection.

There is not necessarily an algorithm for that.
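
The distinction is between a path existing and a path being findable. With the whole graph in hand, breadth-first search recovers the chain at once; Milgram’s letter-holders, like Guare’s characters, each saw only their own acquaintances. A sketch of the easy, omniscient half, on a hypothetical acquaintance graph:

    # Omniscient search: given the entire acquaintance graph, breadth-first
    # search finds the shortest chain. Milgram's forwarders had no such
    # graph; each could only guess which friend stood nearer the target.
    from collections import deque

    def chain(friends, start, target):
        parent, queue = {start: None}, deque([start])
        while queue:
            person = queue.popleft()
            if person == target:               # walk the chain back to start
                path = []
                while person is not None:
                    path.append(person)
                    person = parent[person]
                return path[::-1]
            for f in friends[person]:
                if f not in parent:
                    parent[f] = person
                    queue.append(f)
        return None                            # no connection at all

    # Hypothetical acquaintances: four handshakes from gondolier to
    # president, but nobody in the middle knows they are on the chain.
    friends = {
        "gondolier": ["maria"], "maria": ["gondolier", "karl", "luigi"],
        "karl": ["maria", "aide"], "luigi": ["maria"],
        "aide": ["karl", "president"], "president": ["aide"],
    }
    print(chain(friends, "gondolier", "president"))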

The network has a structure, and that structure stands upon a paradox. Everything is close, and everything is far, at the same time. This is why cyberspace can feel not just crowded but lonely. You can drop a stone into a well and never hear a splash.

No deus ex machina waits in the wings; no man behind the curtain. We have no Maxwell’s demon to help us filter and search. “We want the Demon, you see,” wrote Stanislaw Lem, “to extract from the dance of atoms only information that is genuine, like mathematical theorems, fashion magazines, blueprints, historical chronicles, or a recipe for ion crumpets, or how to clean and iron a suit of asbestos, and poetry too, and scientific advice, and almanacs, and calendars, and secret documents, and everything that ever appeared in any newspaper in the Universe, and telephone books of the future.”

As ever, it is the choice that informs us (in the original sense of that word). Selecting the genuine takes work; then forgetting takes even more work. This is the curse of omniscience: the answer to any question may arrive at the fingertips—via Google or Wikipedia or IMDb or YouTube or Epicurious or the National DNA Database or any of their natural heirs and successors—and still we wonder what we know.

We are all patrons of the Library of Babel now, and we are the librarians, too. We veer from elation to dismay and back. “When it was proclaimed that the Library contained all books,” Borges tells us, “the first impression was one of extravagant happiness. All men felt themselves to be the masters of an intact and secret treasure. There was no personal or world problem whose eloquent solution did not exist in some hexagon. The universe was justified.”

Then come the lamentations. What good are the precious books that cannot be found? What good is complete knowledge, in its immobile perfection? Borges worries: “The certitude that everything has been written negates us or turns us into phantoms.” To which, John Donne had replied long before, “He that desires to print a book, should much more desire, to be a book.”
