Final Jeopardy
Stephen Baker

A Jeopardy machine would also respond to another change in technology: the move toward human language. For most of the first half-century of the computer age, machines specialized in orderly rows of numbers and words. If the buyers in a database were listed in one column, the products in another, and the prices in a third, everything was clear: Computers could run the numbers in a flash. But if one of the customers showed up as “Don” in one transaction and “Donny” in another, the computer viewed them as two people: The two names represented different strings of ones and zeros, and therefore Don ≠ Donny. Computers had no sense of language, much less nicknames. In that way, they were clueless. The world, and all of its complexity, had to be simplified, structured, and spoon-fed to these machines.
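
The mismatch is easy to see in a few lines of code. The sketch below is purely illustrative and invented for this section (the nickname table and the same_customer helper are not from any IBM system): an exact string comparison treats “Don” and “Donny” as strangers, and only an added normalization step, the kind of spoon-feeding described above, reconciles them.

# Illustrative only: the nickname table and helper are invented for this
# example, not taken from any IBM system. Exact string comparison treats
# "Don" and "Donny" as two different customers; a small normalization
# table supplies the spoon-feeding the machine needs.

NICKNAMES = {"donny": "don"}  # hypothetical mapping of variants to a canonical form

def same_customer(name_a: str, name_b: str) -> bool:
    """Return True if two name strings likely refer to the same person."""
    a = name_a.strip().lower()
    b = name_b.strip().lower()
    a = NICKNAMES.get(a, a)   # map each variant to its canonical form, if known
    b = NICKNAMES.get(b, b)
    return a == b

print("Don" == "Donny")               # False: different strings of ones and zeros
print(same_customer("Don", "Donny"))  # True: normalized to the same canonical name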

But consider what hundreds of millions of ordinary people were using computers for by 2004. They were e-mailing and chatting. Some were signing up for new social networks. (Facebook launched in February of that year.) Online humanity was creating mountains of a messy type of digital data: human language. Billions of words were rocketing through networks and piling up in data centers. Those words expressed what millions of people were thinking, desiring, fearing, and scheming. The potential customers of IBM's clients were out there spilling out their lives. Entire industries grew by understanding what people were saying and predicting what they might want to do, where they might want to go, and what they were eager to buy. Google was already mining and indexing words on the Web, using them to build a media and advertising empire. Only months earlier, Google had debuted as a publicly traded company, and the new stock was skyrocketing.

IBM wasn't about to mix it up with Google in the commercial Web. But Big Blue needed state-of-the-art tools to provide its corporate customers with the fastest and most insightful read of the words cascading through their networks. To keep a grip on its gold-plated consulting business, IBM required the very smartest, language-savvy technology—and it needed its customers to know and trust that it had it. It was central to IBM's brand.

So in mid-2005 Horn took up the challenge with a number of his top researchers, including Ferrucci. A twelve-year veteran at the company, Ferrucci managed a handful of research teams, including the five people who were teaching machines to answer simple questions in English. Their discipline was called question-answering. Ferrucci knew the challenges all too well. The machines stumbled in understanding English and appeared to plateau, in competitions sponsored by the U.S. government, at a success rate of about 35 percent.

Ferrucci wasn't a big Jeopardy fan, but he was familiar enough with it to appreciate the obstacles involved. Jeopardy tested a combination of knowledge, speed, and accuracy, along with game strategy. The show featured three contestants, each with a buzzer. In the course of about twenty minutes, they raced to respond to sixty clues representing a combined value of $54,000. Each one—and this was a Jeopardy quirk—was in fact an answer, some far more complex than others. The contestant had to provide the missing question. For example, in an unusual Tournament of Champions game that aired in November 1994, contestants were presented with this $500 clue under the category Furniture: “French term for a what-not, a stand of tiered shelves with slender supports used to display curios.” The host, Alex Trebek, read the clue from the big game board. The moment he finished, a panel around the question lit up, setting off the race to buzz. On average, contestants had about four seconds to read and consider the clue before buzzing. The first to buzz was, in effect, placing a bet. The right response—“What is an étagère?”—was worth $500 and gave the contestant the chance to pick again. (“Let's try European Capitals for $200.”) A botched response wiped the same amount from a contestant's score and gave the other two a chance to try. (In this example, no one dared to buzz. Such a clue, uncommon in Jeopardy, is known as a “triple-stumper.”)

To compete in Jeopardy, a machine not only would need to come up with the answer, posed as a question, within four seconds, but it would also have to gauge its confidence in its response. It would have to know what it knew. “Humans know what they know like that,” Ferrucci said later, snapping his fingers. Replicating such confidence in a computer would be tricky. What's more, the computer would have to calculate the risk according to where it stood in the game. If it was far ahead and had only middling confidence on “étagère,” it might make more sense not to buzz. In addition to piling up knowledge, a computer would have to learn to play the game.

Complicating the game strategy were four wild cards. Three of the game's sixty hidden clues were so-called Daily Doubles. In that 1994 game, a contestant named Rachael Schwartz, an attorney from Bedminster, New Jersey, asked for the $400 clue in the Furniture category. Up popped a Daily Double, giving her the chance to bet some or all of her money on a furniture-related clue she had yet to see. She wagered $500, a third of her winnings, and was faced with this clue: “This store fixture began in 15th century Europe as a table whose top was marked for measuring.” She missed it, guessing “What is a cutting table?” and lost $500. (“What is a counter?” was the correct response.) It was early in the game and didn't have much impact. The three players were all around the $1,000 mark. But later in a game, Ferrucci saw, Daily Doubles gave contestants the means to storm back from far behind. A computer playing the game would require a clever game program to calibrate its bets.

The biggest of the wild cards was Final Jeopardy, the last clue of the game. As in Daily Doubles, contestants could bet all or part of their winnings on a single category. But all three contestants participated—as long as they had positive earnings. Often the game boiled down to betting strategies in Final Jeopardy. Take that 1994 contest, in which the betting took a strange turn. Going into Final Jeopardy, Rachael Schwartz led Kurt Bray, a scientist from Oceanside, California, by a slim margin, $9,200 to $8,600. The category was Historic Names. To lock down a win, she had to assume he would bet everything, reaching $17,200. A bet of $8,001 would give her one dollar more, provided she got it right. But if they both bet big and missed, they might fall to the third-place contestant, Brian Moore, a Ph.D. candidate from Pearland, Texas. In the minute or so that they took to place their bets, the two leaders had to map out the probabilities of a handful of different scenarios. They wrote down their dollar numbers and waited for the clue: “Though he spent most of his life in Europe, he was governor of the Bahamas for most of World War II.”

The second-place player, Bray, was the only one to get it right: “Who was Edward VIII?” Yet he had bet only $500. It was a strange number. It placed him $100 behind the leader, not ahead of her. But the bet kept him beyond the reach of the third-place player. Most players bet at least something on a clue. If Schwartz had wagered and missed, he would have won. Indeed, Schwartz missed the clue. She didn't even bother guessing. But she had bet nothing, leaving herself $100 ahead and winning the game.

The betting in Final Jeopardy, Ferrucci saw, might actually play to the strength of a computer. A machine could analyze betting patterns over thousands of games. It could crunch the probabilities and devise optimized strategies in a fraction of a second. “Computers are good at that kind of math,” he said.
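
The arithmetic behind that 1994 finish is small enough to check directly. The short sketch below is mine, not IBM's; it simply applies the scoring rule (a correct response adds the wager, a miss subtracts it) to the two leaders' actual bets across the four right-or-wrong combinations.

# A worked check of the 1994 endgame described above; the helper is
# illustrative, not from IBM. A correct response adds the wager and a
# miss subtracts it.

def final_score(score: int, wager: int, correct: bool) -> int:
    return score + wager if correct else score - wager

schwartz, bray = 9200, 8600        # scores going into Final Jeopardy
schwartz_bet, bray_bet = 0, 500    # the wagers they actually wrote down

for s_right in (True, False):
    for b_right in (True, False):
        s = final_score(schwartz, schwartz_bet, s_right)
        b = final_score(bray, bray_bet, b_right)
        print(f"Schwartz {'right' if s_right else 'wrong'} ({s}), "
              f"Bray {'right' if b_right else 'wrong'} ({b}): "
              f"{'Schwartz' if s > b else 'Bray'} wins")

# Against these wagers Schwartz wins every time: she never drops below $9,200,
# and Bray tops out at $9,100. His $500 bet would have paid off only if the
# leader had wagered something and missed.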

It was the rest of Jeopardy that appeared daunting. The game featured complex questions and a wide use of puns, posing trouble for literal-minded computers. Then there was Jeopardy's nearly boundless domain. Smaller and more specific subject areas were easier for computers, because they offered a more manageable set of facts and relationships to master. They provided context. A word like “leak,” for example, had a specific meaning in deep-sea drilling, another in heart surgery, and a third in corporate press relations. A know-it-all computer would have to recognize different contexts to keep the meanings clear. And Jeopardy's clues took the concept of a broad domain to a near-ludicrous extreme. The game had an entire category on Famous Understudies. Another was on the oft-forgotten president Rutherford B. Hayes. Worse, from a computer architect's point of view, the game demanded answers within seconds—and penalized players for getting them wrong. A Jeopardy machine, just like the humans on the show, would have to store all of its knowledge in its internal memory. (The challenge, IBM figured, wouldn't be nearly as impressive if a bionic player had access to unlimited information on the Web. What's more, Jeopardy would be unlikely to accept a Web-surfing contestant, since others didn't have the same privilege.) Beating humans in Jeopardy, it seemed, was more than a stretch goal. It appeared impossible and spelled potential disaster for researchers. To embarrass the company on national television—or, more likely, to flame out before even getting there—was no way to manage a career.

Ferrucci's pessimism was also grounded in experience. In annual government competitions, known as TREC (the Text Retrieval Conference), his question-answering (Q-A) team developed a system called Piquant. Even with a much easier test, it performed far below Jeopardy level. In TREC, the competing teams were each given a relatively small “corpus” of about one million documents. They then had to train the machines to answer questions based on the material. (In one version from 2004, several of the questions had to do with Tom Cruise and his ex-wife.)

In answering these questions, the computer, for all its processing power and memory, resembled nothing so much as a student with serious brain damage. An apparently simple question could tie it in knots. In 2005, it was asked: “What is Francis Scott Key best known for?” The first job was to determine which of those words represented the subject of the question, the “entity,” and whether that might be a person, a state, or perhaps an animal or a machine. Each one had different characteristics. “Francis” and “Scott” looked like names. But “Key”? That could be a metal tool to open doors or a mental breakthrough to solve problems. In its hunt, the computer might even spend a millisecond or two puzzling over Key lime pies. Clearing up these doubts might require a visit to the system's “disambiguation” unit, where the answering program consulted a dictionary or looked for contextual clues in the surrounding words. Could “Key” be something the ingenious Francis Scott invented, collected, planted, or stole? Could he have baked it? Probably not. The structure of the question, with no direct object, made “Key” look like the third part of a person's name. The capital K on Key strengthened that case.
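
A toy version of that first step, far simpler than anything in Piquant and relying only on the surface cues the paragraph mentions (capital letters and the shape of the question), can at least pull out “Francis Scott Key” as a single candidate name. The function and regular expression below are invented for illustration.

# Toy illustration only, far simpler than Piquant: use capitalization as a
# surface cue to treat "Francis Scott Key" as one multi-word name rather
# than a name plus a metal tool.

import re

def guess_entity(question: str) -> str:
    """Return the longest run of capitalized words as the likely entity."""
    runs = [r.strip() for r in re.findall(r"(?:[A-Z][a-z]+\s?)+", question)]
    multiword = [r for r in runs if len(r.split()) > 1]
    return max(multiword, key=len) if multiword else ""

print(guess_entity("What is Francis Scott Key best known for?"))
# -> "Francis Scott Key": the capital K on "Key" keeps it inside the name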

A person confronting that question either knew or did not know that Francis Scott Key wrote the U.S. national anthem, “The Star-Spangled Banner.” But he or she wasted no time searching for the subject and object in the sentence or wondering if it was a last name, a metal tool, or a tangy South Florida dessert.

For the machine, things only got worse. The question lacked a verb, which could disorient the computer. If the question were, “What did Francis Scott Key write?” the machine could likely find a passage of text with Key writing something, and that something would point to the answer. The only pointer here—“is known for”—was maddeningly vague. Assuming the computer had access to the Internet (a luxury it wouldn't have on the show), it headed off with nothing but the name. In Wikipedia, it might learn that Key was “an American lawyer, author and amateur poet, from Georgetown, who wrote the words to the United States national anthem, ‘The Star-Spangled Banner.’” For humans, the answer was right there. But the computer, with no verb to guide it, might answer that Key was known as an amateur poet or a lawyer from Georgetown. In the TREC competitions, IBM's Piquant botched two out of every three questions.
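
To see why the missing verb mattered, the sketch below (again invented for illustration, with a paraphrased Wikipedia-style sentence rather than a live lookup) anchors the search on “wrote” and finds the anthem directly, while the vague “known for” leaves several fragments of the same sentence looking equally plausible.

# Invented illustration, not Piquant's code, using a paraphrased
# Wikipedia-style sentence instead of a live lookup. A concrete verb
# anchors the answer; the vague "known for" does not.

import re

sentence = ("Francis Scott Key was an American lawyer, author and amateur poet, "
            "from Georgetown, who wrote the words to the United States national "
            "anthem, 'The Star-Spangled Banner'.")

# With a verb to guide it: grab what follows "wrote".
match = re.search(r"wrote (.+?)(?:\.|$)", sentence)
print(match.group(1))
# -> "the words to the United States national anthem, 'The Star-Spangled Banner'"

# With only "known for" to go on, every descriptive fragment is a candidate:
for fragment in sentence.split(","):
    print(fragment.strip())   # lawyer, poet, Georgetown, anthem... all plausible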

All too often, the system failed to understand the question or to put it in the right context. For this, a growing school of Artificial Intelligence argued, systems needed to spend more time in the computer equivalent of infancy, mastering the concepts that humans take for granted: time, space, and the basic laws of cause and effect.

Toddlerhood is a tribulation for computers, because it represents knowledge that is tied to the human experience: the body and the senses. While crawling, we learn about space and physical objects, and we get a sense of time. The toddler reaches for the jar on the table. Moments later pieces of it lie scattered on the floor. What happened between those two states? It fell. Such lessons establish notions of before and after, cause and effect, and the nature of gravity. These experiences, most of them accompanied by a steady stream of human language, set the foundation for practically everything we learn. “You crawl around and bump into things,” said David Gunning, a senior manager at Vulcan Inc., an AI incubator in Seattle. “That's basic research.” It isn't just jars that fall, the toddler notices. Practically everything does. (Certain balloons are exceptions, which seem magical.) The child turns these observations into theory. Unlike computers, humans generalize.

Even the metaphors in our language lead back to the tumbles and accidents seared into our consciousness in our early years. We “fall” for a sales pitch or “fall” in love, and we cringe at hearing “sharp” words or “stinging” rebukes. We process such expressions on such a basic level that they seem closer to feeling than thought (though for humans, unlike computers, the two are intertwined). Over the course of centuries, these metaphors infused language and, consequently, were fundamental to understanding Jeopardy clues. Yet to a machine with no body or experience in the physical world, each one was a puzzle.

In some Artificial Intelligence labs, scientists were attempting to transmit these elementary experiences to computers. Sajit Rao, a professor at MIT, was introducing computers equipped with vision to rumpus-room learning, showing them objects moving, falling, obstructing paths, and piling on top of one another. The goal was to establish a conceptual understanding so that eventually computers could draw conclusions from visual observations. What would happen, for example, when vehicles blocked a road?
