Final Jeopardy

Authors: Stephen Baker

But such a machine, if it was ever built, would not be ready for Ferrucci. As he saw it, his team had to produce a functional Jeopardy machine within two years. If Harry Friedman didn't see a viable machine by 2009, he would never green-light the man-machine match for late 2010 or early 2011. This deadline compelled Ferrucci and his team to assemble their machine with existing technology: the familiar silicon-based semiconductors, servers whirring through billions of calculations and following instructions from lots of software programs that already existed. In its guts, Blue J would not be so different from the ThinkPad Ferrucci lugged from one meeting to the next. Its magic would have to come from its massive scale, inspired design, and carefully tuned algorithms. In other words, if Blue J became a great Jeopardy player, it would be less a breakthrough in cognitive science than a triumph of engineering.

Every computing technology Ferrucci had ever touched, from the first computer he saw at Iona to the Brutus machine that spit out story plots, had a clueless side to it. Such machines could follow orders and carry out surprisingly complex jobs. But they were nowhere close to humans. The same was true of expert systems and neural networks: smart in one area, dumb in every other. And it was also the case with the Jeopardy algorithms his team was piecing together in the Hawthorne labs. These sets of finely honed computer commands each had a specialty, whether it was hunting down synonyms, parsing the syntax of a clue, or counting the most common words in a document. Beyond these meticulously programmed tasks, each was helpless.

So how would Blue J concoct broader intelligence, or at least enough of it to win at Jeopardy? Ferrucci considered the human brain. “If I ask you ‘36 plus 43,' a part of you goes, ‘Oh, I'll send that question over to the part of my brain that deals with math,'” he said. “And if I ask you a question about literature, you don't stay in the math part of your brain. You work on that stuff somewhere else.” Now this may be the roughest approximation of how the brain works, but for Ferrucci's purposes, it didn't matter. He knew that the brain had different specialties, that people instinctively skipped from one to another, and that Blue J would have to do the same thing.

Unlike a human, however, Blue J wouldn't know where to start. So with its vast resources, it would start everywhere. Instead of reading a clue and assigning the sleuthing work to specialist algorithms, Blue J would unleash scores of them on a hunt, then see which one came up with the best answer. The algorithms inside Blue J—each following a different set of marching orders—would bring in competing results. This process, a lot less efficient than the human brain, would require an enormous complex of two thousand processors, each handling a different piece of the job.
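The "start everywhere" strategy can be pictured as a fan-out: every specialist runs on the same clue, and all of their answers are pooled for later judging. The following is only a minimal sketch under stated assumptions. The three specialist functions, their canned outputs, and the name `gather_candidates` are hypothetical stand-ins, not Blue J's actual components.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical specialists. Each returns canned candidate answers for illustration.
def keyword_search(clue):
    return ["Cuba", "North Korea"]

def anagram_solver(clue):
    return ["htron"]  # mostly noise unless the clue really is an anagram

def rhyme_finder(clue):
    return ["forth"]

ALGORITHMS = [keyword_search, anagram_solver, rhyme_finder]

def gather_candidates(clue, algorithms=ALGORITHMS):
    """Run every algorithm on the clue at once and pool (algorithm name, answer) pairs."""
    with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
        # pool.map keeps results in the same order as the algorithms list.
        results = list(pool.map(lambda algo: algo(clue), algorithms))
    return [(algo.__name__, answer)
            for algo, answers in zip(algorithms, results)
            for answer in answers]

candidates = gather_candidates("the one that's farthest north")
```

Nothing is thrown away at this stage. Sorting the good answers from the garbage is left to a later step.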

To see how these algorithms carried out their hunt, consider one of thousands of clues the fledgling system grappled with. In the category Diplomatic Relations, it read: “Of the 4 countries the United States does not have diplomatic relations with, the one that's farthest north.”

In the first wave of algorithms to assess the clue was a cluster that specialized in grammar. They diagrammed the sentence, much the way a grade school teacher once did, identifying the nouns, verbs, direct objects, and prepositional phrases. This analysis helped to resolve doubts about specific words. The “United States,” in this clue, referred to the country, not the army, the economy, or the Olympic basketball team. Then they pieced together interpretations of the clue. Complicated clues, like this one, might lead to different readings, one more complex, the other simpler, perhaps based solely on words in the text. This duplication was wasteful, but waste was at the heart of the Blue J strategy. Duplicating or quadrupling its effort, or multiplying it by 100, was one way it would compensate for its cognitive shortcomings—and play to its advantage in processing speed. Unlike humans, who instantly understand a question and pursue a single answer, the computer might hedge, launching simultaneous searches for a handful of different possibilities. In this way and many others, Blue J would battle the efficient human mind with spectacular, flamboyant inefficiency. “Massive redundancy” was how Ferrucci described it. Transistors were cheap and plentiful. Blue J would put them to use.

While the machine's grammar-savvy algorithms were dissecting the clue, one of them searched for its LAT. In this clue about diplomacy, “the one” evidently referred to a country. If this was the case, the universe of Blue J's possible answers was reduced to a mere 194, the number of countries in the world. (This, of course, assumed that “country” didn't refer to “Marlboro Country” or “wine country” or “country music.” Blue J had to remain flexible, because these types of exception often occurred.)
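The narrowing effect of the answer type can be sketched as a simple filter. This is an illustration under assumptions: the tiny `COUNTRIES` set stands in for the full list of 194, and `filter_by_answer_type` is a hypothetical name. When the type is not recognized, the sketch keeps every candidate, reflecting the need to stay flexible.

```python
# Tiny stand-in for the full list of countries (assumption for illustration).
COUNTRIES = {"Bhutan", "Cuba", "Iran", "North Korea", "Canada"}

# Hypothetical lookup from an answer type to the set of things of that type.
KNOWN_TYPES = {"country": COUNTRIES}

def filter_by_answer_type(candidates, answer_type):
    """Keep only candidates that belong to the expected answer type."""
    members = KNOWN_TYPES.get(answer_type)
    if members is None:
        # Unknown type: do not prune, since the guess about the type may be wrong.
        return list(candidates)
    return [c for c in candidates if c in members]

pruned = filter_by_answer_type(["htron", "Cuba", "forth", "North Korea"], "country")
```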

Once the clue was parsed into a question the machine could understand, the hunt commenced. Each expert algorithm went burrowing through Blue J's trove of data in search of the answer. The genetic algorithm, following instructions developed for decoding the genome, looked to match strings of words in the clue with similar strings elsewhere, maybe in some stored Wikipedia entry or in articles about diplomacy, the United States, or northern climes. One of the linguists worked on rhyming key words in the clue or finding synonyms. Another algorithm used a Google-like approach and focused on documents that matched the greatest number of key words in the clue, giving special attention to the ones that surfaced the most often.

While they worked, software within Blue J would compare the clue to thousands of others it had encountered. What kind was it—a puzzle, a limerick, a historical factoid? Blue J was learning to recognize more than fifty types of questions, and it was constructing the statistical record of each algorithm for each type of question. This would guide it in evaluating the results when they came back. If the clue turned out to be an anagram, for example, the algorithm that rearranged the letters of words or phrases would be the most trusted source. But that same algorithm would produce gibberish for most other clues.
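One way to picture this statistical record is a table keyed by algorithm and clue type that counts right and wrong answers. The sketch below is an assumption-laden illustration: the class `TrackRecord`, its method names, and the smoothing rule are all hypothetical choices, not a description of Blue J's internals.

```python
from collections import defaultdict

class TrackRecord:
    """Counts how often each algorithm was right on each type of clue."""

    def __init__(self):
        self._right = defaultdict(int)
        self._total = defaultdict(int)

    def record(self, algorithm, clue_type, was_correct):
        key = (algorithm, clue_type)
        self._total[key] += 1
        if was_correct:
            self._right[key] += 1

    def trust(self, algorithm, clue_type):
        """Smoothed accuracy; a pair never seen before starts at 0.5 (an assumption)."""
        key = (algorithm, clue_type)
        return (self._right[key] + 1) / (self._total[key] + 2)

record = TrackRecord()
for _ in range(3):
    record.record("anagram_solver", "anagram", True)    # strong on anagrams
    record.record("anagram_solver", "factoid", False)   # gibberish elsewhere
```

With these counts, the anagram solver earns high trust on anagram clues and low trust on factoids, which is the kind of guidance the passage describes.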

What kind of clue was this one on Diplomatic Relations? It appeared to require two independent analyses. First, the computer had to come up with the four countries with which the United States had no diplomatic ties. Then it had to figure out which of them was the farthest north. A group of Blue J's programmers had recently developed an algorithm focused on these so-called nested clues, in which one answer lay inside another. This may sound obscure, but humans ask this type of question all the time. If someone wonders about “cheap pizza joints close to campus,” the person answering has to carry out two mental searches, one for cheap pizza joints and another for those nearby. Blue J's “nested decomposition” led the computer through a similar process. It broke the clues into two questions, pursued two hunts for answers, and then pieced them together. The new algorithm was proving useful in Jeopardy. One or two of these combination questions came up in nearly every game. They were especially common in the all-important Final Jeopardy, which usually featured more complex clues.

Blue J's algorithms took almost an hour to churn through the data and return with their candidate answers. Most were garbage. There were failed anagrams of country names and laughable attempts to rhyme “north” and “diplomatic.” Some suggested the names of documents or titles of articles that had strings of the same words. But the nested algorithm followed the right approach. It found the four countries on the outs with the United States (Bhutan, Cuba, Iran, and North Korea), checked their geographical coordinates, and came up with the answer: “What is North Korea?”
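The two-step logic of the nested approach can be sketched in a few lines: answer the inner question first, then run the outer question over that result. This is a toy illustration, not Blue J's code. The function names are hypothetical, the inner answer is hard-coded from the four countries named above, and the latitudes are approximate values for each capital, used here as an assumed stand-in for "geographical coordinates."

```python
# Approximate latitude of each capital in degrees north (illustrative values).
CAPITAL_LATITUDE = {
    "Bhutan": 27.5,       # Thimphu
    "Cuba": 23.1,         # Havana
    "Iran": 35.7,         # Tehran
    "North Korea": 39.0,  # Pyongyang
}

def countries_without_us_relations():
    """Inner question: which countries lack diplomatic ties with the United States?"""
    # Hard-coded here; a real system would have to search for this list.
    return ["Bhutan", "Cuba", "Iran", "North Korea"]

def farthest_north(countries):
    """Outer question: which of the given countries lies farthest north?"""
    return max(countries, key=lambda country: CAPITAL_LATITUDE[country])

def answer_nested_clue():
    inner_answer = countries_without_us_relations()
    return farthest_north(inner_answer)

answer = answer_nested_clue()
```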

At this point, Blue J had the right answer. It had passed the binary recall test. But it did not yet know that North Korea was correct, nor that it even merited enough confidence for a bet. For this, it needed loads of additional analysis. Since the candidate answer came from an algorithm with a strong record on nested clues, it started out with higher than average confidence in that answer. The machine proceeded to check how many of the answers matched the question type “country.” After ascertaining that North Korea appeared to be a country, confidence in “What is North Korea?” increased. For a further test, it placed “North Korea” into a simple sentence generated from the clue: “North Korea has no diplomatic relations with the United States.” Then it would see if similar sentences showed up in its data trove. If so, confidence climbed higher.
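The way confidence builds up from successive checks can be sketched as a score that starts from the algorithm's track record and is nudged upward by each piece of supporting evidence. The text does not say how the real system combined its evidence, so everything here is an assumption: the log-odds form, the weights, the cap on supporting sentences, and the function name `confidence`.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def confidence(prior, type_matches, supporting_sentences,
               type_weight=1.0, sentence_weight=0.5, max_sentences=4):
    """Combine a prior with two kinds of evidence in log-odds space.

    prior: starting confidence from the algorithm's track record (0 < prior < 1)
    type_matches: True if the candidate fits the expected answer type
    supporting_sentences: count of similar sentences found in the data
    The weights and the cap are arbitrary illustrative values.
    """
    score = logit(prior)
    if type_matches:
        score += type_weight
    score += sentence_weight * min(supporting_sentences, max_sentences)
    return sigmoid(score)

start = confidence(0.6, type_matches=False, supporting_sentences=0)
after_type_check = confidence(0.6, type_matches=True, supporting_sentences=0)
after_sentence_check = confidence(0.6, type_matches=True, supporting_sentences=3)
```

Each check that passes raises the score, and the final value can be compared against a threshold to decide whether the answer is worth a bet.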

In the end, it chose North Korea as the answer to bet on. In a real game, Blue J would have hit the buzzer. But being a student, it simply moved on to the next test.

The summer of 2007 turned into fall. Real estate prices edged down in hot spots like Las Vegas and San Diego, signaling the end of a housing boom. Senators Obama and Clinton seemed to campaign endlessly in Iowa. The Red Sox marched toward their second World Series crown of the decade, and Blue J grew smarter.

But Ferrucci noted a disturbing trend among his own team: It was slowing down. When technical issues came up, they often required eight or ten busy people to solve them. If a critical algorithm person or a member of the hardware team was missing, the others had to wait a day or two, or three, by which point someone else was out of pocket. Ferrucci worried. Even though the holidays were still a few months away and they had all of 2008 to keep working, his boss, a manager named Arthur Ciccolo, never tired of telling him that the clock was ticking. It was, and Ferrucci—thinking very much like a computer engineer—viewed his own team as an inefficient system, one plagued with low bandwidth and high latency. As team members booked meeting rooms and left phone messages, vital information was marooned for days at a time, even weeks, in their own heads.

Computer architects faced with bandwidth and latency issues often place their machines in tight clusters. This reduces the distance that information has to travel and speeds up computation. Ferrucci decided to take the same approach with his team. He would cluster them. He found an empty lab at Hawthorne and invited his people to work there. He called it the War Room.

At first it looked more like a closet, an increasingly cluttered one. The single oval table in the room was missing legs. So the researchers piggybacked it on smaller tables. It had a tilt and a persistent wobble, no matter how many scraps of cardboard they jammed under the legs. There weren't enough chairs, so they brought in a few from the nearby cafeteria. Attendance in the War Room was not mandatory but an initial crew, recognizing the same bandwidth problems, took to it right away. With time, others who stayed in their offices started to feel out of the loop. They fetched chairs and started working at the same oval table. The War Room was where decisions were being made.

For high-tech professionals, it all seemed terribly old-fashioned. People were standing up, physically, when they had a problem and walking over to colleagues or, if they were close enough, rolling over on their chairs. Nonetheless, the pace of their work quickened. It was not only the good ideas that were traveling faster; bad ones were, too. This was a hidden benefit of higher bandwidth. With more information flowing, people could steer colleagues away from the dead ends and time drains they'd already encountered. Latency fell. “Before, it was like we were running in quicksand,” said David Gondek, a new Ph.D. from Brown who headed up machine learning. Like many of the others on the team, Gondek started using his old office as a place to keep stuff. It became, in effect, his closet.

It was a few weeks after Ferrucci set up the War Room that the company safety inspector dropped by. He saw monitors propped on books and ethernet cables snaking along the floor. “The table was wobbly. It was a nightmare,” Ferrucci said. The inspector told them to clear out. Ferrucci started looking for a bigger room and quickly realized his team members expected cubicles in the larger space. He told them no, he didn't want them to have the “illusion of returning to a private office.” He found a much larger room on the third floor. Someone had left a surfboard there. Ferrucci's team propped it at the entrance and sat a tiny toy bird, a bluebird, on top of it. It was the closest specimen they could find to a blue jay.

A war room, of course, was hardly unique to Ferrucci's team. Financial trading floors and newsrooms at big newspapers had been using war rooms for decades. All of these operations involved piecing together networks of information. Each person, ideally, fed the others. But for IBM, the parallel between the Jeopardy team and what it was building was particularly striking. The computer had areas of expertise, some in knowledge, others in language. It had an electrical system to transmit information and a cognitive center to interpret it and to make decisions. Each member of Ferrucci's team represented one (or more) of these specialties. In theory, each one could lay claim to a certain patch of transistors in the thinking machine. So in pushing the team into a single room, Ferrucci was optimizing the human brain that was building the electronic one.

By early 2008, Blue J's scores were rising. On the Jennings Arc posted on the wall of the War Room, it was climbing toward the champion, but was still 30 percent behind him. If it continued the pace of the last six months, it might reach Jennings by mid-2008 or even earlier. But that wasn't the way things worked. Early on, Ferrucci said, the team had taught Blue J the easy lessons. “In those first months, we could put in a new algorithm and see its performance jump by two or three percent,” he said. But with the easy fixes in, the advances would be smaller, measured in tenths of a percentage point.

The answer was to focus on Blue J's mistakes. Each one pointed to a gap in its knowledge or a misunderstanding: something to fix. In that sense, each mistake represented an opportunity. The IBM team, working in 2007 with Eric Nyberg, a computer scientist at Carnegie Mellon, had designed Blue J's architecture for what they called blame detection. The machine monitored each stage of its long and intricate problem-solving process. Every action generated data, lots of it. Analysts could carry out detailed studies of the pathways and performance of algorithms on each question. They could review each document the computer consulted and the conclusions it drew from it. In short, the team could zero in on each decision that led to a mistake and use that information to improve Blue J's performance.
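The idea of recording every stage so that a mistake can be traced back to its source can be sketched as a simple trace log. This is a hypothetical illustration only. The `Trace` class, its fields, and the blame rule (the first logged stage whose output no longer contains the correct answer) are assumptions, not the architecture the team designed.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Records what each stage produced while working on one clue."""
    clue: str
    steps: list = field(default_factory=list)

    def log(self, stage, algorithm, output):
        self.steps.append({"stage": stage, "algorithm": algorithm, "output": output})

    def blame(self, correct_answer):
        """Return the first logged step whose output lacks the correct answer, or None."""
        for step in self.steps:
            if correct_answer not in step["output"]:
                return step
        return None

trace = Trace(clue="the one that's farthest north")
trace.log("candidate generation", "keyword_search", ["Cuba", "North Korea"])
trace.log("ranking", "confidence_scorer", ["Cuba"])  # the right answer was dropped here
culprit = trace.blame("North Korea")
```

In this toy run the right answer survives candidate generation and disappears at ranking, so the ranking step is the place an analyst would look first.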
