Final Jeopardy

Authors: Stephen Baker

The researchers were swimming in examples of misunderstandings and wrong turns. Blue J, after all, was failing on half of the clues. But which ones represented larger patterns? Fixing those might enhance its analysis in an entire category. One South America clue, for example, appeared to signal a glitch on Blue J's part in analyzing geography—an important category in Jeopardy. The clue asked for the country that shared the longest border with Chile. Blue J came back with the wrong answer: What is Bolivia? The correct response (What is Argentina?) was its second choice.

Analyzing the clue, researchers saw that Blue J had received conflicting answers from two algorithms. The one specializing in geography had come back with the right answer, Argentina, whose 5,308-kilometer border with Chile dwarfed the 861-kilometer Chilean-Bolivian frontier. But another algorithm had counted references to these countries and their borders and found a lot more talk about the Bolivian stretch. (Chile and Bolivia have been engaged in a border dispute since the 1870s, generating a steady stream of news coverage.) Lacking any other context, this single-minded algorithm suggested Bolivia—and Blue J unwisely trusted it. “The computer was paying more attention to popularity than geography,” Ferrucci said. Researchers went on to tinker with the ratios underlying Blue J's judgment. They instructed it to give more weight to the geography in that type of question and a bit less to popularity. Then they tested the system on a large batch of similar geography clues. Blue J's performance improved. They then ran it on a group of random clues to find out if the adjustment affected Blue J's performance elsewhere, perhaps turning correct answers into mistakes. That happened all too often. But this time the change helped. Blue J's performance inched ahead another tiny fraction of a percent.
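
The reweighting described above can be pictured with a small sketch. It is an illustration only, not IBM's actual method: the two scorer names, the scores, and the weights are all invented, and the combination rule (a plain weighted sum) is an assumption.

```python
# Hypothetical scorer outputs for the Chile clue.
# All numbers are invented for illustration; they are not from Blue J.
candidates = {
    "Argentina": {"geography": 0.9, "popularity": 0.3},
    "Bolivia": {"geography": 0.2, "popularity": 0.8},
}


def rank(candidates, weights):
    """Return candidate answers sorted by weighted sum of scorer outputs, best first."""
    def total(scores):
        return sum(weights[name] * value for name, value in scores.items())

    return sorted(candidates, key=lambda answer: total(candidates[answer]), reverse=True)


# Assumed weights before and after the adjustment.
weights_before = {"geography": 0.4, "popularity": 0.6}
weights_after = {"geography": 0.6, "popularity": 0.4}

print(rank(candidates, weights_before))  # popularity dominates
print(rank(candidates, weights_after))   # geography dominates
```

With these made-up figures, shifting weight from popularity to geography is enough to swap the first and second choices, which is the kind of change the researchers then had to check against unrelated clues.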

The Jeopardy clues, nearly all of them from the J! Archive Web site, were the test bed for this stage of Blue J's education. Eric Brown, Ferrucci's top lieutenant, oversaw this cache along with Chu-Carroll. Brown was serious and circumspect. He got his doctorate at the University of Massachusetts and graduated, in 1996, at the dawn of the dot-com boom. Citing family obligations, he turned down a job offer from Infoseek, one of the early search engines. Two years later, the Walt Disney Company paid $430 million for 42 percent of Infoseek, turning many of the early employees—including the one who grabbed the job Brown had been offered—into multimillionaires. “It's a sad story for me,” Brown said. “I ran into him a few years later at a conference. He was retired.”

From the very beginning, Brown kept tight control of the Jeopardy data. He distributed two thousand clues at a time, which the team used to train Blue J. The risk they faced, as in any statistical analysis, was that they'd fine-tune the machine too precisely to each batch of questions. This tendency to model too closely to a training set is known as overfitting, and it's a serious problem.

Anyone who has ever studied a foreign language knows all about it. Students inevitably overfit to the French or Spanish or Mandarin that their teacher speaks. They adjust to her rhythms and syntax and come to associate that single voice with the language itself. A trip to Paris or Beijing often brings a rude awakening. In Blue J's education, each training set was a single teacher. When the computer started to score well on a training set, the researchers would test it on Jeopardy clues it had never seen before. This was a blind set of data, a few thousand clues that no one but Brown had seen. Each time Blue J ventured from its comfortable clues into an unfamiliar set of data, its results would drop about 5 percent. But still, its overall scores were rising. Brown would release another training set, and the process would start over.

The broader question, naturally, was whether the Jeopardy challenge itself was one giant exercise in overfitting. Jeopardy, in a sense, was a single training set of 185,000 clues, including general knowledge and a mix (that Ferrucci's team quickly quantified) of puzzles, riddles, and the like. If Blue J eventually mastered the game and even defeated Ken Jennings and Brad Rutter in its televised showdown, would its expertise be too specific, or esoteric, for the broader world of business? Would it be flummoxed once it ventured outside its familiar grid of thirty clues? After all, Jeopardy champions were hardly famous for running corporations, mastering global diplomacy, or even managing large research projects. They tended to be everyday people—real estate agents, teachers, software developers, librarians—all with one section of their mind specially adapted—or possibly overfitted—to a TV quiz show.

David Ferrucci spent his days swimming in statistics. They defined every aspect of the Jeopardy project. Blue J's analysis of data was statistical. Its confidence algorithms and learning programs were fed entirely by statistics. Its choice of words and its game strategy were guided by similar analysis, all statistical. Blue J's climb up the Jennings Arc was a curve defined by statistics, and when it got into sparring sessions with humans, sometime in 2009, its record would be calculated the same way. The Final Match was the rare exception—a fact that haunted Ferrucci from the very start. Blue J's fortunes would be defined more by chance than probability. One game, after all, was a minuscule test set, statistically meaningless. A bit of bad luck on a couple of Daily Doubles, and Blue J could lose—even if statistics demonstrated that it usually won.

Ferrucci was constantly analyzing the statistical methodology of teaching and testing the bionic player. One day in the spring of 2008, he came up with a question no one had asked before. Was there any variation, from year to year, in the Jeopardy clues? He asked Eric Brown, the guardian of Blue J's training set. Did Blue J fare better against the clues from some years than others?

It was odd, looking back, that such a simple question had gone unasked for so long. It could be important. Even a change in the clue writers or new directions from the producer could usher in new styles or subject matter. By opening up the format to more popular culture in 1997, Harry Friedman had already demonstrated that Jeopardy, unlike chess, was a game that changed with the times. Did it evolve in a predictable way? If so, Blue J had to be ready.

Brown's team proceeded to analyze Blue J's performance against the clues, year by year. They were stunned to see that the machine's scores plummeted when answering clues from 2003 and remained at that lower level. It was as if the machine got dumber, by about 10 percent. As Blue J answered the newer questions, its precision stayed constant. In other words, it didn't make more mistakes. But with a lower level of confidence, it didn't buzz as often. Blue J was more confused.
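
The distinction between precision and how often the machine buzzes can be made concrete with a short sketch. The clue results, confidence values, and buzz threshold below are invented for illustration; the only thing taken from the text is the pattern of constant precision with fewer attempts.

```python
def buzz_stats(results, threshold):
    """Return (precision, fraction attempted) for a list of (confidence, correct) pairs.

    A clue counts as attempted only when confidence reaches the buzz threshold.
    Precision is the share of attempted clues answered correctly.
    """
    attempted = [correct for confidence, correct in results if confidence >= threshold]
    precision = sum(attempted) / len(attempted) if attempted else 0.0
    return precision, len(attempted) / len(results)


# Invented results for ten older clues and ten newer clues.
older = [(0.9, True), (0.9, True), (0.8, True), (0.8, False), (0.7, True),
         (0.7, True), (0.6, True), (0.6, False), (0.4, False), (0.3, True)]
newer = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, True),
         (0.4, False), (0.3, True), (0.3, False), (0.2, True), (0.2, False)]

THRESHOLD = 0.5  # assumed buzz threshold

print(buzz_stats(older, THRESHOLD))
print(buzz_stats(newer, THRESHOLD))
```

In this toy data the machine is right on three of every four clues it attempts in both sets, but it attempts far fewer of the newer clues because its confidence falls below the threshold more often.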

The IBM team called this shift “climate change.” For weeks, researchers pored over Jeopardy data, trying to figure out why in season 20, from September of 2003 to the following July, the questions suddenly became harder for Blue J. Was it more puzzles or puns? They couldn't tell.

That twentieth season was the one in which Ken Jennings began his remarkable run. Was Jeopardy toughening the clues for Jennings and unwittingly making the game harder for Blue J? That seemed unlikely, especially since it would be difficult to make the game harder for the omniscient Jennings without also ratcheting it up for his competitors. Ferrucci and his team asked Friedman about the change. He said he didn't know—and added that at this point IBM certainly knew more about Jeopardy clues than he did.

Climate change meant that as Blue J prepared for its first matches with human Jeopardy champs—so-called sparring sessions—two-thirds of its training set was too easy. It was like a student who crams for twelfth-grade finals only to see, late in the game, that he's been consulting eleventh-grade textbooks. From Blue J's perspective, the game had just gotten considerably harder.

5. Watson's Face

IN THE FALL of 1992, a young painter named Joshua Davis moved from Colorado to New York City and enrolled at the prestigious Pratt Institute. After a year, he switched from painting to illustration, where there were better career opportunities. “I thought, ‘I'll still paint. It'll just be for the Man,'” Davis said. But when he sent his work to two book publishers, hoping to line up illustration contracts for children's books, the response was essentially, as he put it, “‘Thanks but no thanks, and like, who the fuck are you?'”

Davis didn't take it too hard. His self-esteem was strong enough to withstand a knock or two. A bit later a friend at school steered him toward the digital world. “He said, ‘Oh, don't worry, man, there's this whole Internet thing now. Like books are dead.'” Davis said he was “totally naive” at that point. “I said, ‘Cool. Print's dead. Fantastic!'” He promptly bought an old computer, but it lacked an operating system. So he went to a bookstore and bought one last artifact from the printed world: a manual for the new open-source system called Linux. A diskette he found at the back of the book contained the software. “I was like, ‘Score!'” he said.

Davis didn't know he was about to tackle what he calls the “world's hardest operating system.” But as he taught himself about user interface design, programming, and video graphics, he had an epiphany. He wasn't going to use computers simply to create designs more quickly or to reach more people. The technology itself, following his instructions, would generate the art. “At the time I thought, ‘The Internet is my new canvas,'” he said.

His first corporate job was for Microsoft. He designed visual applications for Internet Explorer 4, which debuted in 1997. For the next few years, he became a leader in the new field of generative art, using programs to combine data into colors and patterns that could morph into countless variations. For this he harnessed movements from nature, such as wind, flowing water, and swarming birds and insects. He even turned his body into an evolving canvas. He had his entire left arm tattooed with the twenty glyphs of the Mayan calendar, the swirling designs running up his right arm depicted Japanese wind, and his back carried images of water. Fire, he said, would eventually cover his chest. He had birds tattooed on his neck, one of them dedicated to his daughter, Kelly Ann.

Davis built a thriving studio, with offices in Manhattan and Chicago, and a long list of clients, from Volkswagen and Motorola to rap luminaries Sean “Puff Daddy” Combs and Kanye West. He eventually moved from the city to a hundred-year-old house with a barn in Mineola, on Long Island. As his success grew, he gave more thought to where his work fit in the history of art. In 2008, for a lecture series on dynamic abstraction, he focused on Jackson Pollock, the abstract artist famous for dripping paint on canvases from a stepladder. “Here's a guy who says, ‘I'm going to paint, but I'm going to use gesture.'” Davis waved his arms to illustrate the movement. “Wherever the paint goes, the paint goes.” Not one to sell himself short, he said he felt like an extension of Pollock. “I'm creating systems where I establish the paints, the boundaries, and the colors. But where it goes is where it goes. It's like controlled chaos.”

As Davis learned more about Pollock, his feelings of kinship only grew. He read that the other artist had also left the city, moved to Long Island, and worked in a barn. “It was, like, sweet!” Davis said. “How did that work out?”

It was around that time, in October 2008, that Davis got a call from an art director at Ogilvy & Mather, the international advertising agency. IBM, he learned, was building a computer to take on human champions in Jeopardy. How would he like to create the machine's face?

During the first year of Blue J's development, few at IBM thought much about the computer's physical presence or its branding. A pretty face would be irrelevant if the team couldn't come up with a workable brain. But by late summer of 2008, Ferrucci's team was getting close. One August day, Harry Friedman and the show's supervising producer, a former Jeopardy champion named Rocky Schmidt, visited the Yorktown labs for their first look at the bionic player.

As the group gathered in one of the windowless conference rooms at the Yorktown lab, Ferrucci walked them through the computer's cognitive process, explaining how it came up with answers and why, on occasion, it flubbed them so badly. He explained that the hardware—what would become Watson's body—wasn't yet ready to deliver timely answers. But the team had led the computer through a game of Jeopardy, had recorded its answers, and then created a simulation of the game by loading the answers into a laptop. With that, Friedman and Schmidt watched the new contestant in action. Friedman later said that he had been “blown away” by the computer's performance.

The conversation, according to Noah Syken, a media manager at IBM, quickly turned to logistics and branding. If the computer required the equivalent of a roaring data center to play the game, where would all that machinery fit on the Jeopardy set? And how about all the noise and heat it would generate? One possibility might be to set up its hulking body on the Wheel of Fortune set, next door, and run the answers to the podium. But that raised a bigger question: What would viewers see at that podium? No one had a clue.

The following month, as Lehman Brothers imploded, car companies crashed, and the world's financial system appeared to teeter on the verge of collapse, IBM's branding and marketing team worked to develop the personality and message of the Jeopardy-playing machine. It would need a face of some sort and a voice. And it had to have a name.

An entire corporate identity unit at IBM specialized in naming products and services. A generation earlier, when the company still sold machines to consumers, some of the names this division dreamed up became iconic. “PC” quickly became a broad term for personal computers (at least those that weren't made by Apple). ThinkPad was the marquee brand for top-of-the-line business laptops. And for a few decades before the PC, the Selectric, the electric typewriter with a single rotating type ball (which could “erase” typos with space-age precision) epitomized quality for anyone creating documents. With IBM's turn toward services, the company risked losing its contact with the popular mind—and its identity as a hotbed of innovation.
