Authors: Stephen Baker
The IBM researchers could, of course, teach Watson to anticipate the buzz. But it would be a monumental task. It might require outfitting Watson with ears. Then they'd have to study the patterns of Alex Trebek's voice, the time it took him to read clues of differing lengths, the average gap in milliseconds between his last syllable and the activation of the light. It would require the efforts of an entire team and exhaustive testing during the remaining sparring sessions, made more difficult because Trebek, raised in Canada, had different voice patterns than his IBM fill-in, Todd Crain, from Illinois. It would amount to an entire research project, which would likely be useless to IBM outside the narrow confines of a specific game show. Ferrucci wouldn't even consider it.
Loughran thought Ferrucci and Friedman could iron out many of these points with a one-on-one conversation. “Why don't you pick up the phone and call Harry?” he said. “You negotiate. If they get the finger, you get rid of the anticipatory buzzing.”
Ferrucci shrugged. His worries ran deeper than the finger and the buzzer. He was far more concerned about the clues Watson would face. Unlike chess, Jeopardy was a game created, week by week, by humans. A team of ten writers formulated the clues and the categories. If they knew that their clues would be used in the man-machine match, mightn't they be tempted, perhaps unconsciously, to test the machine? “As soon as you create a situation in which the human writer, the person casting the questions, knows there's a computer behind the curtain, it's all over. It's not Jeopardy anymore,” Ferrucci said. Instead of a game for humans in which a computer participates, it's a test of the computer's mastery of human skills. Would a pun trip up the computer? How about a phrase in French? “Then it's a Turing test,” he said. “We're not doing the Turing test!”
To be fair, the Jeopardy executives understood this issue and were committed to avoiding the problem. The writers would be kept in the dark. They wouldn't know which of their clues and categories would be used in the Watson showdown. According to the preliminary plans, they would be writing clues for fifteen Tournament of Champions matches, and Watson would be playing only one of them. But Ferrucci didn't think this was sufficient. One way or another they would be influenced by it, or at least they might be. From a scientific standpoint, there was no distinction between the existence and the possibility of bias. Either way, the results were compromised. Fifteen games, he said, was not a big enough set. “That's not statistically significant.”
Epstein said that claims of bias always came up in man-machine contests, because humans always changed their behavior when faced with a machine while other humans were busy tweaking the machine. “Even in the Deep Blue chess game,” he said, “Kasparov was complaining bitterly that the IBM team cheated.” But how could a machine cheat in chess? “Nobody's writing questions,” he said.
The concern in the chess match, Ferrucci said, was that the humans responded to Kasparov's tactics and retuned the computer. Kasparov had already adjusted to the computer's strategy and then found himself facing another one. “He was very offended by that,” Ferrucci said.
“So it was unfair for the machine to change its strategy,” Epstein asked, “but OK for the man to change his?”
Throughout the meal, they discussed the nature of competitions between people and machines. They weren't new, by any stretch. But earlier in the process, they had seemed more theoretical. Now, with Jeopardy laying down the law, theory was colliding with reality.
“I have a question for you,” Epstein said at one point. “Has anyone discussed what risks Jeopardy has in this?”
“It raises interesting issues,” Ferrucci said. “One of them is, do they have a horse in the race? Do they want something in particular to happen? We don't control anything but our machine,” he went on. “We want our machine to win. This is not a mystery. Jeopardy holds a different set of cards.”
“They want it to be entertaining,” Loughran said.
“But what does it mean for the show for the computer to win or lose?” Ferrucci asked. “What does it mean for the show if the human, let's say, clobbers the computer? These are open questions. They're in a tough spot, because on the one hand they have to maintain the [show's] integrity. But at the same time, there's a perception issue, and people might think: 'Gee, would Jeopardy be obsolete if the computer won? Would this change the game?'”
“No way,” Loughran said.
“You don't think so,” Ferrucci said, “but they have to be asking the question.” He paused and ate quietly for a few moments. This marketing side of the project, which made it so exciting, was also causing stress. He was spending more and more time dealing with the Jeopardy team and the PR machine and less time in the lab. He was having trouble sleeping. He turned back to Loughran. “So,” he asked, “knowing everything you know now, would you still do this project?”
“Sure,” Loughran said. “And you?”
“I'm a science guy, so I absolutely would,” Ferrucci said. He had been able to build his machine, after all, despite his concerns about how the Jeopardy match would play out. “But if I was a marketing guy,” he added, “I'm not so sure . . .”
“We've got some issues, but it's fun,” Loughran said. “We'll get through it all.”
In the following days, Ferrucci looked to buffer the science of the Jeopardy challenge from the intrusions of the marketing effort and from the carnival odds of a one-game showdown. He devised a two-track approach for Watson, one for the scientific record, the other for the show biz extravaganza. What he wanted, he said, was a set of sixty sparring rounds in the fall of 2010 with the top Jeopardy players: Tournament of Champions qualifiers. These test games would be played on boards written for humans. There would be no bias toward the machine, unconscious or not. Watson would win some of the matches and lose others. But those games would represent its record against a high level of competition. It would establish a benchmark for Q-A technology and produce a valuable set of data. Even if Watson went on to stumble on national television, its reputation among the tech and scientific communities would be assured. “Those games will be where we'll get the real statistics on how we did,” he said. “The final game is fun. But these sixty matches will be the real study.”
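Why sixty games carry more statistical weight than fifteen, or than one, can be illustrated with a rough calculation. The sketch below is not IBM's analysis. It assumes each game is an independent win-or-lose trial, uses a normal approximation to the binomial, and plugs in a purely illustrative win rate of 65 percent. The function names are hypothetical.

```python
import math


def win_rate_standard_error(win_rate: float, games: int) -> float:
    """Standard error of an observed win rate over `games` independent games."""
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    if games < 1:
        raise ValueError("games must be at least 1")
    return math.sqrt(win_rate * (1.0 - win_rate) / games)


def approx_95_interval(win_rate: float, games: int) -> tuple[float, float]:
    """Rough 95% interval (normal approximation), clipped to [0, 1]."""
    margin = 1.96 * win_rate_standard_error(win_rate, games)
    return max(0.0, win_rate - margin), min(1.0, win_rate + margin)


if __name__ == "__main__":
    ASSUMED_WIN_RATE = 0.65  # illustrative assumption, not a measured result
    for games in (1, 15, 60):
        low, high = approx_95_interval(ASSUMED_WIN_RATE, games)
        print(f"{games:>2} games: roughly {low:.2f} to {high:.2f}")
```

Under these assumptions, quadrupling the number of games from fifteen to sixty halves the standard error, which is why a long series says far more about a player's true strength than a single televised match.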
Through the month of April, on conference calls and in meetings, Ferrucci repeatedly voiced his concerns to the Jeopardy team. He wasn't concentrating on the finger anymore. He had made that concession, and a hardware team at IBM was busy creating one. They estimated that it would slow Watson's response time by eight milliseconds. But Ferrucci continued to push for the sixty matches with champions. In April, Jeopardy's Friedman and Schmidt came to watch a sparring match. In the meeting with them that followed, Ferrucci went on at length about unconscious writers' bias and tainted questions. “Dave really hammered on these points,” said one participant. The Jeopardy executives defended their processes and protocols. The conversation grew heated. A camera crew was filming the meeting for a documentary. They were asked to leave.
That was when Jeopardy, in Friedman's term, “stepped back.” In late April, Friedman's team sent word to IBM that they were reconsidering every aspect of the competition, including the match itself. With this news, Watson was suddenly put into the same powerless position as thousands of other Jeopardy wannabes: waiting for an invitation. Unlike the aspiring human players, though, Watson had no other occupation, no other purpose on earth. What's more, it had the hopes of a $96 billion corporation resting on it. And within weeks, millions of New York Times readers would be learning about the coming match in a Sunday magazine cover story, unless Loughran, IBM's press officer, alerted the Times that the match was in trouble. He kept quiet, trusting that the two sides would resolve their disagreements.
A week later, Friedman was sitting in his office on the Sony lot in Culver City. The walls were plastered with photographs and awards from his forty-year career in game shows, his seven Emmys, and his Cable and Broadcasting Hall of Fame plaque. It had been a tense day. That morning he had had another contentious phone conversation with Ferrucci, according to IBM. And he had to iron out strategy with Rocky Schmidt and Lisa Broffman, another producer on the show, before Schmidt flew to Europe the next day. “We've been so immersed in this,” Friedman said, minutes after meeting with Schmidt, “that we're stepping back just a little bit and thinking of the various ramifications. We're analyzing every aspect now. This is a big deal.”
Ferrucci's concerns about bias left the Jeopardy executives feeling exposed. The IBM scientist, after all, was implying that Jeopardy's writers might tilt the match toward one side or the other, or at least be perceived as doing so. Ferrucci was always careful to ascribe this possibility to unconscious bias. But for Jeopardy, a franchise born from the quiz show scandals of the 1950s, the hint of such bias, conscious or not, was poisonous. And even if Ferrucci kept this concern to himself, the point he made repeatedly was that other scientists would raise the very same questions. If it was even within the realm of possibility that Jeopardy had an interest in the outcome and if it used its own people to write the clues, the fairness of the game and the validity of the contest were compromised.
For Friedman, who took pride in lending the Jeopardy platform to science, this was tough to swallow. “[IBM] could have done this with a bunch of questions that academics came up with,” he said. “But they wanted this fabulous platform. They gain the platform and lose control.” He maintained that the future of the franchise hinged on its reputation for fairness and integrity and that if the match went forward, his team would be laying down the rules. “We rigidly adhere to not only our own code of conduct, but also obviously to the FCC regulations,” he said. “We run a pretty tight ship.”
He described how the contestants are sequestered during the filming, accompanied by handlers and prohibited from mingling with anyone with access to the clues. He recalled one time that Ken Jennings, hurrying to change a tie that “strobed on camera,” ducked into a little nook where Alex Trebek checked his appearance before stepping onto the set. This was a breach. The three players had to always stick together, under surveillance, so that no one could even be suspected of receiving favorable treatment. Jennings was quickly ousted as if he'd been a North Korean commander strolling into a meeting of the Joint Chiefs at the Pentagon. Friedman laughed. “He could have been shot.” Then he play-acted. “Oh, sorry Ken, we had to wing you in your foot there, but your buzzer thumb seems to be intact. Are you OK to play the next show? You wandered into a secure area . . .”
Friedman brushed off Ferrucci's suggestion that the results of the game could have a lasting impact on the Jeopardy franchise, much as Kasparov's loss to Deep Blue forever changed chess. He laughed. “When all of this, as wonderful as it is, is over, we're going to continue playing our game. We're going to continue what got us here through six thousand shows.” The message to IBM: “Thanks for coming. Thanks for playing. We're back to our day jobs.”
The tentative plan had been for the IBM team to move Watson to the Culver City studios in late 2010. It would participate in a championship match, playing against Ken Jennings and the winner of an invitational tournament of past champions. But bringing the machine into Jeopardy's “tightly run ship,” it was now clear, raised complications, including demands to change the show's tried-and-tested procedures. It raised the risk of rancor and public accusations. And it wasn't just the scientists who might complain. The humans would be playing for a million-dollar prize, underwritten by IBM. If they suspected any tilting in the competition, they were sure to speak up as well. In a sense, Watson's intrusion into the Jeopardy world represented a potential breach of its own. Friedman had to weigh his options.
One of Jeopardy's biggest fears, Ferrucci believed, was that Watson would grow dramatically smarter and faster over the summer and lay waste to its human foes. This was early May, weeks after Jeopardy had begun to reconsider the match. He was sitting in the empty observation room on the Jeopardy set in Yorktown. At the podium on the other side of the window, Watson had been beating humans in sparring sessions about 65 percent of the time but showing few signs of frightening dominance. The Jeopardy crew, he said, continued to assess the matches. “Is this fun, is this entertaining, is this speaking to our audience?” A superendowed Watson, conceivably, would drain the match of all suspense. In that case, according to Ferrucci, “People would say, 'Of course computers can beat humans! Why did you promote all this?'”
Ferrucci wished it were true, that with a few devilishly smart new algorithms Watson would leap forward into a class of its own. That way he might sleep better. But he didn't see it happening. “We're working our butts off,” he said. “But I don't think we're going to see a lot of difference in Watson's performance four months from now, when we have to freeze the system. But they don't know that,” he said. “How could they know? They're not doing the science.”
Jeopardy's executives also worried, he said, that IBM could jack up Watson's speed simply by adding more computing power. This was logical. But it was not the case. In distributing Watson's work to more than two thousand processors, the IBM team had broken it into hundreds of smaller tasks, most of them operating in parallel. But a handful of these jobs, Ferrucci explained, required sequential analysis. Whether it was parsing a sentence or developing a confidence ranking for a potential answer, certain basic algorithms had to follow strings of commands, with each step hinging on the previous one. This took time.
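The ceiling Ferrucci describes is commonly expressed as Amdahl's law: if some fraction of a job must run step by step, that fraction caps the speedup no matter how many processors are added. The sketch below is only an illustration of that general principle, not a model of Watson. The 5 percent serial fraction is a hypothetical figure chosen for the example, and the function name is likewise an assumption.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    SERIAL_FRACTION = 0.05  # hypothetical value, not a figure from IBM
    for processors in (100, 2000, 4000, 1_000_000):
        speedup = amdahl_speedup(SERIAL_FRACTION, processors)
        print(f"{processors:>9} processors: speedup of about {speedup:.2f}x")
```

With a 5 percent sequential share, the speedup can never reach 20x, so doubling two thousand processors to four thousand would barely change the response time. That is the sense in which adding more computing power would not simply make the machine faster.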