Authors: David Shenk
The problem, put simply, was time. Even as programmers devised increasingly elegant equations to represent quasi-intelligent chess decisions, the computers still had to evaluate an overwhelming number of positions in order to play competently. Human experts could get around this problem using intuition to ferret out most of the silly moves and narrow their decision down to a few strong possibilities. But machines couldn’t do this. They had to consider all of the choices equally, and to look ahead as many moves as possible to see how each choice would play out. That amount of calculation took a lot of time. With an average of thirty-five or so options per turn, and then thirty-five different subsequent responses for each possible move, and so on, geometric progression quickly made the large number of calculations untenable for even a speedy computer. To look ahead a mere two moves each, the computer would have to evaluate 35 × 35 × 35 × 35 = 1,500,625 positions. Three full moves required analysis of 1,838,265,625 (nearly two billion) positions. Four moves: 2,251,875,390,625 (over two trillion) positions.
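Those figures are simply powers of the average branching factor; a quick back-of-the-envelope sketch, assuming a uniform thirty-five options at every turn, reproduces them:

```python
# Back-of-the-envelope position counts, assuming (as the text does) a uniform
# thirty-five legal options at every turn.
BRANCHING_FACTOR = 35

for moves_each in (2, 3, 4):                 # moves by each player
    plies = 2 * moves_each                   # total turns examined (White + Black)
    positions = BRANCHING_FACTOR ** plies
    print(f"{moves_each} moves each: {positions:,} positions")

# 2 moves each: 1,500,625 positions
# 3 moves each: 1,838,265,625 positions
# 4 moves each: 2,251,875,390,625 positions
```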
This was not a short-term problem for engineers. Even as computers got faster and faster, they would not come remotely close to truly managing chess’s near-infinity. In fact, it seemed safe to predict that no machine would be able to overcome this problem for many centuries, if ever. Computer scientists estimated, for example, that a future computer examining moves five times faster than Deep Blue’s supercomputing 200 million positions per second would take an estimated 3.3 billion years to consider all of the possibilities for ten moves by each player.
In Bletchley Park during the war, Turing realized that the single most essential tool for the mathematizing of chess would be a technique developed by his Princeton mentor John von Neumann called minimax. Short for "minimizing the expected maximum loss," minimax was essentially a method for choosing the least bad move. It came out of the logical recognition that, when competing in a game where one player’s success is another player’s failure (also called a zero-sum game), Player A will always try to choose the moves that are best for him and worst for Player B. It therefore behooves Player B to identify not his absolute ideal series of moves—since no worthy opponent will allow that course—but rather the moves which will deprive Player A of his best moves. Player B, in fact, wants to move in such a way as to leave Player A with his least best option, his minimum maximum.
The minimax logic applied to any game where all the information was known by every player—chess, checkers, tic-tac-toe, and so on. In practice, it required placing all game decisions onto an enormous “game tree,” with a single trunk (the current board position), some primary branches (all of White’s next possible moves), a larger number of secondary branches (all of Black’s possible responses to each of White’s next possible moves), and on and on, eventually ending with an abundance of “leaves” (representing the deepest level being analyzed at the time).
Imagine, for example, that each one of the following tiny squares is a chessboard with an array of pieces. This artificially simple chess game tree represents three possible moves by White, each one of which is followed by three possible moves by Black, each of which is followed by two possible moves by White:
The object is to determine the best of the possible moves by White at the beginning of the tree. Using the minimax procedure, the computer first scores each of the boards on the last level of the tree—the leaves. Imagine that, in the following diagram, a computer has examined each of the leaves and has scored each board according to the relative advantage for White. The highest numbers represent the better positions for White:
Now, according to minimax logic, the computer assigns scores to branches higher up on the tree—the moves happening earlier. First it determines the best board position from each of the leaf choices, and assigns those values to the next level up.
Then the computer considers how Black will move. Black will want the least advantaged position for White—the lowest score. So it moves those low scores up to the top level:
Now White must decide how to move. White will choose the move with the highest score—the board represented by the score of seven.
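Since the diagrams themselves cannot be reproduced here, a minimal sketch of the same backup procedure, on a tree of exactly the shape just described and with invented leaf scores, shows how a score of seven can rise to the top:

```python
# A minimal minimax sketch on a tree shaped like the one described above:
# three White moves, each answered by three Black replies, each followed by
# two White moves.  The leaf scores are invented purely for illustration;
# higher numbers mean a better position for White.
def minimax(node, maximizing):
    if isinstance(node, (int, float)):          # a leaf: just return its score
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# Each innermost pair is White's two follow-up moves (White picks the max),
# each group of three pairs is Black's three replies (Black picks the min),
# and the outer group of three is White's candidate first moves.
tree = [
    [[3, 5], [1, 8], [4, 2]],   # after White's first candidate move
    [[7, 9], [8, 6], [7, 4]],   # after White's second candidate move
    [[2, 6], [5, 3], [1, 9]],   # after White's third candidate move
]

scores = [minimax(branch, maximizing=False) for branch in tree]  # Black replies next
print(scores)                   # [4, 7, 5]
best = scores.index(max(scores))
print(f"White's best first move is candidate #{best + 1}, with score {max(scores)}")
```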
Needless to say, if chess trees were anywhere near as simple as the demonstration above, they wouldn’t be necessary in the first place. Minimax enabled computers to sort through chess trees with millions of branches on the fourth and fifth branch levels, billions on the sixth and seventh levels, and trillions on the eighth level. In tackling such logical complexity, the technique emerged as far more than a computer chess tool. It became a foundation stone of modern game theory, applicable to war gaming, economics, coalition building, and a wide variety of other systems involving two or more competitive parties. It helped jump-start artificial intelligence research, and has since enabled us to look scientifically at human endeavors such as voter participation. Minimax enabled social scientists to mathematize large chunks of public life.
In the nearer term, it also quickly put computer chess programmers in a serious bind. By opening up chess to a nearly endless series of calculations, minimax made chess computing both possible and impossible. The equations could be continually improved to make better and better chess decisions, but it simply took too long for computers to analyze all the possibilities. Then, in 1956, American computer scientist and AI pioneer John McCarthy—he had actually coined the term artificial intelligence one year earlier—came up with an ingenious revision of minimax called alpha-beta pruning that allowed a computer to ignore certain leaves on a tree whose evaluations wouldn’t make a difference in the final result. Like the minimax concept, the idea wasn’t based on any particular insight into chess, but was a simple matter of logic: certain leaf evaluations are irrelevant if other leaf values on that same branch have already taken that particular branch out of contention. A computer instructed not to bother calculating such nonactionable leaves could accomplish its work in much less time.
Alpha-beta pruning was, in a sense, the first true piece of artificial intelligence: an algorithm (a step-by-step procedure for solving a problem) that helped machines logically rule out certain options in a crude analogue to how humans do the same thing through intuition. An expert human player could tell with a quick glance and a split-second subconscious memory sweep which moves could be ignored and which should be prominently considered. Alpha-beta pruning was the computer’s way of setting similar priorities.
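A bare-bones sketch of that pruning logic, applied to the invented scores used earlier rather than to real chess evaluations, and not modeled on any particular program, might look like this:

```python
# A bare-bones alpha-beta sketch on the same nested-list trees as before.
# alpha is the best score the maximizing side (White) is already guaranteed;
# beta is the best (lowest) score the minimizing side (Black) is already guaranteed.
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, (int, float)):          # a leaf: just return its score
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:       # Black already has a better option elsewhere,
                break               # so the remaining leaves on this branch can't matter
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:       # White already has a better option elsewhere,
                break               # so prune the rest of this branch
        return value

if __name__ == "__main__":
    # Same tree as in the minimax sketch above; alpha-beta gives the same answer (7)
    # while skipping leaves that can no longer affect the result.
    tree = [
        [[3, 5], [1, 8], [4, 2]],
        [[7, 9], [8, 6], [7, 4]],
        [[2, 6], [5, 3], [1, 9]],
    ]
    print(alphabeta(tree, maximizing=True))     # 7
```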
In 1966, MIT’s Richard Greenblatt introduced yet another critical innovation called transposition tables. This technique allowed a computer to cache (temporarily remember) the value of certain chess positions so that when they came up again they wouldn’t have to be fully reevaluated. It took advantage of the observation that identical chess positions were often produced by moves in different sequence. These two separate openings, for example, produce exactly the same board position, since they are the same moves in different order:
1. e4 e5 2. f3 c6
1. f3 c6 2. e4 e5
Such duplications of positions are called transpositions. A program that could recognize and remember transpositions could trim the necessary number of calculations. This was the earliest glimpse of what Turing had fantasized as machine learning, since a cached position would enable the computer to remember certain truths as a game progressed—and potentially even from game to game. With this new way for computers to remember game positions, it would henceforth no longer be possible, for example, for a human player to beat a computer exactly the same way twice in a row. (How many players wish they could say as much for themselves?) Computers could essentially remember their mistakes.
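A minimal sketch of the caching idea, a simplified illustration rather than Greenblatt's actual technique (real programs key the table on a compact hash of the position and also track details such as whose turn it is), might look like this:

```python
# A minimal transposition-table sketch: cache the evaluation of a position the
# first time it is computed, so that the same position, reached by a different
# move order, never has to be evaluated twice.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

transposition_table = {}

def evaluate(position):
    """Stand-in for a slow, full evaluation: White material minus Black material."""
    score = 0
    for piece in position.values():             # e.g. "wP" for a white pawn
        value = PIECE_VALUES[piece[1]]
        score += value if piece[0] == "w" else -value
    return score

def cached_evaluate(position):
    key = frozenset(position.items())           # identical positions give identical keys
    if key not in transposition_table:          # evaluate a position only the first time
        transposition_table[key] = evaluate(position)
    return transposition_table[key]

# Both move orders in the text produce this same position, so the second call
# is answered from the table instead of being recomputed.
position = {"e4": "wP", "e5": "bP", "f3": "wP", "c6": "bP"}
print(cached_evaluate(position))                # evaluated and cached
print(cached_evaluate(dict(position)))          # cache hit: same position, different object
```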
None of these improvements in searching and pattern recognition were restricted to chess, of course. But for many years, from the 1970s through the mid-1990s, chess continued to be a choice vehicle for AI research and experimentation. At engineering schools like MIT and Carnegie Mellon, graduate students married software advances with faster and faster processors in a competitive race to build the ultimate chess computer—a machine that could beat the world chess champion. Teams on campuses formed around one chess project or another, with students custom-designing, redesigning, and re-redesigning tiny silicon “chess chips.” The progress was slower than many had hoped for, but it was steady. In 1978 the leading machine of the time, known as Chess 4.7, developed by David Slate and Larry Atkin at Northwestern University, forced Scottish master player David Levy into a draw—a first. A few years later, the leading computers started winning the occasional game against expert players, and in 1988 the Carnegie Mellon machine HiTech became the first computer to be ranked as a grandmaster. When Garry Kasparov first played and defeated Deep Blue in 1996, beating the machine with three wins, one loss, and two draws, the world champion reported that he had detected stirrings of genuine thought in the machine. The following year, after some major processor and programming upgrades, Deep Blue defeated Kasparov, with two wins, one loss, and three draws. The result sent a quick chill around the world. Much soul searching began.
Was this the end of chess? The end of us? What did Deep Blue’s victory mean?
Some were quick to point out that the stunning achievement was limited to a mere board game. Deep Blue didn’t know to stop at a red light, and couldn’t string two words together or offer anything else in the way of even simulated intelligence. Others didn’t think that even the chess win was so amazing. MIT linguist Noam Chomsky scoffed that a computer beating a grandmaster at chess was about as momentous “as the fact that a bulldozer can lift more than some weight lifter.” It was simply another case in the long history of technology, he argued, of humans inventing machines that could perform highly specialized tasks with great efficiency. Specialization did not intelligence make.