MAXWELL’S DEMON
Let’s shift gears a bit to return to the thought-experiment playground of nineteenth-century kinetic theory. Ultimately this will lead us to the connection between entropy and information, which will circle back to illuminate the question of memory.
Perhaps the most famous thought experiment in all of thermodynamics is Maxwell’s Demon. James Clerk Maxwell proposed his Demon—more famous than Laplace’s, and equally menacing in its own way—in 1867, when the atomic hypothesis was just beginning to be applied to problems of thermodynamics. Boltzmann’s first work on the subject wasn’t until the 1870s, so Maxwell didn’t have recourse to the definition of entropy in the context of kinetic theory. But he did know about Clausius’s formulation of the Second Law: When two systems are in contact, heat will tend to flow from the hotter to the cooler, bringing both temperatures closer to equilibrium. And Maxwell knew enough about atoms to understand that “temperature” measures the average kinetic energy of the atoms. But with his Demon, he seemed to come up with a way to increase the difference in temperature between two systems, without injecting any energy—in apparent violation of the Second Law.
The setup is simple: the same kind of box of gas divided into two sides that we’re very familiar with by now. But instead of a small opening that randomly lets molecules pass back and forth, there’s a small opening with a very tiny door—one that can be opened and closed without expending a noticeable amount of energy. At the door sits a Demon, who monitors all of the molecules on either side of the box. If a fast-moving molecule approaches from the right, the Demon lets it through to the left side of the box; if a slow-moving molecule approaches from the left, the Demon lets it through to the right. But if a slow-moving molecule approaches from the right, or a fast-moving one from the left, the Demon shuts the door so they stay on the side they’re on.
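To make the rule concrete, here is a minimal sketch in code; the function name, the use of a single threshold speed, and the example numbers are all invented for illustration, not part of Maxwell’s original setup.

```python
# Toy sketch of the Demon's sorting rule. "Fast" and "slow" are judged
# against the average molecular speed; all names and numbers here are
# invented purely for illustration.

def demon_opens_door(side, speed, average_speed):
    """Return True if the Demon opens the door for an approaching molecule.

    Fast molecules are herded into the left half of the box, slow molecules
    into the right half; everything else is turned back.
    """
    if side == "right" and speed > average_speed:
        return True   # fast molecule coming from the right: let it into the left
    if side == "left" and speed <= average_speed:
        return True   # slow molecule coming from the left: let it into the right
    return False      # otherwise the door stays shut

# A fast molecule approaching from the right gets through;
# the same molecule approaching from the left does not.
print(demon_opens_door("right", speed=3.2, average_speed=2.0))  # True
print(demon_opens_door("left", speed=3.2, average_speed=2.0))   # False
```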
It’s clear what will happen: Gradually, and without any energy being expended, the high-energy molecules will accumulate on the left, and the low-energy ones on the right. If the temperatures on both sides of the box started out equal, they will gradually diverge—the left will get hotter, and the right will get cooler. But that’s in direct violation of Clausius’s formulation of the Second Law. What’s going on?
If we started in a high-entropy state, with the gas at equal temperature throughout the box, and we evolve reliably (from any beginning state, not just some finely tuned ones) into a lower-entropy state, then a large number of initial states must all be evolving into a small number of final states. But that simply can’t happen if the dynamical laws are information-conserving and reversible: there’s no room for all of those initial states to be squeezed into the smaller number of final states. So if the entropy of the gas goes down, there has to be a compensating increase in entropy somewhere else. And there’s only one place that entropy could go: into the Demon.
Figure 49: By letting high-energy molecules move from the right half of the box to the left, and slow-moving molecules move from the left to the right, Maxwell’s Demon lets heat flow from a cold system to a hotter one, in apparent violation of the Second Law.
The question is, how does that work? It doesn’t look like the Demon increased in entropy; at the start of the experiment it’s sitting there peacefully, waiting for the right molecules to come along, and at the end of the experiment it’s still sitting there, just as peacefully. The embarrassing fact is that it took a long time—more than a century—for scientists to really figure out the right way to think about this problem. Hungarian-American physicist Leó Szilárd and French physicist Léon Brillouin—both of whom were pioneers in applying the new science of quantum mechanics to problems of practical interest—helped pinpoint the crucial relationship between the information gathered by the Demon and its entropy. But it wasn’t until the contributions of two different physicist/computer scientists who worked for IBM, Rolf Landauer in 1961 and Charles Bennett in 1982, that it finally became clear why exactly the Demon’s entropy must always increase in accordance with the Second Law.
RECORDING AND ERASING
Many attempts to understand Maxwell’s Demon concentrated on the means by which it measured the velocities of the molecules zooming around its vicinity. One of the big conceptual leaps of Landauer and Bennett was to focus on the means by which the Demon recorded that information. After all, the Demon has to remember—even if just for a microsecond—which molecules to let by, and which to keep on their original sides. Indeed, if the Demon simply knew from the start which molecules had which velocities, it wouldn’t have to do any measurements at all; so the crux of the problem can’t be in the measurement process.
So we have to equip the Demon with some way to record the velocities of all the molecules—perhaps it carries around a notepad, which for convenience we can imagine has just enough room to record all of the relevant information. (Nothing changes if we consider larger or smaller pads, as long as the pad is not infinitely big.) That means that the state of the notepad must be included when we calculate the entropy of the combined gas/Demon system. In particular, the notepad must start out blank, in order to be ready to record the velocities of the molecules.
But a blank notepad is, of course, nothing other than a low-entropy past boundary condition. It’s just the Maxwell’s Demon version of the Past Hypothesis, sneaked in under another guise. If that’s the case, the entropy of the combined gas/Demon system clearly wasn’t as high as it could have been. The Demon doesn’t lower the entropy of the combined system; it simply transfers the entropy from the state of the gas to the state of the notepad.
You might be suspicious of this argument. After all, you might think, can’t the Demon just erase the notepad when all is said and done? And wouldn’t that return the notepad to its original state, while the gas went down in entropy?
This is the crucial insight of Landauer and Bennett: No, you can’t just erase the notepad. At least, you can’t erase information if you are part of a closed system operating under reversible dynamical laws. When phrased that way, the result is pretty believable: If you were able to erase the information entirely, how would you ever be able to reverse the evolution to its previous state? If erasure is possible, either the fundamental laws are irreversible—in which case it’s not at all surprising that the Demon can lower the entropy—or you’re not really in a closed system. The act of erasing information necessarily transfers entropy to the outside world. (In the case of real-world erasing of actual pencil markings, this entropy comes mostly in the form of heat, dust, and tiny flecks of rubber.)
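The quantitative version of this insight is Landauer’s principle: erasing one bit of information in surroundings at temperature T necessarily dissipates a minimum amount of heat into those surroundings, and hence raises their entropy by a corresponding minimum amount. In the notation of kinetic theory (with k_B Boltzmann’s constant), the standard statement is

\[
Q_{\text{erase}} \;\geq\; k_B T \ln 2
\qquad\Longleftrightarrow\qquad
\Delta S_{\text{surroundings}} \;\geq\; k_B \ln 2 \ \text{per bit erased}.
\]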
So you have two choices. Either the Demon starts with a blank low-entropy notepad, in a demonic version of the Past Hypothesis, and simply transfers entropy from the gas to the notepad; or the Demon needs to erase information from the notepad, in which case entropy gets transferred to the outside world. In either case, the Second Law is safe. But along the way, we’ve opened the door to the fascinating connection between information and entropy.
INFORMATION IS PHYSICAL
Even though we’ve tossed around the word information a lot in discussing dynamical laws of physics—reversible laws conserve information—the concept still seems a bit abstract compared to the messy world of energy and heat and entropy. One of the lessons of Maxwell’s Demon is that this is an illusion: Information is physical. More concretely, possessing information allows us to extract useful work from a system in ways that would have otherwise been impossible.
Leó Szilárd showed this explicitly in a simplified model of Maxwell’s Demon. Imagine that our box of gas contained just a single molecule; the “temperature” would just be the energy of that one gas molecule. If that’s all we know, there’s no way to use that molecule to do useful work; the molecule just rattles around like a pebble in a can. But now imagine that we have a single bit of information: whether the molecule is on the left side of the box or the right. With that, plus some clever thought-experiment-level manipulation, we can use the molecule to do work. All we have to do is quickly insert a piston into the other half of the box. The molecule will bump into it, pushing the piston, and we can use the external motion to do something useful, like turn a flywheel.
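How much work can that one bit buy? Here is a sketch of the standard calculation (not spelled out above): treat the single molecule as an ideal gas at temperature T, initially confined to half of a box of volume V and kept at temperature T by contact with its surroundings. As the molecule pushes the piston out in an isothermal expansion from V/2 to V, the work extracted is

\[
W \;=\; \int_{V/2}^{V} \frac{k_B T}{V'}\, dV' \;=\; k_B T \ln 2 ,
\]

which is exactly one bit’s worth of energy at temperature T, the same k_B T ln 2 that Landauer’s principle says must eventually be paid back to the surroundings when the recorded bit is erased.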
Note the crucial role played by information in Szilárd’s setup. If we didn’t know which half of the box the molecule was in, we wouldn’t know where to insert the piston. If we inserted it randomly, half the time it would be pushed out and half the time it would be pulled in; on average, we wouldn’t be getting any useful work at all. The information in our possession allowed us to extract energy from what appeared to be a system at maximal entropy.
To be clear: In the final analysis, none of these thought experiments are letting us violate the Second Law. Rather, they provide ways that we could appear to violate the Second Law, if we didn’t properly account for the crucial role played by information. The information collected and processed by the Demon must somehow be accounted for in any consistent story of entropy.
The concrete relationship between entropy and information was developed in the 1940s by Claude Shannon, an engineer/mathematician working for Bell Labs.
Shannon was interested in finding efficient and reliable ways of sending signals across noisy channels. He had the idea that some messages carry more effective information than others, simply because the message is more “surprising” or unexpected. If I tell you that the Sun is going to rise in the East tomorrow morning, I’m not actually conveying much information, because you already expected that was going to happen. But if I tell you the peak temperature tomorrow is going to be exactly 25 degrees Celsius, my message contains more information, because without the message you wouldn’t have known precisely what temperature to expect.
Shannon figured out how to formalize this intuitive idea of the effective information content of a message. Imagine that we consider the set of all possible messages we could receive of a certain type. (This should remind you of the “space of states” we considered when talking about physical systems rather than messages.) For example, if we are being told the outcome of a coin flip, there are only two possible messages: “heads” or “tails.” Before we get the message, either alternative is equally likely; after we get the message, we have learned precisely one bit of information.
If, on the other hand, we are being told what the high temperature will be tomorrow, there are a large number of possible messages: say, any integer between -273 and plus infinity, representing the temperature in degrees Celsius. (Minus 273 degrees Celsius is absolute zero.) But not all of those are equally likely. If it’s summer in Los Angeles, temperatures of 27 or 28 degrees Celsius are fairly common, while temperatures of -13 or +4,324 degrees Celsius are comparatively rare. Learning that the temperature tomorrow would be one of those unlikely numbers would convey a great deal of information indeed (presumably related to some global catastrophe).
Roughly speaking, then, the information content of a message goes up as the probability of a given message taking that form goes down. But Shannon wanted to be a little bit more precise than that. In particular, he wanted it to be the case that if we receive two messages that are completely independent of each other, the total information we get is equal to the sum of the information contained in each individual message. (Recall that, when Boltzmann was inventing his entropy formula, one of the properties he wanted to reproduce was that the entropy of a combined system was the sum of the entropies of the individual systems.) After some playing around, Shannon figured out that the right thing to do was to take the logarithm of the probability of receiving a given message. His final result is this: The “self-information” contained in a message is equal to minus the logarithm of the probability that the message would take that particular form.
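A minimal numerical sketch of this definition, with all probabilities invented purely for illustration:

```python
import math

def self_information_bits(probability):
    """Shannon self-information of a message, in bits:
    minus the base-2 logarithm of the probability of that message."""
    return -math.log2(probability)

# A fair coin flip: two equally likely outcomes, so hearing the result
# conveys exactly one bit.
print(self_information_bits(0.5))        # 1.0

# Two independent fair coin flips: the probabilities multiply (0.5 * 0.5),
# and the information adds, just as Shannon required.
print(self_information_bits(0.5 * 0.5))  # 2.0

# A wildly improbable weather report carries far more information.
print(self_information_bits(1e-9))       # roughly 29.9
```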
If many of these words sound familiar, it’s not an accident. Boltzmann associated the entropy with the logarithm of the number of microstates in a certain macrostate. But given the Principle of Indifference, the number of microstates in a macrostate is clearly proportional to the probability that a microstate chosen at random from the entire space of states lies in that macrostate. A low-entropy state is like a surprising, information-filled message, while knowing that you’re in a high-entropy state doesn’t tell you much at all. When all is said and done, if we think of the “message” as a specification of which macrostate a system is in, the relationship between entropy and information is very simple: The information is the difference between the maximum possible entropy and the actual entropy of the macrostate.
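Written out, with entropy and information measured in the same units (say, in bits, which amounts to using base-2 logarithms and dropping Boltzmann’s constant), that last statement reads

\[
I(\text{macrostate}) \;=\; S_{\text{max}} - S(\text{macrostate}) ,
\]

so the maximum-entropy macrostate carries no information at all, and lower-entropy macrostates carry correspondingly more.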