As this example shows, because even minimally complex minds simultaneously track multiple aspects of the environment while at the same time controlling multiple means of acting back, their innards must be distributed. This means that minds are composed of at least several, and possibly very many, interacting but distinct functional parts. (Unlike a physical part, such as the wheel of a car, a functional part is a role that can be played by different physical parts. For example, many kinds of vehicles have the functional part “support,” which in a car is played by wheels, in a tank by treads, and in a sled by runners.) The functional parts, their relationships (which may be hierarchical, as when some parts are composed of several others), and the interconnections and interactions among themselves and with the outside world together determine the kind of mind that arises from all this bustle.
Because what matters about a mind is its functional organization (and not, as we learned at the end of Chapter 2, the stuff it is made of), sharpening our thinking about functional analogies can really help us understand how minds work. It would be particularly helpful to come up with some down-to-earth example of distributed organization, seeing how essential it is to the architecture of minds. We know that on the perception (input) side there are many sensors that send their signals in all at the same time. (Each light-sensitive cell in your retina is one such sensor.) We also know that on the action (output) side the situation is similar: each of your muscles consists of many bundles of individually innervated fibers, and each joint is served by many muscles. There is no reason to assume that in between perception and action—in processing, or thinking—things are any different (more about this in later chapters).
Is there a good functional analogy, along the lines of the radio show example, that covers all these bases? Yes, there is: a parliamentary democracy.
The roots of this analogy go back to the ideas of the great Scottish Enlightenment philosopher David Hume. In a book titled A Treatise of Human Nature, published in 1739–40, Hume wrote:
I cannot compare the soul more properly to any thing than to a republic or common-wealth, in which the several members are united by the reciprocal ties of government and subordination. . . . And as the same individual republic may not only change its members, but also its laws and constitutions; in like manner the same person may vary his character and disposition, as well as his impressions and ideas, without losing his identity. Whatever changes he endures, his several parts are still connected by the relation of causation.
10
Hume’s analogy, with which he clarified his revolutionary (and, as we shall see, prescient) stance on personal identity and the nature of the self, is directly relevant to the theme of this book, if only because it implies that a person’s pursuit of happiness is best viewed as a kind of mass marathon of the mind’s multiple constituents, not a solitary trek of an indivisible ego. In reality, the republic analogy applies all across mind science. To see why this must be so, we need only recall that minds are made of computation and that what Hume called “the relation of causation”—the causal organization of a system, whether political or cognitive—is what defines both computation as such and the use of computation by minds through the mechanism of representation.
Because the architecture of each of the mind’s top-level functional “parts”—perception, thinking, action, and motivation—is distributed, the republic analogy applies not just to the mind in general but also within each of those domains, at level after level. (A computer scientist would say that it applies recursively.) A parliamentary democracy founded on the principle of separation of powers governs itself by balancing legislative, executive, and judiciary activities, which makes it functionally distributed at the top level. Each of the branches is, in turn, distributed. Interestingly and importantly, this includes the executive power, which in present-day Britain, for example, is vested in the Cabinet of Her Majesty’s Government (over which the titular monarch, thankfully, has absolutely no control). Hume’s point is that even under a regime in which certain executive decisions are made by a singular legal entity (as in the still largely autocratic Britain of Hume’s own time or in the present-day United States), they are in fact made by a plural cognitive entity—the republic of soul.
Perception by Numbers
If a person’s mind (of which his or her soul is, as we shall see later, a proper part) is like a democratically governed commonwealth, then perception is the array of information sources, from mass communication media to targeted intelligence-gathering, that the cabinet members use in formulating the foreign policy. Because for the cabinet perceptual input is by definition the sole origin of information about the outside world, we should really think about it as meeting in an underground bunker. This important detail adds an interesting twist to the idea of the mind as a democracy. As it turns out, a narrower analogy works even better: the mind is really like a wartime democracy (think World War II Britain).
11
This line of reasoning shows just how important perception is for the functioning of a human mind. The availability of reliable information about the outside world does not guarantee sane “foreign policy” or effective conduct of the metaphorical war for survival on the part of a mind. (Human societies too, republics or not, are inordinately prone to suffer themselves to be governed by those whom the prophet Jeremiah described as “foolish people, and without understanding; which have eyes, and see not; which have ears, and hear not.”
12
) However, the absence of perceptual information definitely complicates flight from peril and effectively dooms any attempted pursuit of a mate, let alone of abstract happiness. More than that, seeing that the contents of a human mind do not get downloaded into it fully formed, we realize that the cabinet members in our analogy must have been inside their bunker all along. This insight suggests that perception is indispensable not only for guiding immediate “here and now” behavior but also for driving and sustaining development: the protracted process that transforms a bunker-bound crèche into a war cabinet.
The array of primary information sources in human perception is literally an array—of numbers that stream into the brain from an assortment of measurement devices. The sense of sight begins with hundreds of millions of measurements of electromagnetic energy, carried out by the photoreceptors that absorb the energy of light focused by the eye’s optics onto the retina; each eye ends up sending on to the brain about one million fibers that carry an already heavily processed array of visual information. Hearing originates with tens of thousands of inner-ear hair cells, which transduce the mechanical energy of sound into neural firing. The sense of touch is mechanical too, with receptors scattered throughout the skin and the mouth. Then there are two chemical senses, taste and smell, whose receptors measure the concentrations of thousands of types of molecules of interest in their vicinity. Finally, there is the sixth sense, interoception—a motley collection of internal mechanical, chemical, and thermal gauges that report the body’s vital signs to the central nervous system.
To recognize the vastness of the computational problem faced by any mind that is bent on seeing, whether it looks at the world through Romeo’s eye or a robot’s megapixel-resolution camera, consider this: it must deal with a torrent of data that delivers several times per second a new 1,000 × 1,000 table of numbers to be made sense of. That’s all there is: a constantly changing array of numbers in which many things, some potentially interesting or dangerous, are lurking, among them Juliet on her bedroom balcony, the flowerpot next to her on the parapet, Benvolio’s face, and Tybalt’s sword.
To salvage from the data deluge some useful information, the mind’s only recourse is to try to relate some of those numbers to others.
13
For starters, it would be nice to be able to do something to ensure that the image that falls on the sensor is sharp. Whereas even a blurred image of Juliet would not look anything like a flowerpot (or so one imagines), the possibility of a momentarily myopic Romeo mistakenly skewering Benvolio instead of Tybalt seems quite plausible.
14
How should numbers arranged in a table be compared to one another so as to reveal whether or not the image they form is focused? It’s actually quite easy (which is why every modern camera has built-in autofocus). The idea is to scan the table while comparing the values of adjacent entries. Scanning a blurry image generally yields a series of numbers that change relatively slowly (and under an extreme blur, as in dense fog, not at all). In contrast, in a focused image the transitions between adjacent numbers are every now and then quite sharp. (For instance, if you sit across from me, this happens in each place where it appears to you that my face ends and the wall behind me begins.)
What the control mechanism (brain or camera) needs to do, then, is to keep changing the lens focus little by little while computing those local-neighborhood sharpness estimates, until their outcome is satisfactory. For this simple trick to work, the neighboring numbers must correspond to—represent—neighboring directions in the visual world, which they indeed do. Physical law ensures that the lens of an eye preserves visual neighborhood structure in the image that it projects onto the retina, and evolutionary pressure has already seen to it that the retina preserves this structure while converting the image into an array of numbers.
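The focusing procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any real camera or brain does it: the names `sharpness`, `box_blur`, `autofocus`, and `capture` are hypothetical, the mean squared difference between adjacent entries is one of many possible sharpness estimates, a box blur stands in for optical defocus, and the lens is swept through a fixed list of settings rather than nudged little by little until the result is satisfactory.

```python
def sharpness(image):
    """Mean squared difference between adjacent entries of a 2-D table."""
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # compare with the right-hand neighbor
                total += (image[r][c + 1] - image[r][c]) ** 2
                count += 1
            if r + 1 < rows:  # compare with the neighbor below
                total += (image[r + 1][c] - image[r][c]) ** 2
                count += 1
    return total / count if count else 0.0


def box_blur(image, radius):
    """Toy stand-in for defocus: average each entry over a square window."""
    rows, cols = len(image), len(image[0])
    blurred = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            new_row.append(sum(window) / len(window))
        blurred.append(new_row)
    return blurred


def autofocus(capture, settings):
    """Try every lens setting and keep the one whose image scores sharpest."""
    return max(settings, key=lambda s: sharpness(capture(s)))


# Hypothetical scene: a dark face (0) against a bright wall (100).
SCENE = [[0] * 4 + [100] * 4 for _ in range(8)]
IN_FOCUS = 3  # assumed lens setting at which the scene is sharp


def capture(setting):
    """Simulated sensor: the further from IN_FOCUS, the stronger the blur."""
    return box_blur(SCENE, abs(setting - IN_FOCUS))


best_setting = autofocus(capture, range(7))
```

Squared rather than plain absolute differences are used here on purpose: blurring a single edge spreads the same total change over more, smaller steps, so a sum of absolute differences would not tell the blurred edge from the sharp one, whereas squaring rewards the occasional abrupt transition.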
Being able to focus on a scene is a far cry from being able to interpret it: my digital SLR camera focuses like a fiend, but understands nothing. I love my old SLR and will not trade it for a smarter model, but for someone who values smarts over sheer versatility and obedience, the temptation is growing apace with technology. For example, younger-generation cameras these days are getting pretty good at telling faces apart from other objects, which allows them to focus automatically on people if their master cannot be bothered with focusing by hand. Still, it will be some time before cameras get to be as good as people are at recognizing people. How do we do it?
As the no. 3 conceptual zoom tool from a few pages back kicks into action, we discover that, as always in cognition, the big “how?” question spawns a whole spate of smaller ones, some still very general, others quite specific. How is it possible to tell whether or not the object that is being looked at is a face? How can one determine whether or not a given face is familiar? And if the face is familiar, how can its identity be established? A complete answer to any of these questions would consist of two parts: one that describes the computation (“Take the array of numbers that represent the scene and carry out the following computations [detailed step-by-step instructions omitted]”), and another that describes how the computation is carried out by neurons (“The axons that form the optic nerve connect to neurons in the lateral geniculate nucleus of the thalamus according to the following pattern [detailed wiring diagram and neural activity reports omitted]”).
Although not all the details that I boldly glossed over just now are known, cognitive science has a pretty good “big picture” of how face recognition works.
15
It is easiest to start with the last and most specific of the “how?” questions posed above, the one concerning face identity. By definition, a visual system can only recognize (“re-cognize”) a face as belonging to a particular person if that person has been seen at least once before. It would seem that recognition, then, is a simple matter of memory storage: just save a “snapshot” (array of numbers) for each face you see and later compare the representation of a face that needs to be recognized to each of the stored snapshots.
The main problem with this idea is that the same face can look very different—that is, it can present a very different array of numbers—depending on how it is illuminated and oriented with respect to the viewer. This is why at a campfire event you can reliably scare a five-year-old by illuminating your face with a flashlight from below. (Warning: failure to revert the illumination direction to normal promptly enough may result in everybody’s evening being ruined.) This is also why sheep, who can be quite good at recognizing each other by face, are baffled by upside-down pictures of their acquaintances but monkeys are not. (One expects that a circus troupe of sheep trained to perform on the trapeze would be more tolerant of face inversion.)
16
Even though faces (or any other visual objects) cannot be reliably recognized through exhaustive number-by-number comparison to stored snapshots, a mathematical analysis of the recognition problem suggests a modification to the store-and-compare procedure that makes it work.
17
The idea is really very simple: make the comparison approximate. A slavishly literal comparison between two arrays of numbers is all-or-none: a difference in even just one place brands the snapshots as “different.” In contrast, if the comparison procedure is made to estimate the degree of difference between the snapshots, its outcome is a graded quantity that reveals how (dis)similar they are to each other, instead of merely stating that they are not the same.
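The contrast between all-or-none and graded comparison is easy to see in a small Python sketch. The snapshots below are made-up lists of four numbers, and the Euclidean distance is merely one convenient assumption about how the degree of difference might be estimated; the function names `exact_match` and `difference` are likewise hypothetical.

```python
import math


def exact_match(a, b):
    """All-or-none comparison: True only if every number coincides."""
    return a == b


def difference(a, b):
    """Graded comparison: Euclidean distance between two equal-length snapshots."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


stored = [10, 20, 30, 40]       # snapshot saved at the first encounter
seen_again = [10, 20, 30, 41]   # same face, one number slightly off
stranger = [90, 5, 70, 0]       # a different face altogether

# The literal test rejects both candidates alike...
verdicts = (exact_match(stored, seen_again), exact_match(stored, stranger))

# ...whereas the graded one shows that the first is a near miss.
near = difference(stored, seen_again)
far = difference(stored, stranger)
```

Here `verdicts` is `(False, False)`, which says nothing about which candidate is the better bet, while `near` is far smaller than `far`.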
To appreciate the informativeness of representations that are based on graded similarity, imagine having witnessed a crime and being asked to identify the suspect in a police mug-shot album. Under a strict match regime, you’d probably have to rule out all the candidate mugs because none of them would coincide exactly with the face you remember. Not so under graded comparison: by pointing to several mugs that resemble to various degrees the person you saw, you could narrow down the range of possibilities, thereby discharging your civic duty and making your city’s streets safer.
18
Even better, graded similarity–based representations are not only informative but also frugal in their demands for memory. Once you have stored a sparse “starter” set of faces in a relatively detailed snapshot form, each new face can be represented by just a handful of numbers that stand for its similarities to the stored snapshots.
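A minimal sketch of this economy, again in Python and again resting on assumptions of my own choosing: the starter snapshots are invented, `similarity` and `encode` are hypothetical names, and the formula that turns a distance into a similarity score is just one of many that would do.

```python
import math


def similarity(a, b):
    """Graded similarity: 1.0 for identical snapshots, approaching 0 as they diverge."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def encode(face, starters):
    """Represent a face by its similarities to the stored starter snapshots."""
    return [similarity(face, starter) for starter in starters]


# Three detailed starter snapshots (toy length of 6; a real one would be far longer).
starters = [
    [10, 20, 30, 40, 50, 60],
    [60, 50, 40, 30, 20, 10],
    [35, 35, 35, 35, 35, 35],
]

new_face = [12, 22, 29, 41, 48, 61]
code = encode(new_face, starters)  # three numbers, however long the snapshot
```

The length of `code` depends only on the number of starter faces, not on the size of the snapshot, so with a million-number snapshot and a few dozen starters the saving in memory would be considerable.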