The First Word: The Search for the Origins of Language
Authors: Christine Kenneally
It would appear that this skill has become fully developed in our species over the last six million years, since we split from our common ancestor with chimpanzees and bonobos. From the babbling stage on we start to repeat simple vowels and consonants, like “mamamamamama,” advancing to whole words, sentences, longer tracts, all the while using rhythms, pitch, and loudness. Still, like many of our other abilities, this one is built on a platform that stretches back a long way in evolutionary time.
Vocal learning is one of the reasons that Fitch believes the field of language evolution is worth pursuing. “Where you get any kind of open-ended learning, you have the ability to pair signals with meaning. And we didn’t have to evolve that, because our common ancestor with other primates already evolved it. What we don’t have in a chimp or any other ape is vocal learning—the ability to generate new signals. Dogs, for example, aren’t able to invent new barks.”
Some other animals are also exceptional at vocal imitation, whether it involves imitating a human or a member of their own species. Songbirds are not born with genetic programs from which their songs arise. Instead, in the same way that we are born with a predisposition to produce the sounds of language, the specifics of which we still must learn, they need to be exposed to the songs of their species in order to acquire them.
4
African gray parrots, Alex’s species, as well as other types of parrots, are well known for their excellence in imitating human words. Some animals seem to entertain themselves by imitating the sounds of inanimate objects. Mockingbirds have been heard imitating sounds like car alarms and mobile phones, and elephants in Kenya have been recorded making almost perfect reproductions of the sound of trucks from a road nearby. Whales are very good at vocal learning. Each mating season, the males come together to sing, riffing on the songs of the previous season and producing something new from them. Dolphins are as talented at vocal imitation as they are at gestural imitation. As Lori Marino explained, “They seem to be able to imitate a number of different dimensions of a behavior. They can imitate the physical dimension, but also the temporal dimension. They can imitate rhythms. For instance, you can give them a series of tones, and they’ll be able to imitate the rhythm of that series of tones. So if you give them ENH-ENH, ENH-ENH-ENH, ENH-ENH, they’ll give you ENH-ENH, ENH-ENH-ENH, ENH-ENH.”
There have been odd, one-off cases of individual animals showing exceptional imitative talents. Fitch is fascinated by the story of Hoover, a harbor seal at the New England Aquarium that was raised by a Maine fisherman. Hoover surprised visitors by saying, “Hey, hey, you, get outta there!” Hoover didn’t “talk” until he reached sexual maturity, but once he started, he improved over the years. He spoke only at certain times of the year (not as much in the mating season) and would reputedly adopt a strange position in order to do so. He didn’t move his mouth. In The Symbolic Species, Terrence Deacon recounts stumbling across Hoover while walking near the aquarium one evening. He thought a guard was yelling at him (“Hey! Hey! Get outta there!”). Deacon reports that Hoover died unexpectedly of an infection and his body was disposed of before his brain could be examined.
“We don’t know if Hoover was a mutant or if other seals can do this,” said Fitch. “It’s not hard to train a seal to bark on command. There’s a sea lion named Guthrie at the New England Aquarium. He gets rewarded when he does something different. His barks are not very special, but they are bona fide novel vocalizations.” Fitch relates Hoover’s ability to the Celtic selkie myths, which may have originated in earlier Hoover-like accounts. “It’s not uncommon for humans to take seals into their homes,” said Fitch. “Maybe we just need to expose male seals to human speech and the right social context,” and they’ll be able to learn some speech.
What makes Hoover so interesting, according to Fitch, is that all the other animals that are excellent at vocal learning, with the possible exception of bats, use a completely different process from the ancestral vertebrate mechanism for making sound. What we use for vocal production is the same thing that a frog uses—a larynx and tongue, equipment that has been around since early vertebrates dragged themselves onto land. Birds, on the other hand, have evolved a completely novel organ—the syrinx. The toothed whales, like dolphins and killer whales, have evolved a unique organ in their nose, and we still don’t really know how other whales make sound. “It’s hard to peer down the nostril of a humpback or get them in an X-ray setup while they are singing,” observes Fitch.
Early speech researchers like Philip Lieberman proposed that one of the adaptations that humans made to produce language and speech was a descended larynx. The human larynx is a complicated assemblage of four different kinds of cartilage and the small, bent hyoid bone that sits upon them. The area above the larynx is called the upper respiratory tract. Below the larynx there are two tracts: the windpipe, which leads to the lungs, and the digestive tract, leading to the stomach. When humans swallow, the larynx essentially closes, ensuring that food or liquid doesn’t fall into our lungs. The larynx also contains the vibrating vocal cords we use in speech.
In many animals, such as other apes, the larynx sits high in the throat. In fact, for most animals the larynx is positioned so high that it’s effectively in the nasal passages, meaning that these creatures can breathe and drink at the same time. Human babies, who are born with high larynxes, can do the same, but by the time they turn three, the larynx has descended and this is no longer possible. For boys, the larynx descends a bit more in adolescence, giving their voices a more baritone timbre. Somewhere in our evolutionary history—between the present and the last common ancestor we had with chimpanzees and bonobos six million years ago—our larynx dropped, making the upper and lower respiratory tracts roughly equal in size. It is these two tubes that allow humans to make such a wide range of different vowel and consonant sounds.
For a long time researchers thought that the descended human larynx was the smoking gun of speech evolution, but the picture turns out to be more complicated than that. Most previous findings about the larynx of other animals were based on the anatomy of dead specimens, but Fitch investigated the behavior of living, vocalizing animals and discovered that the larynx is a far more mobile structure than previously thought. He found that other animals that don’t have a permanently descended larynx pull it into a lower position when they vocalize. Dogs do so, as do goats, pigs, and monkeys. In addition, Fitch discovered that some animals have a permanently descended larynx, including species as diverse as the lion and the koala. What this means, said Fitch, is that you can’t assume that the reason the larynx descended in humans was for speech; you have to be able to explain the function of the descended larynx in these other animals as well.
In his Ph.D. work Fitch demonstrated a basic correlation between body size and the deepness of voice. In the animal kingdom this correlation provides extremely useful information. If you hear a competitor wooing the female you are interested in, and you can tell from his voice alone that he is much bigger than you, slinking away without direct confrontation makes the most evolutionary sense. Fitch argues that this is how we initially came by our descended larynx, meaning that one of the fundamental elements of our ability to create speech came about not because of language but as a primitive mechanism to signal an exaggerated body size.
Other critics maintain that the descended larynx is most likely an example of evolutionary adaptation in the human lineage. Steven Pinker explained:
I think it’s premature to say that there has been no evolutionary change in speech perception and speech production mechanisms. In fact, certainly for speech production mechanisms I think the argument that there’s been no adaptation or evolutionary change is very weak. It’s based on the idea of the descent of the larynx seen in some other mammals, which did not evolve it for language, but rather for bellowing in a more macho way. So yes, it’s marginally possible that the larynx descended in humans for some reason other than language, but that theory doesn’t work for humans, because we have a descended larynx in both sexes, where exaggerating body size by bellowing more loudly is not a factor.
Fitch adds that just because the descended larynx may have come about for reasons other than speech doesn’t mean it wasn’t then co-opted—or in Darwinian terms, exapted—for speech evolution. He emphasizes the possibility of gradual evolution. “The fact remains,” he writes, “that the human larynx is unusual (though not unique) among mammals.” It’s possible, he says, that early hominids had a mobile larynx, like those of dogs and pigs. But as they began to develop the extensive sound range of speech, it became more efficient to leave the larynx in the descended position instead of pulling it back to vocalize, as other animals do.
5
The notion of a graded evolutionary descent is supported by recent findings on the larynx of chimpanzee infants, which also undergoes a process of descent. This process results from a somewhat different mechanism, accomplished by the descent of the skeleton around the chimpanzee hyoid bone rather than the descent of the hyoid bone itself. Nevertheless, it suggests that descent of the larynx in humans is unlikely to have occurred in one big, speech-related transition.
6
Other features of vocal production in humans that appear to be especially attuned for language include a particular kind of muscle fiber in the vocal folds. According to Ira Sanders at the Mount Sinai School of Medicine, slow tonic muscle fibers have unique features. They don’t twitch like most muscle fibers but contract in a precise, graded fashion. Sanders examined a series of adult tongues and found that the slow tonic muscle fibers occur there in high numbers. Other mammals do not have this kind of muscle in their vocal folds.
Attempts to find fossil evidence for the key anatomical changes required for modern human speech have been mostly unsuccessful. Fitch attributes this to the fact that “the vocal tract is a mobile structure that essentially floats in the throat, suspended from the skull by elastic ligaments and muscles.” Some researchers have compared the part of the spine that affects voluntary breathing (a crucial part of speech production) in Homo sapiens, Homo ergaster, and earlier hominids. It appears that this region is significantly enlarged in modern humans as compared with earlier ancestors.
7
Regardless of their other theoretical differences, most language evolution researchers agree that human speech appears to have evolved in the last six million years to meet some of our species’ unique communication needs. The most basic and obvious evidence for this is that despite concerted efforts to teach spoken language to other primates, no attempt has been successful. At most, chimpanzees have been trained to utter a few words.
8
But the perception of speech is another matter.
The human facility for perceiving speech begins very young: small babies have been shown to prefer the sounds of speech to nonspeech sounds. It is a fascinating paradox that humans can hear only up to fifteen different nonspeech sounds per second, and beyond this they hear unremitting noise. Yet when they decode speech, they hear twenty to thirty distinct sounds per second. Somehow human speakers can pack, and in turn unpack, almost twice as many sounds if those sounds consist of consonants and vowels that are the components of the language they speak.
Humans also have a remarkable ability to calibrate the way that speakers’ voices occupy many different spots within the range of possible pitch. Children’s voices are typically the highest, women’s are in the middle of the range, and men can have very deep timbre.
9
This means that even though they are all speaking the same language, the formant frequencies of any given vowel can be quite different. Nevertheless, we understand the speakers of our language to be making the same sounds.
Some researchers believe that the movements of our throats, tongues, mouths, and faces in speech are as important as the sound of speech. They hold that at some level, speech is also gesture. Indeed, our ability to perceive the speech of others is based in part on our knowledge of the motor movements we make when we produce it. It’s been demonstrated that subjects who are shown a video of someone saying “ga” that is accompanied by a recording of the sound “ba” perceive something entirely different. They will “hear” “da,” which in terms of speech production is in between the “ga” and “ba” sounds (“ba” is made with the lips, “da” is made with the tongue touching the roof of the mouth behind the teeth, and “ga” is made with the back of the tongue hitting the roof at the back of the mouth). This phenomenon is called the McGurk effect, and it demonstrates that as far as the perception of such simple sounds goes, people can be as influenced by the motor acts they see as by the sound they hear.
One of the most important strategies that human brains use to understand speech is called categorical perception. Even though we think of the sounds in our alphabet as being distinct from one another, there is a continuum between sounds like p and b, which differ only in the timing of the vocal cords’ vibrations.
Scientists who first discovered categorical perception in the 1950s found that timing is critical in the perception of sound. For example, listeners’ perception of b-p changes at the twenty-five-millisecond mark. If they hear the b-p sound and the vocal cords begin to vibrate at 10 or 20 ms, they hear a b; if the vocal cords begin to vibrate at 25 ms or higher, even though everything else about the sound is the same, they hear a p instead. It is as if a switch is thrown at the 25 ms mark. People hear only one sound or the other, not a sound that is a little like both. In the 1970s the experiment was repeated using infants as subjects, and researchers found that children make the same categorical distinction between sounds. The finding was hailed as evidence of an innate and uniquely human language trait.
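The switch-like behavior described above can be pictured as a simple step function. What follows is only an illustrative sketch in Python, not a model taken from the experiments themselves. The 25 ms boundary comes from the text; the function name `perceive`, the constant name, and the term "voice onset time" for the delay before the vocal cords begin to vibrate are assumptions introduced here for illustration.

```python
# Illustrative sketch only: a hard threshold standing in for categorical perception.
# The 25 ms boundary is the figure given in the text; all names are hypothetical.
VOICE_ONSET_BOUNDARY_MS = 25.0


def perceive(voice_onset_ms: float) -> str:
    """Return the consonant a listener would report for a b-p sound.

    voice_onset_ms is the delay, in milliseconds, before the vocal cords
    begin to vibrate. Everything else about the sound is assumed identical.
    """
    if voice_onset_ms < 0:
        raise ValueError("this sketch only covers non-negative delays")
    # Below the boundary the sound is heard as "b"; at or above it, as "p".
    # There is no in-between category, which is the point of the finding.
    return "p" if voice_onset_ms >= VOICE_ONSET_BOUNDARY_MS else "b"


# Step along the continuum in 5 ms increments and label each sound.
continuum = list(range(0, 50, 5))
labels = [perceive(ms) for ms in continuum]

for ms, label in zip(continuum, labels):
    print(f"{ms:2d} ms -> {label}")
```

Although the input changes smoothly in 5 ms steps, the labels flip exactly once, between 20 ms and 25 ms. Real listeners are noisier near the boundary than a hard threshold suggests, so this should be read as a picture of the idea, not as a description of the data.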