The Most Human Human

Authors: Brian Christian

Computability theory, Ackley says, has the mandate “Produce correct answers, quickly if possible,” whereas life in practice is much more like “Produce timely answers, correctly if possible.” This is an important difference—and began to suggest to me another cornerstone for my strategy at the Turing test.

Uh and Um

When trying to break a model or approximation, it’s useful to know what is captured and not captured by that model. For instance, a good first start for someone trying to prove they’re playing a saxophone, and not a synthesizer made to sound like a saxophone, would be to play non-notes: breaths, key clicks, squawks. Maybe a good start for someone trying to break models of language is to use non-words: NYU philosopher of mind Ned Block, as a judge in 2005, made a point of asking questions like “What do you think of dlwkewolweo?” Any answer other than befuddlement (e.g., one bot’s “Why do you ask?”) was a dead giveaway.

Another approach would be to use words that we use all the time, but that historically haven’t been considered words at all: for example, “um” and “uh.” In his landmark 1965 book, Aspects of the Theory of Syntax, Noam Chomsky argues, “Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance.” In this view words like “uh” and “um” are errors—and, say Stanford’s Herbert Clark and UC Santa Cruz’s Jean Fox Tree, “they therefore lie outside language proper.”

Clark and Fox Tree, however, disagree. Most languages have two distinct terms, just as English does: If they are simply errors, why would there be two, and why in every language? Furthermore, the usage pattern of “uh” and “um” shows that speakers use “uh” before a pause of less than a second, and “um” before a longer pause. This information suggests two things: (1) that the words are far from interchangeable and in fact play distinct roles, and (2) that because these words are made before the pauses, speakers must be anticipating in advance how long the following pause will be. This is much more significant than mere “error” behavior, and leads Clark and Fox Tree to the conclusion “that uh and um are, indeed, English words. By words, we mean linguistic units that have conventional phonological shapes and meanings and are governed by the rules of syntax and prosody … Uh and um must be planned for, formulated, and produced as parts of utterances just as any other word is.”

In a purely grammatical view of language, the words “uh” and “um” are meaningless. Their dictionary entries would be blank. But note that the idealized form of language which Chomsky makes his object of study explicitly ignores “such grammatically irrelevant conditions as memory limitations … [and] actual performance.” In other words, Chomsky’s theory of language is the computability theory of Turing’s era, not the complexity theory that followed. Very similarly idealized, as it happens, are chatbots’ models of language. Yet it turns out—just as it did in computer science—that there’s a tremendous amount happening in the gap between the “ideal” process and the “actual performance.”

As a human confederate, I planned to make as much of this gap as possible.

Satisficing and Staircase Wit

Economics, historically, has also tended to function a bit like computability theory, where “rational agents” somehow gather and synthesize infinite amounts of information in the smallest of jiffies, then immediately decide and act. Such theories say this and that about “costs” without really considering: consideration itself is a cost! You can’t trade stocks except in real time: the longer you spend analyzing the market, the more the market has meanwhile changed. The same is true of clothes shopping: the season is gradually changing, and so is fashion, literally while you shop. (Most bad fashion is simply good fashion at the wrong time.)

The Nobel laureate, Turing Award winner, and academic polymath—economics, psychology, political science, artificial intelligence—Herbert Simon coined the word “satisficing” (satisfying + sufficing) as an alternative to objective optimization/maximization.

By the lights of computability theory, I’d be as good a guitar player as any, because you give me any score and I can hunt around for the notes one by one and play them …

English composer Brian Ferneyhough writes scores so outrageously complicated and difficult that they are simply unperformable as written. This is entirely the point. Ferneyhough believes that virtuosic performers frequently end up enslaved by the scores they perform, mere extensions of the composer’s intention. But because a perfect performance of his scores is impossible, the performer must satisfice, that is, cut corners, set priorities, reduce, simplify, get the gist, let certain things go and emphasize others. The performer can’t avoid interpreting the score their own way, becoming personally involved; Ferneyhough’s work asks, he says, not for “virtuosity but a sort of honesty, authenticity, the exhibition of his or her own limitations.” The New York Times calls it “music so demanding that it sets you free”—in a way that a less demanding piece wouldn’t. Another part of what this means is that all performances are site-specific; they never become fungible or commoditized. As musicologist Tim Rutherford-Johnson puts it, Ferneyhough “draws so much more into the performance of a work than simple reproduction of a composer’s instructions; it’s hard to imagine future re-re-re-recordings of the same old lazy interpretations of Ferneyhough works, a fate that too much great music is burdened with today.”

For Bernard Reginster, authenticity resides in spontaneity. Crucially, this would seem to have a component of timing: you can’t be spontaneous except in a way that keeps up with the situation, and you can’t be sensitive to the situation if it’s changing while you’re busy making sense of it.

Robert Medeksza, whose program Ultra Hal won the Loebner Prize in 2007, mentioned that the conversational database he brought to the competition for Ultra Hal was smaller than ’07 runner-up Cleverbot’s by a factor of 150. The smaller database limited Ultra Hal’s range of responses, but improved the speed of those responses. In Medeksza’s view, speed proved the decisive factor. “[Cleverbot’s larger database] actually seemed to be a disadvantage,” he told an interviewer after the event. “It sometimes took [Cleverbot] a bit long to answer a judge as the computer [couldn’t] handle that amount of data smoothly.”

I think of the great French idiom l’esprit de l’escalier, “staircase wit,” the devastating verbal comeback that occurs to you as you’re walking down the stairs out of the party. Finding the mot juste a minute too late is almost like not finding it at all. You can’t go “in search of” the mot juste or the bon mot. They ripen and rot in an instant. That’s the beauty of wit.

It’s also the beauty of life. Computability theory is staircase wit. Complexity theory—satisficing, the timely answer, as correct as possible—is dialogue.

“Barge-In-Able Conversation Systems”

The 2009 Loebner Prize competition in Brighton was only a small part of a much larger event happening in the Brighton Centre that week, the annual Interspeech conference for both academic and industry speech technology researchers, and so ducking out of the Loebner Prize hall during a break, I immediately found myself in the swell and crush of some several thousand engineers and programmers and theorists from all over the globe, rushing to and from various poster exhibitions and talks—everything from creepy rubber mock-ups of the human vocal tract, emitting zombie versions of human vowel sounds, to cutting-edge work in natural language AI, to practical implementation details concerning how a company might make its automated phone menu system suck less.

One thing you notice quite quickly at events like this is how thick a patois grows around every field and discipline. It’s not easily penetrated in a few days’ mingling and note taking, even when the underlying subject matter makes sense. Fortunately, I had a guide and interpreter, in the form of my fellow confederate Olga. We wandered through the poster exhibition hall, where the subtlest of things about natural human conversation were named, scrutinized, and hypothesized about. I saw a poster that intrigued me, about the difficulty of programming “Barge-In-Able Conversational Dialogue Systems”—which humans, the researcher patiently explained to me, are. “Barge-in” refers to the act of leaping in to talk while the other person is still talking. Apparently most spoken dialogue systems, like most chatbots, have a hard time dealing with this.

Notation and Experience

Just as Ferneyhough is interested in the differences “between the notated score and the listening experience,” so was I in the differences between idealized theories of language and the ground truth of language in practice, the differences between logs of conversations and conversation itself.

One of my friends, a playwright, once told me, “You can always identify the work of amateurs, because their characters speak in complete sentences. No one speaks that way in real life.” It’s true: not until you’ve had the experience of transcribing a conversation is it clear how true this is.

But sentence fragments themselves are only the tip of the iceberg. A big part of the reason we speak in fragments has to do with the turn-taking structure of conversation. Morse code operators transmit “stop” to yield the floor; on walkie-talkies it’s “over.” In the Turing test, it’s traditionally been the carriage return, or enter key. Most scripts read this way: an inaccurate representation of turn-taking is, in fact, one of the most pervasive ways in which dialogue in art fails to mirror dialogue in life. But what happens when you remove those markers? You make room both for silences and for interrupts, as in the following, an excerpt of the famously choppy dialogue in David Mamet’s Pulitzer-winning Glengarry Glen Ross:

LEVENE: You want to throw that away, John …? You want to throw that away?

WILLIAMSON: It isn’t me …

LEVENE: … it isn’t you …? Who is it? Who is this I’m talking to? I need the leads …

WILLIAMSON: … after the thirtieth …

LEVENE: Bullshit the thirtieth, I don’t get on the board the thirtieth, they’re going to can my ass.

In spontaneous dialogue it’s natural and frequent for the participants to overlap each other slightly; unfortunately, this element of dialogue is extremely difficult to transcribe. In fiction, playwriting, and screenwriting, the em dash or ellipsis can signify that a line of dialogue got sharply cut off, but in real life these severances are rarely so abrupt or clean. For this reason I think even Mamet’s dialogue only gets turn-taking half right. We see characters stepping on each other’s toes and cutting in, but as soon as they do, the other character stops on a dime. We don’t see the fluidity and negotiation often present in those moments. The cuts are too sharp.

We squabble or tussle over the floor, fade in and out, offer “yeah’s” and “mm-hmm’s” to show we’re engaged, add parentheticals to each other’s sentences without trying to stop those sentences’ flow, try to talk over an interruption only to yield a second later, and on and on, a huge spectrum of variations. There are other notations that some playwrights and screenwriters use, involving slashes to indicate where the next line starts, but these are cumbersome to write, and to read, and even they fail to capture the range of variation present in life.

I recall going to see a jazz band when I was in college—it was on the large side, for a jazz band, with a horn section close to a dozen strong. The players were clearly proficient, and played tightly together, but their soloing—it was odd—was just a kind of rigid turn-taking, not unlike the way people queue in front of a microphone to ask a question to a lecturer at the end of a lecture: the soloist on deck waited patiently and expectantly for the current soloist’s allotted number of bars to expire, and would then play for the same number of bars him- or herself.

There’s no doubt that playing this way avoids chaos, but there is also no doubt that it limits the music.

It may be that enforced turn-taking is at the heart of how a language barrier affects intimacy, more so than the language gap itself. As NBC anchor and veteran interviewer John Chancellor explains in Interviewing America’s Top Interviewers:

Simultaneous translation is good because you can follow the facial expressions of the person who’s talking to you, whereas you can’t in consecutive translation. Most reporters get consecutive translation, however, when they’re interviewing in a foreign language, because they can’t really afford to have simultaneous translation. But it’s very difficult to get to the root of things without the simultaneous translation.

So much of live conversation differs from, say, emailing, not because the turns are shorter, but because there sometimes are not definable “turns” at all. So much of conversation is about the extremely delicate skill of knowing when to interrupt someone else’s turn and when to “pass” on your own turn, when to yield to an interruption and when to persist.

I’m not entirely sure we humans have this skill down. If you’re like me, it’s impossible to watch much of the broadcast news in America: the screen split into four panels, where four different talking heads are shouting over each other from one commercial break to the next. Perhaps part of the reason computer software appears to know how to converse is that we sometimes appear not to.
