17.2 From rule creation to triggering

The details of the Chapter 1 blueprint for an acquisition model didn't last very long, because they were based on a notion of grammars as sets of rules and of acquisition as composing rules, and that never worked. There weren't enough constraints on the possible grammars, and there was no plausible EM for fitting grammars to the input. The next step, also from Noam (Chomsky 1981), was to shift from rule-based grammars to grammars composed of principles and parameters, which is what you have been hearing about at this conference. Languages differ in their lexicons of course, but otherwise it is claimed that they differ only in a small, finite number of parameters. (I will limit discussion to syntax here, disregarding parameters for phonology and morphology.) An example is the Null Subject parameter, which in languages like Spanish has the value [+null subject] because Spanish permits phonologically null subjects, whereas in languages like English the setting is [-null subject] because subjects (of finite clauses) cannot be dropped. This is one binary syntactic parameter that a child must set.
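
To make the finite hypothesis space concrete: in computational models of parameter setting, a grammar (setting the lexicon aside) is often represented simply as a vector of binary parameter values, so with n parameters there are only 2^n candidate grammars. The sketch below is merely illustrative, and the parameter names in it are invented.

    # A toy principles-and-parameters grammar space: a grammar (minus the
    # lexicon) is a vector of binary parameter values, so with n parameters
    # there are only 2**n possible grammars.
    from itertools import product

    PARAMETERS = ("null_subject", "wh_movement", "verb_raising")  # invented names

    def all_grammars(parameters=PARAMETERS):
        """Enumerate every possible grammar as a dict mapping parameter -> 0/1."""
        for values in product((0, 1), repeat=len(parameters)):
            yield dict(zip(parameters, values))

    # A Spanish-like grammar has null_subject = 1; an English-like one has 0.
    print(sum(1 for _ in all_grammars()))  # 8 grammars for 3 binary parameters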

The parametric model has properties that lighten the task of modeling language acquisition. Because it admits only a finite number of possible languages, the learning problem becomes formally trivial (Chomsky 1981: Chapter 1). From a psychological perspective, input sentences can be seen not as a database for hypothesis creation and testing, but as triggers for setting parameters in a more or less “mechanical” fashion. As Noam discussed earlier in this conference (see page 23 above), syntax acquisition then becomes simply a matter of tripping switches, a persuasive metaphor that he credits to Jim Higginbotham. A sentence comes into the child's ears; inside the child's head there is a bank of syntax switches; the sentence pushes the relevant switches over into the right on or off positions. Note that it is assumed that the triggers know which parameters to target. This will be important for the discussion that follows: the trigger sentences “tell” the learner which parameters to reset.¹ Finally, the principles and parameters model is a memoryless system, so it is economical of resources and it is plausible that a child could be capable of it. The child has to know only what the current parameter settings are, and what the current sentence is; she doesn't have to remember every sentence she's ever heard and construct a grammar that generates them all.
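
On this original conception, a switch-setting learner could be rendered schematically as follows. The crucial assumption, which the next paragraphs call into question, is the trigger table that tells the learner exactly which parameter each sentence sets and to which value; the sentence patterns and parameter names here are invented for illustration.

    # A toy "triggering" learner: memoryless switch-setting.  The learner keeps
    # only its current parameter settings; each recognized trigger is assumed
    # to identify the parameter it sets and the value to set it to.
    TRIGGERS = {
        "subjectless finite clause": ("null_subject", 1),   # invented trigger table
        "fronted wh-phrase": ("wh_movement", 1),
    }

    def process(sentence_pattern, grammar):
        """Flip the relevant switch if the input matches a known trigger."""
        if sentence_pattern in TRIGGERS:
            parameter, value = TRIGGERS[sentence_pattern]
            grammar[parameter] = value          # deterministic, one switch per trigger
        return grammar                          # nothing else is remembered

    grammar = {"null_subject": 0, "wh_movement": 0}          # default settings
    for s in ("fronted wh-phrase", "subjectless finite clause"):
        grammar = process(s, grammar)
    print(grammar)   # {'null_subject': 1, 'wh_movement': 1}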

So the parameter model was gratefully received, a cause for celebration. But then the bad news began to come in. Robin Clark (1989) published some very important work in which he pointed out that many triggers in natural language are ambiguous between different parameter settings. One example of this is a sentence that has a non-finite complement clause with an overt subject, such as “Pat expects Sue to win.” The noun phrase Sue has to have case, and it gets case either from the verb above it (expect) or the verb below it (win). The former is correct for English (expect assigns case across the subordinate clause boundary), but the latter is correct for Irish, where the infinitive verb can assign case to its subject. Thus, there is a parameter that has to be set, but this sentence won't set it. The sentence is ambiguous between the two values of the parameter. There are many other such instances of parametric ambiguity in natural language.
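
Schematically, the difficulty is that an ambiguous trigger is compatible with more than one value of the parameter, so by itself it cannot tell a switch-setting device what to do. Here is a minimal rendering of Clark's example, with the two case-assignment analyses reduced to labels.

    # The case-assignment parameter cannot be set by "Pat expects Sue to win",
    # because both of its values license the sentence.  The value labels are
    # shorthand for the two analyses described above.
    ANALYSES = {
        "Pat expects Sue to win": {
            "case across the clause boundary (English-style)",
            "case from the infinitive (Irish-style)",
        },
    }

    def compatible_settings(sentence):
        """Return every value of the parameter under which the sentence is licensed."""
        return ANALYSES.get(sentence, set())

    if len(compatible_settings("Pat expects Sue to win")) > 1:
        print("ambiguous trigger: the sentence does not determine the parameter value")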

Parameter theory had started with the over-optimistic picture that for every parameter there would be at least one unambiguous trigger, it would be innately specified, and learners would effortlessly recognize it; when that trigger was heard, it would set the parameter once and for all, correctly. What Clark's work made clear was that in many cases there would be no such unambiguous trigger; or if there were, a learner might not be able to recognize it because it would interact with everything else in the grammar and would be difficult to disentangle. This put paid to the notion that learners were just equipped with an innate list specifying that such-and-so sentences are triggers for setting this parameter, and thus-and-such sentences are triggers for this other parameter. Gibson and Wexler's (1994) analysis of parameter setting underscored the conclusion that triggers typically cannot be defined either universally or unambiguously.

You should bear in mind always that the null subject parameter is not the typical case. It is too easy. With the null subject parameter, you either hear a sentence with no subject and conclude that the setting is [+null subject], or you never do, so you stay with the default setting [-null subject]. There are important details here that have been much studied,² but even so, setting this parameter is too easy because its effects are clearly visible (audible!) in surface sentences. For other parameters, such as those that determine word order, there are more opportunities for complex interactions. One parameter controls movement of a phrase to a certain position; other parameters control movement of other phrases to other positions. The child perceives the end product of derivations in which multiple movements have occurred, some counteracting the effects of others, some moving parts of phrases that were moved as a whole by others, and so on. This interaction problem exacerbates the ambiguity problem. It means that even for parameters that do have unambiguous triggers, those triggers might be unrecognizable because the relation between surface sentences and the parameter values that license them is not transparent.
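
The interaction problem can be pictured in miniature: when different bundles of movement parameters generate overlapping sets of surface word orders, a string in the overlap reveals nothing about which derivation produced it. The grammars and orders below are invented solely to display such an overlap.

    # Two toy grammars (bundles of invented movement parameters) and the surface
    # word orders each generates.  Orders that both grammars can derive are
    # uninformative: hearing one of them does not identify the grammar.
    GRAMMARS = {
        "verb raising, no fronting": {"S V O", "V S O"},
        "no verb raising, topicalization": {"S V O", "O S V"},
    }

    ambiguous = GRAMMARS["verb raising, no fronting"] & GRAMMARS["no verb raising, topicalization"]
    print(ambiguous)   # {'S V O'} -- this order is derivable either way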

To sum up: observations by Clark and others, concerning the ambiguity and surface-opacity of parametric triggers, called for a revision of the spare and elegant switch-setting metaphor. On hearing a sentence, it is often not possible, in reality, for a learner to identify a unique grammar that licenses it. At best, there is a pool of possible candidates. So either the “mechanical” switch-setting device contains overrides, such that one candidate automatically takes precedence over the others; or else the switches aren't set until after the alternatives have been compared and a choice has been made between them. In either case, this amounts to an evaluation metric within a parameter-setting model. A second important consequence is that triggering cannot be error-free. When there is ambiguity in the input, the learner cannot be expected always to guess the right answer. Thus, the original concept of triggering, though it was an extremely welcome advance in modeling grammar acquisition, proved to be too clean and neat to fit the facts of human language, and it did not free us from having to investigate how the learning mechanism evaluates competing grammar hypotheses. A problem that will loom large below is that evaluation apparently needs access to all the competitors, in order to compare them with respect to whatever the evaluative criteria are (e.g., simplicity; conservatism versus novelty; etc.), but it is unclear how a triggering process could provide the comparison class of grammars.

17.3 From triggering to decoding

All of this explains why, if you check the recent literature for models of parameter setting, you will find almost nothing that corresponds to the original Chomsky–Higginbotham conception of triggering. There are still parameters to be set in current models, but neither the mechanism nor the output of triggering has been retained. Instead of an “automatic” deterministic switching mechanism, which has never been computationally implemented, it is assumed that the learner first chooses a grammar and then tests it to see whether it can license (parse) the current input sentence; if not, the learner moves on to a different grammar. This is a very weak system, and it limits the ways in which the learner can select its next grammar hypothesis. A triggering learner, when it meets a sentence not licensed by the current grammar, shifts to a grammar that is similar to the current one except that it licenses the new sentence. That seems ideal, but current models do otherwise. For Gibson and Wexler's (1994) system the principle is:

(1) If the current grammar fails on an input sentence, try out a grammar that differs from it by any one parameter, and shift to it only if it succeeds.

For Yang's (2002) model it is:

(2) If the current grammar fails on an input sentence, try out a grammar selected with probability based on how well each of its component parameter values has performed in the past.

Notice that in neither case does the input sentence guide the choice of the next grammar hypothesis. These are trial-and-error systems, quite unlike triggering not only in their mechanics but also with respect to the grammar hypotheses they predict the learner will consider.
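
For concreteness, here is a schematic rendering of the two selection rules; it is not the authors' actual implementations, and a licenses function stands in for attempting to parse the sentence with a given grammar.

    import random

    def gw_step(grammar, sentence, licenses):
        """(1) Gibson & Wexler-style step: if the current grammar fails on the
        sentence, flip one randomly chosen parameter and keep the change only
        if the resulting grammar licenses the sentence."""
        if licenses(grammar, sentence):
            return grammar
        candidate = dict(grammar)
        p = random.choice(list(candidate))
        candidate[p] = 1 - candidate[p]
        return candidate if licenses(candidate, sentence) else grammar

    def yang_step(weights, sentence, licenses, rate=0.1):
        """(2) Yang-style step: sample a grammar parameter by parameter from
        the current weights, then nudge each weight toward the sampled value
        if the grammar succeeds on the sentence, and away from it if not."""
        grammar = {p: int(random.random() < w) for p, w in weights.items()}
        success = licenses(grammar, sentence)
        for p, value in grammar.items():
            target = value if success else 1 - value
            weights[p] += rate * (target - weights[p])
        return weights

    # In neither rule does the sentence guide which grammar is tried next;
    # the input is consulted only to accept or reject the current guess.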

By contrast, at CUNY we have tried to retain as much of the triggering concept as is possible. Although the “automatic” aspect has to be toned down, we can preserve another central aspect, which is that the input sentence should tell the learner which parameters could be reset to license it. In a sentence like What songs can Pat sing?, the wh-phrase what songs is at the front. How did it get there? In English, it got there by Wh-Movement, but other languages (Japanese, for example) can scramble phrases to the front, including wh-phrases. So as a trigger, this sentence is ambiguous between different parameter settings. Nothing can tell the learner which alternative is correct, but ideally the learner would at least know what the options are. We call this parametric decoding. The learning mechanism observes the input sentence and determines which combinations of parameter values could license it. Then it can choose from among these candidates, and not waste time and effort trying out other grammars that couldn't be right because they're incompatible with this sentence. Parametric decoding thus plays the extremely important role of guiding the learner towards profitable hypotheses. The only problem is that nobody knows how decoding can be done within the computational resources typical of an adult human, let alone a 2-year-old.
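
What full decoding would deliver can be sketched by brute force: test every grammar in the finite space against the sentence and keep the ones that license it. The sketch below (with licenses again standing in for a parse attempt) also makes the resource problem plain, since it amounts to parsing the sentence under all 2^n grammars.

    from itertools import product

    def decode(sentence, parameters, licenses):
        """Full decoding by brute force: return every combination of parameter
        values that licenses the sentence."""
        candidates = []
        for values in product((0, 1), repeat=len(parameters)):
            grammar = dict(zip(parameters, values))
            if licenses(grammar, sentence):       # stands in for a full parse attempt
                candidates.append(grammar)
        return candidates                         # the learner need only choose among these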

Our own learning model, called the Structural Triggers Learner, can do partial decoding. It uses the sentence-parsing routines for this. We suppose that a child tries to parse the sentences he hears, in order to understand them. For a sentence (a word string) that the current grammar does not license, the parsing attempt will break down at some point in the word string. At that point the parsing routines search for ways to patch up the break in the parse tree, and in doing so they can draw on any of the other parameter values which UG makes available but which weren't in the grammar that just failed. The parser/learner uses whichever one or more of these other parameter values are needed to patch the parse. It then adopts those values, so that its current grammar is now compatible with the input. For any given input sentence, this decoding process delivers one grammar that can license it. But it does not establish all the grammars that could license an ambiguous sentence, because to do so would require a full parallel parse of the sentence to find all of its possible parse trees. That is almost certainly beyond the capacities of the human parsing mechanism. The bulk of the evidence from studies of adult parsing is that the parser is capable only of serial processing, in which one parse tree is computed per sentence and any other analyses the sentence may have are ignored.³
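
A single learning step of this kind might be sketched as follows. This is only a schematic illustration, not the Structural Triggers Learner implementation itself; the toy parser and the parameter names are invented, and the point of the example is that the serial parse finds one patch and never considers the alternative analysis.

    def stl_step(grammar, sentence, parse_with_patches):
        """One sentence, one serial parse, at most one new grammar."""
        patch = parse_with_patches(grammar, sentence)   # extra values needed, or None
        if patch is None:
            return grammar                    # already licensed, or no patch was found
        return {**grammar, **patch}           # adopt the values that patched the parse

    # Toy stand-in for the parsing routines: the parse of a fronted wh-question
    # is taken to break down unless wh-movement is available.  A scrambling
    # analysis also exists, but a serial parser that finds the wh-movement
    # patch first never considers it.
    def toy_parser(grammar, sentence):
        if sentence == "What songs can Pat sing?" and not grammar.get("wh_movement"):
            return {"wh_movement": 1}
        return None

    grammar = {"wh_movement": 0, "scrambling": 0}
    grammar = stl_step(grammar, "What songs can Pat sing?", toy_parser)
    print(grammar)   # wh_movement adopted; the scrambling alternative was never found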

The limitation to serial parsing entails that the learner's parametric decoding of input sentences is not exhaustive. Partial decoding is the most that a child can be expected to achieve. But partial decoding is not good enough for reliable application of EM, because among the analyses that were ignored by the parser might be the very one that the EM wants the learner to choose. In some other respects, partial decoding is clearly better than none. Our simulation experiments on the CoLAG language domain confirm that decoding learners arrive at the target grammar an order of magnitude faster than trial-and-error models. But for our present concern, which is how learners evaluate competing grammar hypotheses, partial decoding falls short. It is unclear how EM could be accurately applied by a learning device that doesn't know what the set of candidate grammars is. So in a nutshell, the verdict on parametric decoding is that only full decoding is useful to EM but only partial decoding is possible due to capacity limits on language processing. Explaining how learners evaluate grammars is thus a challenge for acquisition theory.

17.4 The Subset Principle as test case

In what follows I will use the Subset Principle (SP) as a test case for evaluation in general. SP is a well-defined and relatively uncontroversial component of the EM. It has long been a pillar of learnability theory and needs little introduction here. It is necessitated by the poverty of the stimulus – yet another major concept that Noam has given us. At CUNY we split the poverty of the stimulus (POS) into POPS and PONS (Fodor and Crowther 2002). POPS is poverty of the positive stimulus, meaning that learners don't receive examples of all the language phenomena they have to acquire, so they have to project many (most) sentences of the language without being exposed to them. A dramatic example is parasitic gaps, discussed by Noam in Concepts and Consequences (Chomsky 1982) and Barriers (Chomsky 1986a). More pertinent for today is the poverty of the negative stimulus (PONS), which is extreme. Children typically receive little information about what is not a well-formed sentence of the language, certainly not enough to rule out every incorrect hypothesis that they might be tempted by (Marcus 1993). Because of this, learning must be conservative, and SP is the guardian of conservative learning. Informally, the idea is that if a learner has to guess between a more inclusive language and a less inclusive language, she should prefer the latter, because if necessary she can be driven by further input sentences to enlarge a too-small language, but without negative evidence she could never discover that the language she has hypothesized contains too many sentences and needs to be shrunk. More precisely, SP says:
