As someone who works on child
language disorders, I have tried many times to read Chomsky in order to appreciate
the insights that he is so often credited with. I regret to say that, over the years, I have come
to the conclusion that, far from enhancing our understanding of language
acquisition, his ideas have led to stagnation, as linguists have gone through
increasingly uncomfortable contortions to relate facts about children’s
language to his theories. The problem is
that the theories are derived from a consideration of adult language, and take
no account of the process of development. There is a fundamental problem with an
essential premise about what is
learned that has led to years of confusion and sterile theorising.
Let us start with Chomsky’s famous
sentence "Colourless green ideas sleep furiously". This was used to demonstrate independence of syntax and
semantics: we can judge that this sentence is syntactically well-formed even
though it makes no sense. From this, it was a small step to conclude that
language acquisition involves deriving abstract syntactic rules that determine
well-formedness, without any reliance on meaning. The mistake here was to
assume that an educated adult's ability to judge syntactic well-formedness in isolation has anything to do with how that
ability was acquired in childhood. Already in the 1980s, those who actually
studied language development found that children used a wide variety of cues, including syntactic,
semantic, and prosodic information, to learn language structure (Bates &
MacWhinney, 1989). Indeed,
Dabrowska (2010) subsequently showed that agreement on well-formedness of complex sentences was far from universal in adults.
Because he
assumed that children were learning abstract syntactic rules from the outset,
Chomsky encountered a serious problem. Language, defined this way, was not learnable by any usual
learning system: this could be shown by formal proof from mathematical learning theory. The
logical problem is that such learning is too unconstrained: any grammatical
string of elements is compatible with a wide range of underlying rule
systems. The learning becomes a bit
easier if children are given negative evidence (i.e., the learner is explicitly
told which rules are not correct), but (a) this doesn’t really happen and (b) even
if it did, arrival at the correct solution is not feasible without some prior
knowledge of the kinds of rules that are allowable. In an oft-quoted sentence, Chomsky (1965)
wrote: "A consideration of the
character of the grammar that is acquired, the degenerate quality and
narrowly limited extent of the available data, the striking uniformity of the
resulting grammars, and their independence of intelligence, motivation and emotional state, over wide ranges of variation, leave little hope that much of the structure of the language can be learned by an organism initially uninformed as to its general character" (p. 58, my italics).
So we were led to the inevitable, if surprising, conclusion that if grammatical
structure cannot be learned, it must be innate. But different languages have
different grammars. So whatever is innate has to be highly abstract – a
Universal Grammar. And the problem is
then to explain how children get from this abstract knowledge to the specific
language they are learning. The field became encumbered by creative but highly
implausible theories, most notably the parameter-setting account, which
conceptualised language acquisition as a process of "setting a switch" for a number of innately-determined
parameters (Hyams, 1986). Evidence, though, that children’s grammars actually changed
in discrete steps, as each parameter became set, was lacking. Reality was much
messier.
Viewed from a contemporary perspective, Chomsky’s concerns about the unlearnability of language
seem at best rather dated and at worst misguided. There are two key features
in current developmental psycholinguistics that were lacking from Chomsky’s account,
both concerning the question of what is learned. First, there is the question of
the units of acquisition: for Chomsky, grammar
is based on abstract linguistic units such as nouns and verbs, and it was
assumed that children operated with these categories. Over the past 15 years, direct
evidence has emerged to indicate that children don't start out with awareness
of underlying grammatical structure; early learning is word-based, and patterning in the input at the level of abstract elements is something children
become aware of as their knowledge increases (Tomasello, 2000).
Second, Chomsky viewed grammar as a rule-based system that determined allowable
sequences of elements. But people’s linguistic knowledge is probabilistic, not
deterministic. And there is now a large
body of research showing how such probabilistic knowledge can be learned from sequential
inputs, by a process of statistical learning. To take a very simple example, if
repeatedly presented with a sequence such as ABCABADDCABDAB, a learner will
start to be aware of dependencies in the input, i.e. B usually follows A, even
if there are some counter-examples. Other
types of sequence such as AcB can be learned, where c is an element that can
vary (see Hsu & Bishop, 2010, for a brief account). Regularly encountered sequences will then form higher-level units. At the time Chomsky was first writing, learning theories were more concerned with the formation of simple associations, either between paired stimuli or between instrumental acts and outcomes. These
theories were not able to account for learning of the complex structure of
natural language. However, once language researchers started to think in terms
of statistical learning, this led to a reconceptualisation of what was learned,
and many of the conceptual challenges noted by Chomsky simply fell away.
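To make the toy example above concrete, here is a minimal Python sketch of how a learner could estimate transitional probabilities from that sequence. The sequence is the one given above; the code itself is purely illustrative:

```python
from collections import Counter

sequence = "ABCABADDCABDAB"  # the toy sequence from the example above

# Count each adjacent pair, and how often each element starts a pair.
pair_counts = Counter(zip(sequence, sequence[1:]))
first_counts = Counter(sequence[:-1])

# Transitional probability P(next | current).
for (cur, nxt), n in sorted(pair_counts.items()):
    print(f"P({nxt}|{cur}) = {n}/{first_counts[cur]} = {n / first_counts[cur]:.2f}")
# P(B|A) comes out at 0.80: B usually follows A, despite the counter-examples.
```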
Current statistical learning accounts allow us to move ahead and to
study the process of language
learning. Instead of assuming that children start with knowledge of linguistic categories, these accounts propose that categories are abstracted from statistical regularities in the input (see the special issue of the Journal of Child Language, 2010, vol. 37, issue 3).
The units of analysis thus change as the child develops expertise. And, consistent with the earlier writings of
Bates and MacWhinney (1989), children's language is facilitated by the presence
of correlated cues in the input, e.g., prosodic and phonological cues in
combination with semantic context. In
sharp contrast to the idea that syntax is learned by a separate modular system
divorced from other information, recent research emphasises that the young
language learner uses different sources of information together. Modularity
emerges as development proceeds.
A statistical learning account does not, however, entail treating the
child as a “blank slate”. Developmental psychology has for many years focused
on constraints on learning: biases that lead the child to attend to particular features
of the environment, or to process these in a particular way. Such constraints will affect how language input is processed, but they are a long way from the notion of a Universal Grammar. And such constraints are
not specific to language: they influence, for instance, our ability to perceive
human faces, or to group objects perceptually.
It would be rash to assume that all the problems of language acquisition can be solved by adopting a statistical learning approach. And there are still big questions, identified by Chomsky and others – Why don’t other species have syntax? How did language evolve? Is linguistic ability distinct from general intelligence? But we now have a theoretical perspective that makes sense in terms of what we know about cognitive development and neuropsychology, that has general applicability to many different aspects of language acquisition, which forges links between language acquisition and other types of learning, and leads to testable predictions. The beauty of this approach is that it is amenable both to experimental test and to simulations of learning, so we can identify the kinds of cues children rely on, and the categories that they learn to operate with.
So how does Chomsky respond to this body of work? To find out, I decided to take a look at The Science of Language, which is based on transcripts of conversations between Chomsky and James McGilvray between 2004 and 2009. It was encouraging to see from the preface that the book is intended for a general audience and that "Professor Chomsky's contributions to the interview can be understood by all".
Well, as “one of the most influential
thinkers of our time”, Chomsky fell far short of expectation. Statistical learning and connectionism were not given serious consideration, but were rapidly dismissed as versions
of behaviourism that can’t possibly explain language acquisition. As noted by Pullum elsewhere,
Chomsky derides Bayesian learning approaches as useless – and at one point
claimed that statistical analysis of sequences of elements to find morpheme
boundaries “just can’t work” (cf. Romberg & Saffran, 2010). He seemed stuck with his critique of Skinnerian learning and ignorant of how things had changed.
I became interested not just in what Chomsky said, but in how he said it. I'm afraid that despite the reassurances in the preface, I had enormous difficulty getting through this book. When I read a difficult text, I usually take notes to summarise the main points. When I tried that with The Science of Language, I got nowhere because there seemed to be no coherent structure. Occasionally an interesting gobbet of information bobbed up from the sea of verbiage, but it did not seem part of a consecutive argument. The style is so discursive that it's
impossible to précis. His rhetorical approach seemed the antithesis of a scientific argument. He made sweeping
statements and relied heavily on anecdote.
A stylistic device commonly used by Chomsky is to set up a dichotomy between his position and an alternative, then represent the alternative in a way that makes it preposterous. For instance, his rationalist perspective on language acquisition, which presupposes innate grammar, is contrasted with an empiricist position in which “Language tends to be seen as a human invention, an institution to which the young are inducted by subjecting them to training procedures”. Since we all know that children learn language without explicit instruction, this parody of the empiricist position has to be wrong.
Overall, this book was a disappointment: one came away with a sense that a lot of
clever stuff had been talked about, and much had been confidently asserted, but
there was no engagement with any opposing point of view – just disparagement. And as Geoffrey Pullum concluded, in a review
in the Times
Higher Education, there was, alas, no science to be seen.
References
Bates, E., & MacWhinney, B.
(1989). Functionalism and the competition model. In B. MacWhinney & E.
Bates (Eds.), The crosslinguistic study
of sentence processing (pp. 3-73). Cambridge: Cambridge University Press.
Available from: http://psyling.psy.cmu.edu/papers/bib.html
Chomsky, N. (1965). Aspects of the theory of syntax.
Cambridge, MA: MIT Press.
Chomsky, N., & McGilvray, J. (2012). The Science of Language: Interviews with James McGilvray. Cambridge: Cambridge University Press.
Dabrowska, E. (2010). Naive v. expert intuitions: An empirical study of acceptability judgements. The Linguistic Review, 27(1), 1-23.
Hsu, H. J., & Bishop, D. V. M. (2010). Grammatical difficulties in children with specific language impairment (SLI): is learning deficient? Human Development, 53, 264-277.
Hyams, N. (1986). Language acquisition and the theory of
parameters. Dordrecht: Reidel.
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1(6), 906-914. doi: 10.1002/wcs.78
Tomasello, M. (2000). Acquiring
syntax is not what you think. In D. V. M. Bishop & L. B. Leonard (Eds.), Speech and Language Impairments in Children:
Causes, Characteristics, Intervention and Outcome (pp. 1-15). Hove, UK:
Psychology Press.
Correction: 4/9/2012. I had originally cited the wrong reference to Dabrowska (Dabrowska, E. 1997. The LAD goes to school: a cautionary tale for nativists. Linguistics, 35, 735-766). The 1997 paper is concerned with variation in adults' ability to interpret syntactically complex sentences. The 2010 paper cited above focuses on grammaticality judgements.
This article (Figshare version) can be cited as:
Bishop, Dorothy V M (2014): What Chomsky doesn't get about child language. figshare.
http://dx.doi.org/10.6084/m9.figshare.1030403
A far-too-long response to (some) commentators
12th October 2012
One of the nice things about blogging is that it gives an
opportunity to get feedback on one’s point of view. I’d like to thank all those
who offered comments on what I’ve written here, particularly those who have
suggested readings to support the arguments they make. The sheer diversity of views has been
impressive, as is the generally polite and scholarly tone of the arguments.
I’ve tried to look seriously at the points people have made and I’ve had a
fascinating few weeks reading some of the broader literature recommended by
commentators.
I quickly realised that I could easily spend several months
responding to comments and reading around this area, so I have had to be
selective. I’ll steer clear of commenting on Chomsky’s
political arguments, which I see as quite a separate issue. Nor am I prepared
to engage with those who suggest Chomsky is above criticism, either because he
is so famous, or because he’s been around a long time. Finally, I won’t say more about the views of
those who have expressed agreement, or extensions of my arguments – other than
to say thanks: this is a weird subject area where all too often people seem
scared to speak out for fear of seeming foolish or ignorant. As Anon (4 Sept)
says, it can quickly get vitriolic, which is bad for everyone. But if we at least boldly say what we think,
those with different views can either correct us, or develop better
arguments.
I’ll focus in this reply on the main issues that emerged from the discussion: how far is statistical learning compatible with a
Chomskyan account, are there things that a non-Chomskyan account simply can’t
deal with, and finally, are there points of agreement that could lead to more
positive engagement in future between different disciplines?
How compatible is statistical learning with a Chomskyan
account?
A central point made by Anon (3rd Sept/4th Sept) and Chloe Marshall (11th Sept) is that probabilistic learning is compatible with Chomsky's views.
This seems to be an absolutely crucial point. If there
really is no mismatch between what Chomsky is saying and those who are
advocating accounts of language acquisition in terms of statistical learning,
then maybe the disagreement is just about terminology and we should try harder
to integrate the different approaches.
It’s clear we can differentiate between different levels of language processing. For instance, here are just three examples of
how statistical learning may be implicated in language learning:
- The original work by Saffran et al (1996) focused on demonstrating that infants were sensitive to transitional probabilities in syllable strings. It was suggested that this could be a mechanism that was involved in segmenting words from speech input.
- Redington et al (1998) proposed that information about lexical categories could be extracted from language input by considering sequential co-occurrences of words (a toy sketch of this idea follows the list).
- Edelman and Waterfall (2007) reviewed evidence that children attend to specific patterns of specific lexical items in their linguistic input, concluding that they first acquire the syntactic patterns of particular words and structures and later generalize information to entire word classes. They went on to describe heuristic methods for uncovering structure in input, using the example of the ADIOS (Automatic DIstillation Of Structure) algorithm. This uses distributional regularities in raw, unannotated corpus data to identify significant co-occurrences, which are used as the basis for distributional classes. Ultimately, ADIOS discovers recursive rule-like patterns that support generalization.
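To illustrate the second of these ideas, here is a toy sketch of distributional category learning in the spirit of Redington et al: each word is represented by counts of its immediate neighbours, and words with similar context profiles are grouped together. The mini-corpus, the cosine measure, and the similarity threshold are my own inventions for illustration, not anything taken from the papers:

```python
from collections import Counter
from itertools import combinations
import math

# A tiny invented corpus; a real test would use millions of words.
corpus = [
    "the cat sees a dog", "the dog sees a cat",
    "a cat chases the dog", "the dog chases a cat",
]

# Represent each word by counts of its left and right neighbours.
contexts = {}
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        profile = contexts.setdefault(w, Counter())
        profile[("L", words[i - 1] if i > 0 else "<s>")] += 1
        profile[("R", words[i + 1] if i < len(words) - 1 else "</s>")] += 1

def cosine(c1, c2):
    dot = sum(c1[k] * c2[k] for k in set(c1) | set(c2))
    norm = math.sqrt(sum(v * v for v in c1.values())) \
         * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm

# Words with similar context profiles pair up without any built-in
# grammatical categories: here 'cat'/'dog' and 'sees'/'chases'.
for w1, w2 in combinations(contexts, 2):
    sim = cosine(contexts[w1], contexts[w2])
    if sim > 0.7:
        print(f"{w1} ~ {w2}: {sim:.2f}")
```

With a larger corpus the determiners 'the' and 'a' should converge too; the point is only that category-like groupings can emerge from distributional statistics alone.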
So what does Chomsky make of all of this? I am grateful to
Chloe for pointing me to his 2005 paper “Three factors in language design”, which was
particularly helpful in tracing the changes in Chomsky’s views over time.
Here’s what he says on word boundaries:
“In Logical Structure of Linguistic Theory (LSLT; p. 165), I adopted Zellig Harris’s (1955) proposal, in a different framework, for identifying morphemes in terms of transitional probabilities, though morphemes do not have the required beads-on-a-string property. The basic problem, as noted in LSLT, is to show that such statistical methods of chunking can work with a realistic corpus. That hope turns out to be illusory, as has recently been shown by Thomas Gambell and Charles Yang (2003), who go on to point out that the methods do, however, give reasonable results if applied to material that is preanalyzed in terms of the apparently language-specific principle that each word has a single primary stress. If so, then the early steps of compiling linguistic experience might be accounted for in terms of general principles of data analysis applied to representations preanalyzed in terms of principles specific to the language faculty....”
Gambell and Yang don’t seem to have published in the peer-reviewed
literature, but I was able to track down four papers by these authors (Gambell & Yang, 2003; Gambell & Yang, 2004; Gambell & Yang, 2005a; Gambell & Yang, 2005b),which all make
essentially the same point. They note that a simple rule that treats a
low-probability syllabic transition as a word boundary doesn’t work with a
naturalistic corpus where a high proportion of words are monosyllabic. However, adding prosodic information –
essentially treating each primary stress as belonging to a new word – achieves
a much better level of accuracy.
The work by Gambell and Yang is exactly the kind of research
I like: attempting to model a psychological process and evaluating results
against empirical data. The insights gained from the modelling take us forward.
The notion that prosody may provide key information in segmenting words seems
entirely plausible. If generative grammarians wish to refer to such a cognitive
bias as part of Universal Grammar, that’s fine with me. As noted in my original piece, I agree that
there must be some constraints on learning; if UG is confined to this kind of
biologically plausible bias, then I am happy with UG. My difficulties arise with
more abstract and complex innate knowledge, such as are involved in parameter
setting (of which, more below).
But, even at this level of word identification, there are
still important differences between my position and the Chomskyan one. First of
all, I’m not as ready as Chomsky to dismiss statistical learning on the basis
of Gambell and Yang’s work. Their model assumed a sequence of syllables was a
word unless it contained a low transitional probability. Its accuracy was so
bad that I suspect it gave a lower level of success than a simpler strategy:
“Assume each syllable is a word.” But
consider another potential strategy for word segmentation in English, which
would be “Assume each syllable is a complete word unless there’s a very high
transitional probability with the next syllable.” I’d like to see a model like
that tested before assuming transitional probability is a useless cue.
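Here is a minimal sketch of the sort of thing I have in mind. The syllabified toy corpus, the '|' notation and the 0.9 threshold are all invented for illustration; a serious test would of course need a naturalistic child-directed corpus of the kind Gambell and Yang used:

```python
from collections import Counter

# Invented toy corpus ('|' separates syllables within a word).
training = [
    "the ba|by sees the mon|key", "the mon|key eats the ti|ger",
    "the ti|ger sees the ba|by",  "the ba|by eats the ti|ger",
    "the mon|key sees the ba|by", "the ti|ger eats the mon|key",
    "the ba|by sees you",         "the ti|ger eats you",
]

# The learner sees only the syllable stream, not the word boundaries.
streams = [u.replace(" ", "|").split("|") for u in training]
pair_counts = Counter(p for s in streams for p in zip(s, s[1:]))
first_counts = Counter(a for s in streams for a in s[:-1])

def tp(a, b):
    """Transitional probability P(b | a) over the syllable stream."""
    return pair_counts[(a, b)] / first_counts[a] if first_counts[a] else 0.0

def segment(syllables, threshold=0.9):
    """Assume each syllable ends a word unless TP to the next is very high."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp(a, b) > threshold:
            current.append(b)                 # high TP: word continues
        else:
            words.append("".join(current))    # low TP: word boundary here
            current = [b]
    words.append("".join(current))
    return words

print(segment("the mon key sees the ba by".split()))
# -> ['the', 'monkey', 'sees', 'the', 'baby']
```

On this toy corpus the heuristic recovers the word boundaries, monosyllables included; whether it survives contact with a realistic corpus is precisely the empirical question.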
Second, Gambell and Yang stay within what I see as a
Chomskyan style of thinking which restricts the range of information available
to the language processor when solving a particular problem. This is parsimonious and makes modelling
tractable, but it’s questionable just how realistic it is. It contrasts sharply
with the view proposed by Seidenberg and MacDonald (1999), who argue that cues
that individually may be poor at solving a categorisation problem, may be much
more effective when used together. For instance, the young child doesn’t just
hear words such as ‘cat’, ‘dog’, ‘lion’, ‘tiger’, ‘elephant’ or ‘crocodile’: she
typically hears them in a meaningful context where relevant toys or pictures
are present. Of course, contextual information is not always available and not
always reliable. However, it seems odd to assume that this contextual
information is ignored when populating the lexicon. This is one of the core
difficulties I have with Chomsky: the sense that meaning is not integrated in
language learning.
Turning to lexical categories, the question is whether
Chomsky would accept that these might be discovered by the child through a
process of statistical learning, rather than being innate. My understanding is that he has rejected this idea, and I have not found any statement by him to suggest otherwise, but others may be able to point to one. Franck Ramus (4th Sept) argues that children do represent
some syntactic categories well before this is evident in their language and
this is not explained by statistical relationships between words. I’m not convinced by the evidence he cites, which is based on different brain responses
to grammatical and ungrammatical sentences in toddlers (Bernal et al, 2010).
First, the authors state: “Infants could therefore not detect the
ungrammaticality by noticing the co-occurrence of two words that normally never
occur together”. But they don’t present
any information on transitional probabilities in a naturalistic corpus for the
word sequences used in their sentences. All that is needed for statistical learning is for the transitional probabilities to be lower in the ungrammatical than in the grammatical sentences: they don't have to be zero. Second, the children in this study were two
years old, and would have been exposed to a great deal of language from which
syntactic categories could have been abstracted by mechanisms similar to those
simulated by Redington et al.
Regarding syntax, I was pleased to be introduced to the work
of Jeffrey Lidz, whose clarity of expression is a joy after struggling with
Chomsky. He reiterates a great deal of what I regard as the ‘standard’
Chomskyan view, including the following:
“Speaking broadly, this research generally finds that children’s representations do not differ in kind from those of adults and that in cases where children behave differently from adults, it is rarely because they have the wrong representations. Instead, differences between children and adults are often attributed to task demands (Crain & Thornton, 1998), computational limitations (Bloom,1990; Grodzinsky & Reinhart, 1993), and the problems of pragmatic integration (Thornton & Wexler, 1999) but only rarely to representational differences between children and adults (Radford, 1995; see also Goodluck, this volume).” Lidz, 2008
The studies cited by Lidz as showing that children's representations are the same as those of adults – except for performance limitations – have intrigued me for many years. As someone who has long been
interested in children’s ability to understand complex sentence structures, I long
ago came to realise that the last thing children usually attend to is syntax:
their performance is heavily influenced by context, pragmatics, particular
lexical items, and memory load. But my response to this observation is very
different from that of the generative linguists. Whereas they strive to devise
tasks that are free of these influences, I came to the conclusion that they play a key part in language acquisition. Again, I find myself in
agreement with Seidenberg and MacDonald (1999):
“The apparent complexity of language and its uniqueness vis a vis other aspects of cognition, which are taken as major discoveries of the standard approach, may derive in part from the fact that these ‘performance’ factors are not available to enter into explanations of linguistic structure. Partitioning language into competence and performance and then treating the latter as a separate issue for psycholinguists to figure out has the effect of excluding many aspects of language structure and use from the data on which the competence theory is developed.” (p 572)
The main problem I have with Chomskyan theory, as I
explained in the original blogpost, is the implausibility of parameter setting
as a mechanism of child language acquisition. In The Science of Language,
Chomsky (2012) is explicit about parameter-setting as an
attractive way out of the impasse created by the failure to find general UG
principles that could account for all languages. Specifically, he says:
“If you’re trying to get Universal Grammar to be articulated and restricted enough so that an evaluation will only have to look at a few examples, given data, because that’s all that’s permitted, then it’s going to be very specific to language, and there aren’t going to be general principles at work. It really wasn’t until the principles and parameters conception came along that you could really see a way in which this could be divorced. If there’s anything that’s right about that, then the format for grammar is completely divorced from acquisition; acquisition will only be a matter of parameter setting. That leaves lots of questions open about what the parameters are; but it means that whatever is left are the properties of language.”
I’m sure readers will point out if I’ve missed anything, but
what I take away from this statement is an admission that UG is now seen as
consisting of very general and abstract constraints on processing that are not
necessarily domain-specific. The
principal component of UG that interests Chomsky is
“an operation that enables you to take mental objects [or concepts of some sort], already constructed, and make bigger mental objects out of them. That’s Merge. As soon as you have that, you have an infinite variety of hierarchically structured expressions [and thoughts] available to you.”
I have no difficulty in agreeing
with the idea that recursion is a key component of language and humans have a capacity for this kind of processing. But Chomsky makes another claim that I find much harder to
swallow. He sees the separation of UG from parameter-setting as a solution to
the problem of acquisition; I see it as just moving the problem elsewhere. For a start, as he himself notes, there are "lots of questions open" about what the parameters are. Also, children don't behave as if parameters
are set one way or another: their language output is more probabilistic. I was
interested to read that modifications of Chomskyan theory have been proposed to
handle this:
“Developing suggestions of Thomas Roeper’s, Yang proposes that UG provides the neonate with the full array of possible languages, with all parameters valued, and that incoming experience shifts the probability distribution over languages in accord with a learning function that could be quite general. At every stage, all languages are in principle accessible, but only for a few are probabilities high enough so that they can actually be used.” (Chomsky, 2005, p. 9).
So not only can the theory be adapted to handle probabilistic
data; probability now assumes a key role, as it is the factor that decides
which grammar will be adopted at any given point in development. But while I am pleased to see the
probabilistic nature of children’s grammatical structures acknowledged, I still
have problems with this account:
First, it is left unclear why a child opts
for one version of the grammar at time 1 and another at time 2, then back to
the first version at time 3. If we want an account that is explanatory rather than merely descriptive, then non-deterministic behaviour needs explaining. It could reflect the behaviour of a system that is rule-governed but affected by noise, or it could be a case of different options being selected according to other local constraints. What seems less plausible (though not impossible) is a system that flips from one state to another with a given probability. In a similar vein, if a grammar has an optional setting on a parameter, just what does that mean? Is there a random generator somewhere in the system that determines on a moment-by-moment basis what is produced, or are there local factors that constrain which version is preferred? (A toy sketch of such a probabilistic learner follows the third point below.)
Second, this account ignores the fact that early usage of
certain constructions is influenced by the lexical items involved (Tomasello, 2006), raising questions about just how abstract the syntax
is.
Third, I see a clear distinction between saying that a child
has the potential to learn any grammar, and saying that the child has available
all grammars from the outset, “with all parameters valued”. I’m happy to agree
with the former claim (which, indeed, has to be true, for any
typically-developing child), but the latter seems to fly in the face of
evidence that the infant brain is very different from the adult brain, in terms
of number of neurons, proportion of grey and white matter, and
connectivity. It’s hard to
imagine what the neural correlate of a “valued parameter” would be. If the
“full array of languages” is already available in the neonate, then how is it
that a young child can suffer damage to a large section of the left cerebral
hemisphere without necessarily disturbing the ultimate level of language
ability (Bishop, 1988)?
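As promised above, here is a toy sketch of the kind of probabilistic parameter-setting learner described in the Chomsky (2005) quotation: a linear reward-penalty scheme in the spirit of Yang's variational model. The two parameter values, the input mix, and the 'parsing' test are all invented for illustration:

```python
import random

random.seed(1)
p_value1 = 0.5   # current probability of sampling parameter value 1
rate = 0.02      # learning rate

def parses(value, sentence_type):
    # Toy compatibility: value 1 only parses type 'X' sentences;
    # value 2 parses both ('X' is ambiguous, 'Y' is decisive evidence).
    return sentence_type == "X" if value == 1 else True

for _ in range(2000):
    sentence = "X" if random.random() < 0.7 else "Y"  # 30% decisive input
    value = 1 if random.random() < p_value1 else 2     # sample a grammar
    if parses(value, sentence):
        # Reward the sampled value: shift probability towards it.
        p_value1 += rate * (1 - p_value1) if value == 1 else -rate * p_value1
    else:
        # Penalise it: shift probability to the alternative.
        p_value1 += -rate * p_value1 if value == 1 else rate * (1 - p_value1)

print(f"P(value 1) after learning: {p_value1:.2f}")  # drifts towards value 2
```

At intermediate stages such a learner genuinely mixes grammars, so probabilistic output falls out naturally; what remains unexplained is what, neurally, the probabilities and 'valued parameters' are supposed to correspond to.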
Are there things that only a Chomskyan account can explain?
Progress, of course, is most likely when people do disagree,
and I suspect that some of the psychological work on language acquisition might
not have happened if people hadn’t taken issue with being told that
such-and-such a phenomenon proves that some aspect of language must be
innate. Let me take three such examples:
1. Optional
infinitives. I remember many years ago
hearing Ken Wexler say that children produce utterances such as “him go
there”, and arguing that this cannot have been learned from the input and so
must be evidence of a grammar with an immature parameter-setting. However, as Julian Pine pointed out at the
same meeting, children do hear sequences such as this in sentences such as “I
saw him go there”, and furthermore children’s optional infinitive errors tend
to occur most on verbs that occur relatively frequently as infinitives in
compound finite constructions (Freudenthal et al., 2010).
2. Fronted interrogative verb auxiliaries. This is a classic
case of an aspect of syntax that Chomsky (1971) used as evidence for Poverty of
the Stimulus – i.e., the inadequacy of language input to explain language
knowledge. Perfors et al (2010) take this example and demonstrate that it is
possible to model acquisition without assuming innate syntactic knowledge. I’m
sure many readers would take issue with certain assumptions of the modelling,
but the important point here is not the detail so much as the demonstration
that some assumptions about impossibility of learning are not as watertight as
often assumed: a great deal depends on how you conceptualise the learning
process.
3. Anaphoric ‘one’. Lidz et al (2003) argued that toddlers
aged around 18 months manage to work out the antecedent of the anaphoric
pronoun “one” (e.g. “Here’s a yellow bottle. Can you see another one?”), even
though there was insufficient evidence in their language input to disambiguate
this. The key issue is whether “another one” is taken to mean the whole noun
phrase, “yellow bottle”, or just its
head, “bottle”. Lidz et al note that in
the adult grammar the element “one” typically refers to the whole constituent
“yellow bottle”. To study knowledge of this aspect of syntax in infants, they
used preferential looking: infants were first introduced to a phrase such as
“Look! A yellow bottle”. They were then presented with two objects: one
described by the same adjective+noun combination (e.g. another yellow bottle),
and one with the same noun and a different adjective (e.g. a blue bottle). Crucially, Lidz et al claimed that 18-month-olds looked significantly more often at the yellow (rather than blue) bottle when asked "Do you see another one?", i.e., that they treated "one" as referring to the whole noun phrase, just like adults. This was not due to any
general response bias, because they showed the opposite bias (preference for
the novel item) if asked a control question “What do you see now?” In
addition Lidz et al analysed data from the CHILDES database and concluded that,
although adults often used the phrase “another one” when talking to young
children, this was seldom in contexts that disambiguated its reference.
This study stimulated
a range of responses from researchers who suggested alternative explanations; I
won’t go into these here, as they are clearly described by Lidz and Waxman
(2004), who go carefully through each one presenting arguments against it. This
is another example of the kind of work I like – it’s how science should
proceed, with claim and counter-claim being tested until we arrive at a resolution. But is the answer clear?
My first reaction to the original study was simply that I’d
like to see it replicated: eleven children per group is a small sample size for
a preferential looking study, and does not seem a sufficiently firm foundation
on which to base the strong conclusion that children know things about syntax
that they could not have learned. But my second reaction is that, even if this
replicates, I would not find the evidence for innate knowledge of grammar
convincing. Again, things look different if you go beyond syntax. Suppose, for
instance, the child interprets “another one” to mean “more”. There is reason to
suspect this may occur, because in the same CHILDES corpora used by Lidz, there
are examples of the child saying things like “another one book”.
On this interpretation, the Lidz task would still pose a challenge,
as the child has to decide whether to treat “another one” as referring to the
specific object (“yellow bottle”), or the class of objects (“bottle”). If the former is correct, then they should
prefer the yellow bottle. If the latter, then there’d be no preference. If
uncertain, we’d expect a mixture of responses, somewhere between these
options. So what was actually
found? As noted above, for children given the control sentence "What do you see now?" there was a slight bias to pick the new item, and so the old item (yellow bottle) was looked at for only 43% of the time on average (SD = 0.052). For children asked the key question, "Do you see another one?", the old item (yellow bottle) was looked at on average 54% of the time (SD = 0.067). The difference between the two instruction types is large in statistical terms (Cohen's d = 1.94), but the bias away from chance is fairly modest in both cases. If I'm right and syntax is not the most crucial factor determining responses, then we
might find that the specific test items would affect performance: e.g., a complex noun phrase that describes a
stable entity (e.g. a yellow bottle) might be more likely to be selected for
“another one” than an object in a
transient state (e.g. a happy boy). [N.B. My thanks to Jeffrey Lidz who kindly provided raw data that are the basis of the results presented above].
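For readers who want to check the arithmetic: Cohen's d expresses the difference between means in pooled standard-deviation units. One standard formula (assuming equal group sizes; Lidz may have pooled slightly differently) is

$$d = \frac{M_1 - M_2}{SD_{\text{pooled}}}, \qquad SD_{\text{pooled}} = \sqrt{\tfrac{1}{2}\left(SD_1^2 + SD_2^2\right)}$$

With means of 0.54 and 0.43 and SDs of 0.067 and 0.052 this gives d ≈ 1.8, close to the reported 1.94 (the exact value depends on the pooling). The effect is large only because the SDs are small; both means remain near the chance value of 0.50, which is the point being made here.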
Points of agreement – and disagreement – between generative
linguists and others
The comments I have received give me hope that there may be
more convergence of views between Chomskyans and those modelling language
acquisition than I had originally thought. The debate between connectionist ‘bottom up’ and Bayesian ‘top
down’ approaches to modelling language acquisition highlighted by Jeff Bowers
(4th Sept) and described by Perfors et al (2011) gets back to basic issues
about how far we need a priori abstract symbolic structures, and how far these
can be constructed from patterned input. I emphasise again that I would not
advocate treating the child as a blank slate. Of course, there need to be
constraints affecting what is attended to and what computations are conducted
on input. I don’t see it as an either (bottom up)/or (top down) problem. The key questions have to do with what
top-down constraints are and how domain-specific they need to be, and just how
far one can go with quite minimal prior specification of structure.
I see these as empirical questions whose answers need to
take into account (a) experimental studies of child language acquisition and
(b) formal modelling of language acquisition using naturalistic corpora as well
as (c) the phenomena described by generative linguists, including intuitive
judgements about grammaticality etc.
I appreciate the patience of David Adger (Sept 11th) in
trying to argue for more of a dialogue between generative linguists and those
adopting non-Chomskyan approaches to modelling child language. Anon (Sept 4th) has also shown a willingness
to engage that gives me hope that links may be forged between those working in
the classic generative tradition and others who attempt to model language
development. I was pleased to be nudged by Anon (4th Sept) into reading
Becker et al (2011), and agree it is an example of the kind of work that is
needed: looking systematically at known factors that might account for observed
biases, and pushing to see just how much these could explain. It illustrates
clearly that there are generative linguists whose work is relevant for
statistical learning. I still think,
though, that we need to be cautious in concluding there are innate biases, especially when
the data come from adults, whose biases could be learned. There are always possible factors that weren't controlled – e.g. in this case I wondered about age-of-acquisition effects (cf. data from a very different kind of task by Garlock et al, 2001). But overall, work like this offers reassurance that not all generative linguists live in a Chomskyan silo – and if I implied that they did, I apologise.
When Chomsky first wrote on this topic, we did not have
either the corpora or the computer technology to simulate naturalistic language
learning. It still remains a daunting task, but I am impressed at what has been
achieved so far. I remain of the view
that the task of understanding language acquisition has been made unduly
difficult by adopting a conceptualisation of what is learned that focuses on
syntax as a formal system that is learned in isolation of context and
meaning. Like Edelman and Waterfall
(2007) I also suspect that obstacles have been created by the need to develop a
‘beautiful’ theory, i.e. one that is simple and elegant in accounting for
linguistic phenomena. My own prediction is that any explanatorily adequate
account of language acquisition will be an ugly construction, cobbled together
from bits and pieces of cognition, and
combining information from many different levels of processing. The test will
ultimately be if we can devise a model that can predict empirical data from
child language acquisition. I probably won’t live long enough, though, to see
it solved.
References
Becker, M., Ketrez, N., & Nevins, A. (2011). The surfeit
of the stimulus: Analytic biases filter lexical statistics in Turkish laryngeal
alternations. Language, 87(1), 84-125.
Bernal, S., Dehaene-Lambertz, G., Millotte, S., &
Christophe, A. (2010). Two-year-olds compute syntactic structure on-line.
Developmental Science, 13(1), 69-76. doi: 10.1111/j.1467-7687.2009.00865.x
Bishop, D. V. M. (1988). Language development after focal
brain damage. In D. V. M. Bishop & K. Mogford (Eds.), Language development
in exceptional circumstances (pp. 203-219). Edinburgh: Churchill Livingstone.
Chomsky, N. (2005). Three factors in language design.
Linguistic Inquiry, 36(1), 1-22.
Edelman, S., & Waterfall, H. (2007). Behavioral and
computational aspects of language and its acquisition. Physics of Life Reviews,
4, 253-277.
Freudenthal, D., Pine, J., & Gobet, F. (2010).
Explaining quantitative variation in the rate of Optional Infinitive errors
across languages: A comparison of MOSAIC and the Variational Learning Model.
Journal of Child Language, 37(3), 643-669. doi: 10.1017/s0305000909990523
Garlock, V. M., Walley, A. C., & Metsala, J. L. (2001).
Age-of-acquisition, word frequency and
neighborhood density effects on spoken word recognition: Implications for the
development of phoneme awareness and early reading ability. Journal of Memory
and Language, 45, 468-492.
Lidz, J., Waxman, S., & Freedman, J. (2003). What
infants know about syntax but couldn't have learned: experimental evidence for
syntactic structure at 18 months. Cognition, 89(3), 295-303.
Lidz, J., & Waxman, S. (2004). Reaffirming the poverty
of the stimulus argument: a reply to the replies. Cognition, 93, 157-165.
Perfors, A., Tenenbaum, J. B., & Wonnacott, E. (2010). Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language, 37(3), 607-642. doi: 10.1017/S0305000910000012
Perfors, A., Tenenbaum, J. B., & Regier, T. (2011). The learnability of abstract syntactic principles. Cognition, 118(3), 306-338. doi: 10.1016/j.cognition.2010.11.001
Redington, M., Chater, N., & Finch, S. (1998).
Distributional information: A powerful cue for acquiring syntactic categories.
Cognitive Science, 22(4), 425-469.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996).
Statistical learning by 8-month-old infants. Science, 274(5294), 1926-1928.
Seidenberg, M. S., & MacDonald, M. C. (1999). A
probabilistic constraints approach to language acquisition and processing.
Cognitive Science, 23(4), 569-588.
Tomasello, M. (2006). Acquiring linguistic constructions. In R. Siegler & D. Kuhn (Eds.), Handbook of child psychology (pp. 1-48). Oxford University Press.
P.S. 15th October 2012
I have added some links to the response of 12th October. In addition, I have discovered this book, which gives an excellent account of generative vs. constructivist approaches to language acquisition:
Ambridge, B., & Lieven, E. V. M. (2011). Child language acquisition: Contrasting theoretical approaches. Cambridge: Cambridge University Press.