Monday, 3 September 2012

What Chomsky doesn't get about child language


Noam Chomsky is widely regarded as an intellectual giant, responsible for a revolution in how people think about language. In a recent book by Chomsky and James McGilvray, The Science of Language, the foreword states: “It is particularly important to understand Chomsky’s views … not only because he virtually created the modern science of language by himself …. but because of what he and colleagues have discovered about language – particularly in recent years…”

As someone who works on child language disorders, I have tried many times to read Chomsky in order to appreciate the insights that he is so often credited with. I regret to say that, over the years, I have come to the conclusion that, far from enhancing our understanding of language acquisition, his ideas have led to stagnation, as linguists have gone through increasingly uncomfortable contortions to relate facts about children’s language to his theories. The problem is that the theories are derived from a consideration of adult language, and take no account of the process of development. There is a fundamental problem with an essential premise about what is learned that has led to years of confusion and sterile theorising.
Let us start with Chomsky’s famous sentence "Colourless green ideas sleep furiously". This was used  to demonstrate independence of syntax and semantics: we can judge that this sentence is syntactically well-formed even though it makes no sense. From this, it was a small step to conclude  that language acquisition involves deriving abstract syntactic rules that determine well-formedness, without any reliance on meaning. The mistake here was to assume that an educated adult's ability to judge syntactic well-formedness in isolation has anything to do with how that ability was acquired in childhood. Already in the 1980s, those who actually studied language development found that children used a wide variety of cues, including syntactic, semantic, and prosodic information, to learn language structure (Bates & MacWhinney, 1989).  Indeed, Dabrowska (2010) subsequently showed that agreement on well-formedness of complex sentences was far from universal in adults.
Because he assumed that children were learning abstract syntactic rules from the outset, Chomsky encountered a serious problem. Language, defined this way, was not learnable by any usual learning system: this could be shown by formal proof from mathematical learning theory. The logical problem is that such learning is too unconstrained: any grammatical string of elements is compatible with a wide range of underlying rule systems. The learning becomes a bit easier if children are given negative evidence (i.e., the learner is explicitly told which rules are not correct), but (a) this doesn’t really happen and (b) even if it did, arrival at the correct solution is not feasible without some prior knowledge of the kinds of rules that are allowable. In an oft-quoted sentence, Chomsky (1965) wrote: "A consideration of the character of the grammar that is acquired, the degenerate quality and narrowly limited extent of the available data, the striking uniformity of the resulting grammars, and their independence of intelligence, motivation and emotional state, over wide ranges of variation, leave little hope that much of the structure of the language can be learned by an organism initially uninformed as to its general character." (p. 58) (my italics).
So we were led to the inevitable, if surprising, conclusion that if grammatical structure cannot be learned, it must be innate. But different languages have different grammars. So whatever is innate has to be highly abstract – a Universal Grammar.  And the problem is then to explain how children get from this abstract knowledge to the specific language they are learning. The field became encumbered by creative but highly implausible theories, most notably the parameter-setting account, which conceptualised language acquisition as a process of "setting a switch" for a number of innately-determined parameters (Hyams, 1986). Evidence, though, that children’s grammars actually changed in discrete steps, as each parameter became set, was lacking. Reality was much messier.
Viewed from a contemporary perspective, Chomsky’s concerns about the unlearnability of language seem at best rather dated and at worst misguided. There are two key features in current developmental psycholinguistics that were lacking from Chomsky’s account, both concerning the question of what is learned. First, there is the question of the units of acquisition: for Chomsky, grammar is based on abstract linguistic units such as nouns and verbs, and it was assumed that children operated with these categories. Over the past 15 years, direct evidence has emerged to indicate that children don't start out with awareness of underlying grammatical structure; early learning is word-based, and patterning in the input at the level of abstract elements is something children become aware of as their knowledge increases (Tomasello, 2000).  
Second, Chomsky viewed grammar as a rule-based system that determined allowable sequences of elements. But people’s linguistic knowledge is probabilistic, not deterministic. And there is now a large body of research showing how such probabilistic knowledge can be learned from sequential inputs, by a process of statistical learning. To take a very simple example, if repeatedly presented with a sequence such as ABCABADDCABDAB, a learner will start to be aware of dependencies in the input, i.e. B usually follows A, even if there are some counter-examples. Other types of sequence such as AcB can be learned, where c is an element that can vary (see Hsu & Bishop, 2010, for a brief account). Regularly encountered sequences will then form higher-level units. At the time Chomsky was first writing, learning theories were more concerned with forming of simple associations, either between paired stimuli, or between instrumental acts and outcomes. These theories were not able to account for learning of the complex structure of natural language. However, once language researchers started to think in terms of statistical learning, this led to a reconceptualisation of what was learned, and many of the conceptual challenges noted by Chomsky simply fell away.
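To make the simple example above concrete, here is a minimal sketch of how the dependency between A and B could be estimated from that sequence. The function name is my own, and counting adjacent pairs is only one very simple way of formalising "statistical learning"; it is not a model taken from any of the papers cited.

```python
from collections import Counter, defaultdict

def transitional_probabilities(sequence):
    """Return {x: {y: P(y | x)}} estimated from adjacent pairs in the sequence."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        pair_counts[current][following] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in pair_counts.items()
    }

tp = transitional_probabilities("ABCABADDCABDAB")
print(tp["A"])  # {'B': 0.8, 'D': 0.2}
```

In this sequence A is followed by B four times out of five, with one counter-example (AD), which is the sense in which B "usually" follows A.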
Current statistical learning accounts allow us to move ahead and to study the process of language learning. Instead of assuming that children start with knowledge of linguistic categories, categories are abstracted from statistical regularities in the input (see Special Issue 03, Journal of Child Language 2010, vol 37). The units of analysis thus change as the child develops expertise. And, consistent with the earlier writings of Bates and MacWhinney (1989), children's language is facilitated by the presence of correlated cues in the input, e.g., prosodic and phonological cues in combination with semantic context. In sharp contrast to the idea that syntax is learned by a separate modular system divorced from other information, recent research emphasises that the young language learner uses different sources of information together. Modularity emerges as development proceeds.
A statistical learning account does not, however, entail treating the child as a “blank slate”. Developmental psychology has for many years focused on constraints on learning: biases that lead the child to attend to particular features of the environment, or to process these in a particular way. Such constraints will affect how language input is processed, but they are a long way from the notion of a Universal Grammar. And such constraints are not specific to language: they influence, for instance, our ability to perceive human faces, or to group objects perceptually.

It would be rash to assume that all the problems of language acquisition can be solved by adopting a statistical learning approach. And there are still big questions, identified by Chomsky and others – Why don’t other species have syntax? How did language evolve? Is linguistic ability distinct from general intelligence? But we now have a theoretical perspective that makes sense in terms of what we know about cognitive development and neuropsychology, that has general applicability to many different aspects of language acquisition, that forges links between language acquisition and other types of learning, and that leads to testable predictions. The beauty of this approach is that it is amenable both to experimental test and to simulations of learning, so we can identify the kinds of cues children rely on, and the categories that they learn to operate with.

So how does Chomsky respond to this body of work? To find out, I decided to take a look at The Science of Language, which is based on transcripts of conversations between Chomsky and James McGilvray between 2004 and 2009. It was encouraging to see from the preface that the book is intended for a general audience and “Professor Chomsky’s contributions to the interview can be understood by all”.

Well, as “one of the most influential thinkers of our time”, Chomsky fell far short of expectation. Statistical learning and connectionism were not given serious consideration, but were rapidly dismissed as versions of behaviourism that can’t possibly explain language acquisition. As noted by Pullum elsewhere, Chomsky derides Bayesian learning approaches as useless – and at one point claimed that statistical analysis of sequences of elements to find morpheme boundaries “just can’t work” (cf. Romberg & Saffran, 2010). He seemed stuck with his critique of Skinnerian learning and ignorant of how things had changed.
I became interested in not just what Chomsky said, but how he said it.  I’m afraid that despite the reassurances in the foreword, I had enormous difficulty getting through this book. When I read a difficult text, I usually take notes to summarise the main points. When I tried that with the Science of Language, I got nowhere because there seemed no coherent structure. Occasionally an interesting gobbet of information bobbed up from the sea of verbiage, but it did not seem part of a consecutive argument. The style is so discursive that it’s impossible to précis. His rhetorical approach seemed the antithesis of a scientific argument. He made sweeping statements and relied heavily on anecdote.

A stylistic device commonly used by Chomsky is to set up a dichotomy between his position and an alternative, then represent the alternative in a way that makes it preposterous. For instance, his rationalist perspective on language acquisition, which presupposes innate grammar, is contrasted with an empiricist position in which “Language tends to be seen as a human invention, an institution to which the young are inducted by subjecting them to training procedures”.  Since we all know that children learn language without explicit instruction, this parody of the empiricist position has to be wrong.
Overall, this book was a disappointment: one came away with a sense that a lot of clever stuff had been talked about, and much had been confidently asserted, but there was no engagement with any opposing point of view – just disparagement.  And as Geoffrey Pullum concluded, in a review in the Times Higher Education, there was, alas, no science to be seen.

Bates, E., & MacWhinney, B. (1989). Functionalism and the competition model. In B. MacWhinney & E. Bates (Eds.), The crosslinguistic study of sentence processing (pp. 3-73). Cambridge: Cambridge University Press.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N., & McGilvray, J. (2012). The Science of Language: Interviews with James McGilvray. Cambridge: Cambridge University Press.
Dabrowska, E. (2010). Native v expert intuitions: An empirical study of acceptability judgements. The Linguistic Review, 27, 1-23.
Hsu, H. J., & Bishop, D. V. M. (2010). Grammatical difficulties in children with specific language impairment (SLI): is learning deficient? Human Development, 53, 264-277.
Hyams, N. (1986). Language acquisition and the theory of parameters. Dordrecht: Reidel.
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1(6), 906-914. DOI: 10.1002/wcs.78
Tomasello, M. (2000). Acquiring syntax is not what you think. In D. V. M. Bishop & L. B. Leonard (Eds.), Speech and Language Impairments in Children: Causes, Characteristics, Intervention and Outcome (pp. 1-15). Hove, UK: Psychology Press.

Correction: 4/9/2012. I had originally cited the wrong reference to Dabrowska (Dabrowska, E. 1997. The LAD goes to school: a cautionary tale for nativists. Linguistics, 35, 735-766). The 1997 paper is concerned with variation in adults' ability to interpret syntactically complex sentences. The 2010 paper cited above focuses on grammaticality judgements.

A far-too-long response to (some) commentators
12th October 2012
One of the nice things about blogging is that it gives an opportunity to get feedback on one’s point of view. I’d like to thank all those who offered comments on what I’ve written here, particularly those who have suggested readings to support the arguments they make. The sheer diversity of views has been impressive, as is the generally polite and scholarly tone of the arguments. I’ve tried to look seriously at the points people have made and I’ve had a fascinating few weeks reading some of the broader literature recommended by commentators.
I quickly realised that I could easily spend several months responding to comments and reading around this area, so I have had to be selective. I’ll steer clear of commenting on Chomsky’s political arguments, which I see as quite a separate issue. Nor am I prepared to engage with those who suggest Chomsky is above criticism, either because he is so famous, or because he’s been around a long time.  Finally, I won’t say more about the views of those who have expressed agreement, or extensions of my arguments – other than to say thanks: this is a weird subject area where all too often people seem scared to speak out for fear of seeming foolish or ignorant. As Anon (4 Sept) says, it can quickly get vitriolic, which is bad for everyone.  But if we at least boldly say what we think, those with different views can either correct us, or develop better arguments. 
I’ll focus in this reply on the main issues that emerged from the discussion: how far is statistical learning compatible with a Chomskyan account, are there things that a non-Chomskyan account simply can’t deal with, and finally, are there points of agreement that could lead to more positive engagement in future between different disciplines?
How compatible is statistical learning with a Chomskyan account?
A central point made by Anon (3rd Sept/4th Sept) and Chloe Marshall (11th Sept) is that probabilistic learning is compatible with Chomsky's views.
This seems to be an absolutely crucial point. If there really is no mismatch between what Chomsky is saying and those who are advocating accounts of language acquisition in terms of statistical learning, then maybe the disagreement is just about terminology and we should try harder to integrate the different approaches. 
It’s clear we can differentiate between different levels of language processing.  For instance, here are just three examples of how statistical learning may be implicated in language learning:

  • The original work by Saffran et al (1996) focused on demonstrating that infants were sensitive to transitional probabilities in syllable strings. It was suggested that this could be a mechanism that was involved in segmenting words from speech input.
  • Redington et al (1998) proposed that information about lexical categories could be extracted from language input by considering sequential co-occurrences of words.
  • Edelman and Waterfall (2007) reviewed evidence that children attend to specific patterns of specific lexical items in their linguistic input, concluding that they first acquire the syntactic patterns of particular words and structures and later generalize information to entire word classes. They went on to describe heuristic methods for uncovering structure in input, using the example of the ADIOS (Automatic DIstillation Of Structure) algorithm. This uses distributional regularities in raw, unannotated corpus data to identify significant co-occurrences, which are used as the basis for distributional classes. Ultimately, ADIOS discovers recursive rule-like patterns that support generalization.
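The second of these ideas, that lexical categories can be recovered from sequential co-occurrences, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the four-sentence corpus is invented, the context is limited to the immediately preceding and following word, and cosine similarity stands in for whatever similarity measure and clustering a real analysis would use. It is not a reimplementation of Redington et al's method.

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences):
    """Map each word to counts of its immediate left and right neighbours."""
    vectors = {}
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(words) - 1):
            vec = vectors.setdefault(words[i], Counter())
            vec[("L", words[i - 1])] += 1  # word to the left
            vec[("R", words[i + 1])] += 1  # word to the right
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented toy corpus, for illustration only.
corpus = [
    "the cat sees the dog",
    "the dog sees the cat",
    "a cat chases a dog",
    "a dog chases the cat",
]
vecs = context_vectors(corpus)
noun_noun = cosine(vecs["cat"], vecs["dog"])
noun_verb = cosine(vecs["cat"], vecs["sees"])
print(noun_noun > noun_verb)  # True
```

Words that occur in similar neighbourhoods ("cat", "dog") end up with similar vectors, while a word from a different class ("sees") does not, without any category labels being supplied in advance.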

So what does Chomsky make of all of this? I am grateful to Chloe for pointing me to his 2005 paper “Three factors in language design”, which was particularly helpful in tracing the changes in Chomsky’s views over time.
Here’s what he says on word boundaries: 
“In Logical Structure of Linguistic Theory (LSLT; p. 165), I adopted Zellig Harris’s (1955) proposal, in a different framework, for identifying morphemes in terms of transitional probabilities, though morphemes do not have the required beads-on-a-string property. The basic problem, as noted in LSLT, is to show that such statistical methods of chunking can work with a realistic corpus. That hope turns out to be illusory, as has recently been shown by Thomas Gambell and Charles Yang (2003), who go on to point out that the methods do, however, give reasonable results if applied to material that is preanalyzed in terms of the apparently language-specific principle that each word has a single primary stress. If so, then the early steps of compiling linguistic experience might be accounted for in terms of general principles of data analysis applied to representations preanalyzed in terms of principles specific to the language faculty....”
Gambell and Yang don’t seem to have published in the peer-reviewed literature, but I was able to track down four papers by these authors (Gambell & Yang, 2003; Gambell & Yang, 2004; Gambell & Yang, 2005a; Gambell & Yang, 2005b), which all make essentially the same point. They note that a simple rule that treats a low-probability syllabic transition as a word boundary doesn’t work with a naturalistic corpus where a high proportion of words are monosyllabic. However, adding prosodic information – essentially treating each primary stress as belonging to a new word – achieves a much better level of accuracy.
The work by Gambell and Yang is exactly the kind of research I like: attempting to model a psychological process and evaluating results against empirical data. The insights gained from the modelling take us forward. The notion that prosody may provide key information in segmenting words seems entirely plausible. If generative grammarians wish to refer to such a cognitive bias as part of Universal Grammar, that’s fine with me. As noted in my original piece, I agree that there must be some constraints on learning; if UG is confined to this kind of biologically plausible bias, then I am happy with UG. My difficulties arise with more abstract and complex innate knowledge, such as are involved in parameter setting (of which, more below).
But, even at this level of word identification, there are still important differences between my position and the Chomskyan one. First of all, I’m not as ready as Chomsky to dismiss statistical learning on the basis of Gambell and Yang’s work. Their model assumed a sequence of syllables was a word unless it contained a low transitional probability. Its accuracy was so bad that I suspect it gave a lower level of success than a simpler strategy: “Assume each syllable is a word.”  But consider another potential strategy for word segmentation in English, which would be “Assume each syllable is a complete word unless there’s a very high transitional probability with the next syllable.” I’d like to see a model like that tested before assuming transitional probability is a useless cue.
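The three strategies contrasted above can be sketched in a few lines. This is only an illustration of the logic: the syllable stream, the transitional probabilities and both thresholds are invented, and the "low transitional probability" rule is implemented as a simple threshold, which may not match how Gambell and Yang defined it.

```python
def segment(syllables, tp, join_if):
    """Split a syllable stream into words.

    Adjacent syllables are joined into one word when join_if(p) is true,
    where p is the transitional probability between them.
    """
    if not syllables:
        return []
    words = [[syllables[0]]]
    for prev, cur in zip(syllables, syllables[1:]):
        p = tp.get((prev, cur), 0.0)
        if join_if(p):
            words[-1].append(cur)
        else:
            words.append([cur])
    return ["".join(w) for w in words]

# Hypothetical stream ("the big baby saw the dog") and hypothetical
# transitional probabilities; these are not corpus estimates.
syllables = ["the", "big", "ba", "by", "saw", "the", "dog"]
toy_tp = {
    ("the", "big"): 0.3, ("big", "ba"): 0.2, ("ba", "by"): 0.9,
    ("by", "saw"): 0.05, ("saw", "the"): 0.4, ("the", "dog"): 0.3,
}

# 1. A syllable sequence is a word unless it contains a low transition.
low_tp_boundary = segment(syllables, toy_tp, lambda p: p >= 0.1)
# 2. Assume each syllable is a word.
every_syllable = segment(syllables, toy_tp, lambda p: False)
# 3. Each syllable is a word unless the transition to the next is very high.
high_tp_join = segment(syllables, toy_tp, lambda p: p > 0.8)

print(low_tp_boundary)  # ['thebigbaby', 'sawthedog']
print(every_syllable)   # ['the', 'big', 'ba', 'by', 'saw', 'the', 'dog']
print(high_tp_join)     # ['the', 'big', 'baby', 'saw', 'the', 'dog']
```

With input dominated by monosyllables, the first rule runs words together, whereas the third starts from the assumption that syllables are words and uses transitional probability only to rescue the occasional multisyllabic word. Whether that holds up on a realistic corpus is exactly the empirical question.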
Second, Gambell and Yang stay within what I see as a Chomskyan style of thinking which restricts the range of information available to the language processor when solving a particular problem. This is parsimonious and makes modelling tractable, but it’s questionable just how realistic it is. It contrasts sharply with the view proposed by Seidenberg and MacDonald (1999), who argue that cues that individually may be poor at solving a categorisation problem may be much more effective when used together. For instance, the young child doesn’t just hear words such as ‘cat’, ‘dog’, ‘lion’, ‘tiger’, ‘elephant’ or ‘crocodile’: she typically hears them in a meaningful context where relevant toys or pictures are present. Of course, contextual information is not always available and not always reliable. However, it seems odd to assume that this contextual information is ignored when populating the lexicon. This is one of the core difficulties I have with Chomsky: the sense that meaning is not integrated in language learning.
Turning to lexical categories, the question is whether Chomsky would accept that these might be discovered by the child through a process of statistical learning, rather than being innate. I have understood that he’d rejected this idea, and have not found any statement by him to suggest otherwise, but others may be able to point to these. Franck Ramus (4th Sept) argues that children do represent some syntactic categories well before this is evident in their language and this is not explained by statistical relationships between words. I’m not convinced by the evidence he cites, which is based on different brain responses to grammatical and ungrammatical sentences in toddlers (Bernal et al, 2010). First, the authors state: “Infants could therefore not detect the ungrammaticality by noticing the co-occurrence of two words that normally never occur together”. But they don’t present any information on transitional probabilities in a naturalistic corpus for the word sequences used in their sentences. All that is needed for statistical learning is for the transitional probabilities to be lower in the ungrammatical than grammatical sentences: they don't have to be zero. Second, the children in this study were two years old, and would have been exposed to a great deal of language from which syntactic categories could have been abstracted by mechanisms similar to those simulated by Redington et al.
Regarding syntax, I was pleased to be introduced to the work of Jeffrey Lidz, whose clarity of expression is a joy after struggling with Chomsky. He reiterates a great deal of what I regard as the ‘standard’ Chomskyan view, including the following:
“Speaking broadly, this research generally finds that children’s representations do not differ in kind from those of adults and that in cases where children behave differently from adults, it is rarely because they have the wrong representations. Instead, differences between children and adults are often attributed to task demands (Crain & Thornton, 1998), computational limitations (Bloom, 1990; Grodzinsky & Reinhart, 1993), and the problems of pragmatic integration (Thornton & Wexler, 1999) but only rarely to representational differences between children and adults (Radford, 1995; see also Goodluck, this volume).” (Lidz, 2008)
The studies cited by Lidz as showing that children’s representations are the same as adults’ – except for performance limitations – have intrigued me for many years. As someone who has long been interested in children’s ability to understand complex sentence structures, I long ago came to realise that the last thing children usually attend to is syntax: their performance is heavily influenced by context, pragmatics, particular lexical items, and memory load. But my response to this observation is very different from that of the generative linguists. Whereas they strive to devise tasks that are free of these influences, I came to the conclusion that they play a key part in language acquisition. Again, I find myself in agreement with Seidenberg and MacDonald (1999):
“The apparent complexity of language and its uniqueness vis a vis other aspects of cognition, which are taken as major discoveries of the standard approach, may derive in part from the fact that these ‘performance’ factors are not available to enter into explanations of linguistic structure. Partitioning language into competence and performance and then treating the latter as a separate issue for psycholinguists to figure out has the effect of excluding many aspects of language structure and use from the data on which the competence theory is developed.” (p 572)
The main problem I have with Chomskyan theory, as I explained in the original blogpost, is the implausibility of parameter setting as a mechanism of child language acquisition. In The Science of Language, Chomsky (2012) is explicit about  parameter-setting as an attractive way out of the impasse created by the failure to find general UG principles that could account for all languages.  Specifically, he says:
“If you’re trying to get Universal Grammar to be articulated and restricted enough so that an evaluation will only have to look at a few examples, given data, because that’s all that’s permitted, then it’s going to be very specific to language, and there aren’t going to be general principles at work. It really wasn’t until the principles and parameters conception came along that you could really see a way in which this could be divorced. If there’s anything that’s right about that, then the format for grammar is completely divorced from acquisition; acquisition will only be a matter of parameter setting. That leaves lots of questions open about what the parameters are; but it means that whatever is left are the properties of language.”
I’m sure readers will point out if I’ve missed anything, but what I take away from this statement is an admission that UG is now seen as consisting of very general and abstract constraints on processing that are not necessarily domain-specific. The principal component of UG that interests Chomsky is 
“an operation that enables you to take mental objects [or concepts of some sort], already constructed, and make bigger mental objects out of them. That’s Merge. As soon as you have that, you have an infinite variety of hierarchically structured expressions [and thoughts] available to you.”
I have no difficulty in agreeing with the idea that recursion is a key component of language and that humans have a capacity for this kind of processing. But Chomsky makes another claim that I find much harder to swallow. He sees the separation of UG from parameter-setting as a solution to the problem of acquisition; I see it as just moving the problem elsewhere. For a start, as he himself notes, there are “a lot of questions open” about what the parameters are. Also, children don’t behave as if parameters are set one way or another: their language output is more probabilistic. I was interested to read that modifications of Chomskyan theory have been proposed to handle this:
“Developing suggestions of Thomas Roeper’s, Yang proposes that UG provides the neonate with the full array of possible languages, with all parameters valued, and that incoming experience shifts the probability distribution over languages in accord with a learning function that could be quite general. At every stage, all languages are in principle accessible, but only for a few are probabilities high enough so that they can actually be used.” (Chomsky, 2005, p. 9).
So not only can the theory be adapted to handle probabilistic data; probability now assumes a key role, as it is the factor that decides which grammar will be adopted at any given point in development. But while I am pleased to see the probabilistic nature of children’s grammatical structures acknowledged, I still have problems with this account:
First, it is left unclear why a child opts for one version of the grammar at time 1 and another at time 2, then back to the first version at time 3. If we want an account that is explanatory rather than merely descriptive, then non-deterministic behaviour needs explaining. It could reflect the behaviour of a system that is rule-governed but is affected by noise, or it could be a case of different options being selected according to other local constraints. What seems less plausible – though not impossible – is a system that flips from one state to another with a given probability. In a similar vein, if a grammar has an optional setting on a parameter, just what does that mean? Is there a random generator somewhere in the system that determines on a moment-by-moment basis what is produced, or are there local factors that constrain which version is preferred?
Second, this account ignores the fact that early usage of certain constructions is influenced by the lexical items involved (Tomasello, 2006), raising questions about just how abstract the syntax is.
Third, I see a clear distinction between saying that a child has the potential to learn any grammar, and saying that the child has available all grammars from the outset, “with all parameters valued”. I’m happy to agree with the former claim (which, indeed, has to be true, for any typically-developing child), but the latter seems to fly in the face of evidence that the infant brain is very different from the adult brain, in terms of number of neurons, proportion of grey and white matter, and connectivity.  It’s hard to imagine what the neural correlate of a “valued parameter” would be. If the “full array of languages” is already available in the neonate, then how is it that a young child can suffer damage to a large section of the left cerebral hemisphere without necessarily disturbing the ultimate level of language ability (Bishop, 1988)?
Are there things that only a Chomskyan account can explain?
Progress, of course, is most likely when people do disagree, and I suspect that some of the psychological work on language acquisition might not have happened if people hadn’t taken issue with being told that such-and-such a phenomenon proves that some aspect of language must be innate.  Let me take three such examples:
1. Optional infinitives. I remember many years ago hearing Ken Wexler say that children produce utterances such as “him go there”, and argue that this cannot have been learned from the input and so must be evidence of a grammar with an immature parameter-setting. However, as Julian Pine pointed out at the same meeting, children do hear sequences such as this in sentences such as “I saw him go there”, and furthermore children’s optional infinitive errors tend to occur most on verbs that occur relatively frequently as infinitives in compound finite constructions (Freudenthal et al., 2010).
2. Fronted interrogative verb auxiliaries. This is a classic case of an aspect of syntax that Chomsky (1971) used as evidence for Poverty of the Stimulus – i.e., the inadequacy of language input to explain language knowledge. Perfors et al (2010) take this example and demonstrate that it is possible to model acquisition without assuming innate syntactic knowledge. I’m sure many readers would take issue with certain assumptions of the modelling, but the important point here is not the detail so much as the demonstration that some assumptions about impossibility of learning are not as watertight as often assumed: a great deal depends on how you conceptualise the learning process.
3. Anaphoric ‘one’. Lidz et al (2003) argued that toddlers aged around 18 months manage to work out the antecedent of the anaphoric pronoun “one” (e.g. “Here’s a yellow bottle. Can you see another one?”), even though there was insufficient evidence in their language input to disambiguate this. The key issue is whether “another one” is taken to mean the whole noun phrase, “yellow bottle”,  or just its head, “bottle”. Lidz et al note that in the adult grammar the element “one” typically refers to the whole constituent “yellow bottle”. To study knowledge of this aspect of syntax in infants, they used preferential looking: infants were first introduced to a phrase such as “Look! A yellow bottle”. They were then presented with two objects: one described by the same adjective+noun combination (e.g. another yellow bottle), and one with the same noun and a different adjective (e.g. a blue bottle).  Crucially, Lidz et al claimed that 18-month-olds would look significantly more often to the yellow (rather than blue) bottle when asked “Do you see another one?”, i.e., treating “one” as referring to the whole noun phrase, just like adults. This was not due to any general response bias, because they showed the opposite bias (preference for the novel item) if asked a control question “What do you see now?” In addition Lidz et al analysed data from the CHILDES database and concluded that, although adults often used the phrase “another one” when talking to young children, this was seldom in contexts that disambiguated its reference.
This study stimulated a range of responses from researchers who suggested alternative explanations; I won’t go into these here, as they are clearly described by Lidz and Waxman (2004), who go carefully through each one presenting arguments against it. This is another example of the kind of work I like – it’s how science should proceed, with claim and counter-claim being tested until we arrive at a resolution. But is the answer clear?
My first reaction to the original study was simply that I’d like to see it replicated: eleven children per group is a small sample size for a preferential looking study, and does not seem a sufficiently firm foundation on which to base the strong conclusion that children know things about syntax that they could not have learned. But my second reaction is that, even if this replicates, I would not find the evidence for innate knowledge of grammar convincing. Again, things look different if you go beyond syntax. Suppose, for instance, the child interprets “another one” to mean “more”. There is reason to suspect this may occur, because in the same CHILDES corpora used by Lidz, there are examples of the child saying things like “another one book”.   
On this interpretation, the Lidz task would still pose a challenge, as the child has to decide whether to treat “another one” as referring to the specific object (“yellow bottle”), or the class of objects (“bottle”). If the former is correct, then they should prefer the yellow bottle. If the latter, then there’d be no preference. If uncertain, we’d expect a mixture of responses, somewhere between these options. So what was actually found? As noted above, for children given the control sentence “What do you see now?” there was a slight bias to pick the new item, and so the old item (yellow bottle) was looked at for only an average of 43% of the time (SD = 0.052). For children asked the key question: “Do you see another one?” the old item (yellow bottle) was looked at on average 54% of the time (SD = 0.067). The difference between the two instruction types is large in statistical terms (Cohen’s d = 1.94), but the bias away from chance is fairly modest in both cases. If I’m right and syntax is not the most crucial factor for determining responses, then we might find that the specific test items would affect performance: e.g., a complex noun phrase that describes a stable entity (e.g. a yellow bottle) might be more likely to be selected for “another one” than an object in a transient state (e.g. a happy boy). [N.B. My thanks to Jeffrey Lidz who kindly provided raw data that are the basis of the results presented above].
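For readers who want to check the arithmetic behind the effect size, here is a minimal sketch in Python. It assumes two independent groups of equal size (eleven children per group, as noted above) and uses only the rounded means and SDs quoted in the text; the function name `cohens_d` is purely illustrative. With these rounded figures the result comes out at roughly 1.8, a little below the 1.94 quoted above, which is presumably because that figure was computed from the unrounded raw data.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two independent groups, assuming equal group sizes."""
    # With equal group sizes, the pooled SD is the square root of the
    # average of the two group variances.
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Proportion of looking time to the old item (yellow bottle),
# using the rounded summary values quoted in the text.
d = cohens_d(0.54, 0.067, 0.43, 0.052)
print(round(d, 2))
```

Either way the conclusion is the same: the two instruction types differ by a large amount relative to the spread between children, even though both means sit close to 50%.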
Points of agreement – and disagreement – between generative linguists and others
The comments I have received give me hope that there may be more convergence of views between Chomskyans and those modelling language acquisition than I had originally thought. The debate between  connectionist ‘bottom up’ and Bayesian ‘top down’ approaches to modelling language acquisition highlighted by Jeff Bowers (4th Sept) and described by Perfors et al (2011) gets back to basic issues about how far we need a priori abstract symbolic structures, and how far these can be constructed from patterned input. I emphasise again that I would not advocate treating the child as a blank slate. Of course, there need to be constraints affecting what is attended to and what computations are conducted on input. I don’t see it as an either (bottom up)/or (top down) problem.  The key questions have to do with what top-down constraints are and how domain-specific they need to be, and just how far one can go with quite minimal prior specification of structure.
I see these as empirical questions whose answers need to take into account (a) experimental studies of child language acquisition and (b) formal modelling of language acquisition using naturalistic corpora as well as (c) the phenomena described by generative linguists, including intuitive judgements about grammaticality etc. 
I appreciate the patience of David Adger (Sept 11th) in trying to argue for more of a dialogue between generative linguists and those adopting non-Chomskyan approaches to modelling child language. Anon (Sept 4th) has also shown a willingness to engage that gives me hope that links may be forged between those working in the classic generative tradition and others who attempt to model language development. I was pleased to be nudged by Anon (4th Sept) into reading Becker et al (2011), and agree it is an example of the kind of work that is needed: looking systematically at known factors that might account for observed biases, and pushing to see just how much these could explain. It illustrates clearly that there are generative linguists whose work is relevant for statistical learning. I still think, though, that we need to be cautious in concluding there are innate biases, especially when the data come from adults, whose biases could be learned. There are always possible factors that weren’t controlled – e.g. in this case I wondered about age of acquisition effects (cf. data from a very different kind of task by Garlock et al, 2001). But overall, work like this offers reassurance that not all generative linguists live in a Chomskyan silo - and if I implied that they did, I apologise.
When Chomsky first wrote on this topic, we did not have either the corpora or the computer technology to simulate naturalistic language learning. It still remains a daunting task, but I am impressed at what has been achieved so far. I remain of the view that the task of understanding language acquisition has been made unduly difficult by adopting a conceptualisation of what is learned that focuses on syntax as a formal system that is learned in isolation from context and meaning. Like Edelman and Waterfall (2007) I also suspect that obstacles have been created by the need to develop a ‘beautiful’ theory, i.e. one that is simple and elegant in accounting for linguistic phenomena. My own prediction is that any explanatorily adequate account of language acquisition will be an ugly construction, cobbled together from bits and pieces of cognition, and combining information from many different levels of processing. The test will ultimately be whether we can devise a model that can predict empirical data from child language acquisition. I probably won’t live long enough, though, to see it solved.
Becker, M., Ketrez, N., & Nevins, A. (2011). The surfeit of the stimulus: Analytic biases filter lexical statistics in Turkish laryngeal alternations. Language, 87(1), 84-125.
Bernal, S., Dehaene-Lambertz, G., Millotte, S., & Christophe, A. (2010). Two-year-olds compute syntactic structure on-line. Developmental Science, 13(1), 69-76. doi: 10.1111/j.1467-7687.2009.00865.x
Bishop, D. V. M. (1988). Language development after focal brain damage. In D. V. M. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 203-219). Edinburgh: Churchill Livingstone.
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry, 36(1), 1-22.
Edelman, S., & Waterfall, H. (2007). Behavioral and computational aspects of language and its acquisition. Physics of Life Reviews, 4, 253-277.
Freudenthal, D., Pine, J., & Gobet, F. (2010). Explaining quantitative variation in the rate of Optional Infinitive errors across languages: A comparison of MOSAIC and the Variational Learning Model. Journal of Child Language, 37(3), 643-669. doi: 10.1017/s0305000909990523
Garlock, V. M., Walley, A. C., & Metsala, J. L. (2001). Age-of-acquisition, word frequency and neighborhood density effects on spoken word recognition: Implications for the development of phoneme awareness and early reading ability. Journal of Memory and Language, 45, 468-492.
Lidz, J., Waxman, S., & Freedman, J. (2003). What infants know about syntax but couldn't have learned: experimental evidence for syntactic structure at 18 months. Cognition, 89(3), 295-303.
Lidz, J., & Waxman, S. (2004). Reaffirming the poverty of the stimulus argument: a reply to the replies. Cognition, 93, 157-165. 
Perfors, A., Tenenbaum, J. B., & Regier, T. (2011). The learnability of abstract syntactic principles. Cognition, 118(3), 306-338. doi: 10.1016/j.cognition.2010.11.001
Perfors, A., Tenenbaum, J. B., & Wonnacott, E. (2010). Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language, 37(3), 607-642. doi:
Redington, M., Chater, N., & Finch, S. (1998). Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science, 22(4), 425-469.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926-1928.
Seidenberg, M. S., & MacDonald, M. C. (1999). A probabilistic constraints approach to language acquisition and processing. Cognitive Science, 23(4), 569-588.

Tomasello, M. (2006). Acquiring linguistic constructions. In R. Siegler & D. Kuhn (Eds.), Handbook of child psychology (pp. 1-48): Oxford University Press.
P.S. 15th October 2012
I have added some links to the response of 12th October. In addition, I have discovered this book, which gives an excellent account of generative vs. constructivist approaches to language acquisition:
Ambridge, B., & Lieven, E. V. M. (2011). Child Language Acquisition - Contrasting Theoretical Approaches: Cambridge University Press.



  1. I am afraid I have seen a similar narrative about Chomsky in many places, especially about his unsophisticated (read, non-probabilistic) approach to language acquisition, yet I am less than convinced by it.

    Here are my reasons for it:
    1) Just a reading of his 1955 dissertation will allow the reader to infer that he in fact suggested statistical learning in language acquisition in it (especially of word-learning), although he was somewhat circumspect about its possible success. (Section 1 of this paper has a very nice, but brief, summary of this -

    2) I have in other places, seen him give positive reviews of probabilistic work, especially of Charles Yang's work if I remember correctly.

    3) This paper, while slightly old from current perspectives, is still very sophisticated. (Chomsky, N. Three models for the description of language. IRE Transactions on Information Theory 2 (1956), 113--124.)

    4) He himself has done some information theoretic stuff, and if I remember correctly, was rather disillusioned by the results.

    I can see someone disagree with his viewing of the problem and his suggested solutions. Especially his separation of competence/performance is something many non-linguists are uncomfortable with. However, to call his views dated would be to not do justice to the facts.

    I agree with you that his comments on Bayesian approaches at the UCL lecture were somewhat surprising, but given his track record on the issues, I am far more wont to take a charitable interpretation of his words - something to the effect of "statistical/probabilistic methods when uninformed by theoretical linguistic constructs have been a failure".

    I agree mine is as much a narrative as yours, but I am afraid I can't see how one could maintain what you said given what he has said in print.

    As far as him being stuck on Skinnerian learning and of how things had changed, I have to say I think the world is stuck on his critique of Skinner, not him. He rarely if ever refers to it in his work in print. Most of his discussions of the issue seem to be in response to a question in an interview or talk. In fact, I am not sure he has discussed Skinner at length in print past the late 1950's, perhaps early 1960's.

    1. Anonymous (You really don’t need to be):
      I’ve come across similar ripostes to critiques of Chomsky before: i.e. your arguments don’t stand because you haven’t read everything he has ever written.
      This just isn’t good enough. For a start, here’s a direct quote from Science of Language – specifically on connectionism, which is one approach to probabilistic learning:
      “…connectionism seems to me about the level of corpuscularianism in physics. ….They’re abstracting radically from the physical reality, and who knows if the abstractions are going in the right direction? But, like any other proposal, you evaluate it in terms of its theoretical achievements and empirical consequences. It happens to be quite easy in this case, because they’re almost non-existent.” And later: “The learning thesis is a variation on behaviourism and old-style associationism”
      (Sorry, can’t give page refs, as this is on Kindle)
      There’s quite a lot more in this vein.
      I’m not surprised that you can provide cases where he has expressed different views. Much of what he says is opaque and allusive and not always internally consistent. I’m really not interested in the odd sentence here or there. I’m interested in whether or not he accepts that his view of what children learn has been seriously challenged, and I’d like to know what defence he offers to the challenge. I see no evidence that he has engaged with the literature on statistical learning at any kind of serious level. This despite the fact that this is a burgeoning area of research.
      But there’s a much more serious point than what Chomsky has or has not said in print. My point is that if he were to accept that language learning starts by abstracting phonological regularities from speech input (aided by contextual cues), identifying probabilistic patterns that correspond to morphemes, and only later detecting regularities that correspond to conventional word classes, then the whole rationale for his approach would be undermined. The arguments that language is not learnable evaporate. With them goes the need to postulate Universal Grammar. The whole problem of language acquisition that was tortuously addressed by inventing parameter setting is no longer a problem. There is therefore a very good reason why Chomsky does not engage with this literature. If you take it seriously, decades of work in developmental linguistics in the Chomskyan tradition become an irrelevance.

    2. I'm afraid I have to disagree with this claim.

      "My point is that if he were to accept that language learning starts by abstracting phonological regularities from speech input (aided by contextual cues), identifying probabilistic patterns that correspond to morphemes, and only later detecting regularities that correspond to conventional word classes, then the whole rationale for his approach would be undermined. The arguments that language is not learnable evaporate. With them goes the need to postulate Universal Grammar."

      I agree with almost everything in the first sentence, except the inference. Nowhere in the input is there anything that suggests what to keep track of, what abstractions/generalisations one makes. To be more specific, there are a lot of statistical facts/correlations in the linguistic input, yet people aren't generalising from all of them. The set of generalisations made is a subset of those that have statistical support. Note, this is very similar to Chomsky's poverty of stimulus argument. (refs:

      On his opacity and general incoherence, this is clearly a subjective opinion. There are surely many amongst the probabilistically sophisticated, who might disagree with some of his views surely, but would beg to differ from your general assessment.

  2. Charlie Wilson @crewilson 3 September 2012 15:34

    I have to say, these criticisms of Chomsky, in less well put form, have regularly occurred to me, in the course of several goes at trying to understand his work on language. I gave up trying, thinking that I must be missing something or too dumb to understand it all.

    My feeling is that the contrast between the two comments above nicely captures the problem with him and his body of work, written and elsewhere - he seems to regularly contradict himself, or at best remain frustratingly allusive (and elusive) when it comes to the major controversies. I can certainly say that the same pattern can be seen in his writing on political issues as well, and interestingly a brief trip to google tells me that plenty of people have even written about the presence of an inherent contradiction between his views in his two main areas of politics and linguistics.

    I wonder what other areas (perhaps of psychology / neuroscience) are dominated by views of major figures who don't take into account the way in which something is thought to have developed?

  3. For those interested in pursuing the debate further, I have just been pointed to this site which has many relevant references to critics of Chomsky

  4. The brilliant child psychiatrist, Stanley Greenspan, MD, ended the debate with publication of The First Idea, How symbols, language, and intelligence evolved from our primate ancestors to modern humans. Greenspan writes in his classic book that the relationship between infant and adult is much more complex than previously thought and the development process is the key to language and intelligence. Greenspan also ends the Cartesian concept of reason, elucidating that thought and intelligence are emotion oriented. You must have emotion to have intelligence.

    I believe Dr. Greenspan puts the argument to rest and shows that Chomsky is out of date.

  5. People might find Christina Behme's (extremely negative) review of this interesting:

    Where I think Chomsky went wrong was not in the idea that children learn some (semi-)autonomous syntax, which they do after all seem to wind up in possession of eventually, but rather in having the wrong notion of 'fit', as explored in the apparently exploding field of Bayesian syntax learning (Mike Dowman, Anne Hsu, Nick Chater, Paul Vitanyi, Lisa Pearl and Amy Perfors being some of the more productive recent contributors).

  6. Thanks to all for comments.
    Anonymous: Have you read any of the papers in the Special Issue of Journal of Child Language that I referenced in my post? Alternatively, this paper tackles many of the issues about how far grammar can be learned from input: Edelman, S., & Waterfall, H. (2007). Behavioral and computational aspects of language and its acquisition. Physics of Life Reviews, 4, 253-277.
    Avery: Thanks for the reference to Christina Behme. There's a more direct link to her paper here:
    I would not like my exasperation with Chomsky’s rhetorical style to sidetrack us from a consideration of his arguments. The main point I want to stress is not just that Science of Language is unscholarly in its style (as argued also by Behme and Pullum) but that the fundamental premise of Chomsky’s theorising is wrong. In some ways I am more extreme than other critics: I am not just arguing for an alternative approach to child language acquisition, I’m also saying that we might have made much more progress 40 years ago if Chomsky had not blighted the field with his theorising. Children clearly do learn grammars which can be expressed in terms of sequences of abstract lexical categories: but (a) this knowledge is probabilistic and not deterministic and (b) knowledge of lexical categories is not something that they start out with.

    1. I repeat that I agree with you where you are factual. On your two points, I have shown you and can show many other instances that (a) is perfectly compatible with both Chomsky's views and modern generative theories. (b) is perhaps more debated in generative theorising, but if you are talking only about Chomsky, he himself has questioned the need for lexical categories in the traditional sense (even at the adult grammar level) in a whole chapter level discussion (Ch. 4) in The Minimalist Program, written in 1995. It is not a big leap to assume that he would perhaps say the same about children.

      Again, very few of the claims about UG you made earlier follow from the data you show. It is like arguing that people need to learn morphemes, so everything is learnt. Just because some things are learnt doesn't mean everything is. [Note: I am using "learn(ing)" with the meaning that you imply - knowledge of something that was not there before.]

      All I can see is that you are making as big a leap of logic as the one you claim Chomsky (and perhaps other Chomskyans) makes.

      I think there is a need to honestly acknowledge that both sides of the debate are bringing their set of biases to the table. And unfortunately, neither side is honest about it, and is instead wrangled in rhetorical posing, and even misdirection. Neither side attempts a full-faith attempt to understand what the other side is saying. In most places, there is more consensus than the rhetoric will let you believe. All because people highlight their preferred assumptions, and push others under the carpet.

      The debate itself has been good in my opinion in forcing people to do innovative work. However, the vitriolic rhetoric has been horrible.

      Full disclosure: I haven't read all the papers of the issue you refer to. But, I am guessing you hadn't read the Becker et al paper I mentioned too. So, I am not sure where that question was going. We are coming to the discussion with different backgrounds, clearly.

    2. Apologies for indulging to some extent in the same rhetoric that I derided.

      Thanks for the interesting discussion and the references!

      I guess I will stop before it becomes another of those unhealthy discussions.

  7. Two comments, one concerning the link between Chomsky’s linguistics and politics, and one about his linguistics.

    In one of the replies above, Charlie Wilson writes that there are similarities between Chomsky’s politics and his linguistics, both suffering from internal inconsistencies, and both hard to follow. But what is so striking about Chomsky’s politics is that they are almost completely untheoretical, and very easy to understand. He points out self-evident truisms (e.g., we should apply the same standards to ourselves as we do to others) and cites lots and lots of data (much of which you have never heard of before) to draw very straightforward conclusions. One of the common claims is that he is inconsistent in that he criticizes the US more than other states. Here is his very straightforward reply.

    "My own concern is primarily the terror and violence carried out by my own state, for two reasons. For one thing, because it happens to be the larger component of international violence. But also for a much more important reason than that: namely, I can do something about it. So even if the US was responsible for 2% of the violence in the world instead of the majority of it, it would be that 2% I would be primarily responsible for. And that is a simple ethical judgment. That is, the ethical value of one's actions depends on their anticipated and predictable consequences. It is very easy to denounce the atrocities of someone else. That has about as much ethical value as denouncing atrocities that took place in the 18th century."

    So, however bad this book is, and however bad his linguistics, it provides no basis for questioning his politics.

    What about his linguistics? Given Dorothy’s comments, I’ll not be buying the book. But I do think Chomsky may be right about one important point that was not emphasized enough. Chomsky’s main point against Skinner was that associationism is not enough to explain language, and to the extent that statistical learning is implemented by connectionist networks associated with the PDP camp, I think his critique still applies. This is also the view of people like Pinker, who rejects “eliminative connectionism” (networks that eliminate symbols) in favour of neural networks that include symbols. And it is why Hummel develops “symbolic networks”, as opposed to PDP networks. These authors are far from Chomskyans (Pinker seems to dislike a lot about Chomsky, including his politics), but have adopted one of his central tenets that you need symbols and rules, not only

    It is also worth noting that Bayesian theorists are often very much in sympathy with the importance of abstract symbols, which puts them at odds with the PDP camp (see recent debate in TICS). Here is a quote from a paper I just downloaded from Tenenbaum’s website that highlights how some key Bayesian theorists are not just adopting another form of statistical learning – they are adopting a key part of the non-associationist position:

    The learnability of abstract syntactic principles. Perfors, A., Tenenbaum, J. B. and Regier, T. (in press). Cognition.

    “Second, our approach offers a way to tease apart two fundamental dimensions of linguistic knowledge that are often conflated in the language acquisition literature. The question of whether human learners have (innate) language-specific knowledge is logically separable from the question of whether and to what extent human linguistic knowledge is based on structured symbolic representations like generative phrase-structure grammars. Few…cognitive scientists have explored the possibility that explicitly structured mental representations might be constructed or learned via domain-general learning mechanisms.”

    That said, I would point out that I’m not Bayesian!

  8. hi Dorothy,
    For once I will disagree to a large extent (although not entirely). This would deserve a whole paper, but here are a few succinct points:
    - the idea that children's language (and even worse, their linguistic awareness skills) can provide any meaningful measure of their linguistic knowledge is ridiculous, and as far as I know has been rejected long ago. See also my more recent discussion of child phonology:
    - there is in fact evidence that young children represent some syntactic categories well before their language can show any sign of that (let alone awareness), and without this being attributable to mere statistical relationships between words. See for instance:
    - more generally, while statistical learning is certainly much more important than Chomsky will admit (including for learning and use of an abstract grammar), I haven't seen any meaningful theory of language acquisition solely based on statistical learning. Tomasello's has very low explanatory power and largely relies on unproven magic.
    - what sense does it make to judge Chomsky's ideas on the basis of transcriptions of interviews?
    - at the same time, I agree that he has become totally unreadable for anybody working outside the minimalist program, and I dislike his dismissive attitude to basically everything else. But this is quite independent from the broader framework that he has set, which may remain very fruitful, and despite falling short of many of its promises, does not currently have any credible competitor in my opinion.
    - an example of work that I find more credible is that of Paul Smolensky, who does realistic statistics-based connectionist modelling, while keeping with the standard notion of an abstract grammar being acquired by the child, and trying to explain how this can happen for real. This is the sort of direction where I would place my bets (but not a cent on Tomasello!).

  9. Thanks for writing this post. But you don't go far enough. Chomsky is not only wrong about language learnability. He is profoundly wrong about language. I wrote a post a few years back that I don't consider him to be a very good linguist:

    But I don't think I went far enough - particularly in light of this recent hagiography. The bit about discovering more about language in the last 20 years than in all of previous history appeared somewhere else earlier and it just made me laugh. All it means is they were able to refine GB with minimalism to work out the kinks in the theory. No new insights on language were generated during that process.

    Normally, I would just ignore Chomsky (as most sensible linguists seem to do these days) but his effect has been particularly pernicious in the applied sciences. I constantly have to correct language teachers that nothing they ever come in contact with is in any way impacted by what Chomsky and his acolytes have to say about learnability. Not vocabulary, not pronunciation, not spelling! The problem is that the Chomskyan argument for Universal Grammar is so technical and hard to understand for anyone who is not actually immersed in it, that a cursory look will just result in a complete distortion. My advice to language practitioners is simply to ignore Chomsky in their work.

    I have no doubt that he is wrong about pretty much everything, and I am convinced that in 50 years once his political star has faded, he will be seen as a blip on the linguistic scene and Universal Grammar will have the credibility phrenology has today. But even if I and all his other critics are wrong and Chomsky's been right all along, it won't change the fact, that what he has to say is completely irrelevant to what language teachers do, speech therapists do or even language acquisition researchers do. It has nothing to say about acquiring knowledge of the world so essential to using language, register, bilingualism, diglossia, language change, language death, language politics, literature, poetry, figurative language, sociolinguistic variation, and so on.

    We can have all the arguments about research evidence and combinatorics. I just normally ask the Universal Grammar advocates to name off the top of their head 20 principles and explain the process through which the right parameters are set for their instantiation in my knowledge of language required to produce this sentence. I have yet to receive an answer. They have 50 years to prove me wrong.

  10. Sorry, I posted my screed before Franck Ramus commented, otherwise I would have incorporated a response into the first comment.

    I think the syntactic categories problem assumes a stable set of categories that simply cannot be substantiated by analysis of actual language in actual use. Which is also why the statistical learning alternative is a dead end. It provides a useful model for language but ignores too many inputs along the way to actually be how language works.

    Franck comments that the approach continues to be fruitful but I would ask how? To say it has underdelivered is a massive understatement. It is not even used much anymore in things like machine translation, which is where it once showed great promise. I cannot think of a single thing that minimalism has achieved that is relevant outside the generative paradigm. But maybe I'm missing something big.

    I always found Tomasello's and Dabrowska's work very compelling and frankly I don't see the magic required to make it work. Most importantly, it seems consistent with how we learn most of what we know about the world. However, the magic of the Language Acquisition Device has always boggled my mind. It requires a level of modularity of mind that can surely be hard to sustain in the face of so much messiness coming from all the data points.

  11. I feel out of my depth commenting on this article. Is there a guide to the statistical approach for a lay-person? Particularly on this idea of there being no (solid) syntactic categories, and the (not so) fully formed grammar arising from statistical learning.

    I can believe UG is unhelpful for figuring out how a child goes about learning a language, and that statistical learning may be used for learning words and early phrases. I thought though (and this is where understanding arguments about syntactic categories would help) that the fully formed grammar is too complex. How does this approach explain the final state attained? Is there still a generative procedure?

  12. I agree with your general tenor re. the thrall that Chomsky seems to hold over at least some parts of linguistics (although it’s probably not his fault that people take what he says quite so seriously). However, I wondered whether your comments re. the learnability issue were quite right – formal learnability really isn’t my area, so no doubt better-informed people can correct me.

    My concern stems from when you say “Because he assumed that children were learning abstract syntactic rules from the outset, Chomsky encountered a serious problem. Language, defined this way, was not learnable by any usual learning system: this could be shown by formal proof from mathematical learning theory.” I presume here you’re referring to Gold’s Theorem (Gold, 1967). My understanding is that Gold’s result applies to any set of languages where you have an infinitely nested set of languages stacked one inside the other, like Russian dolls – that might apply to languages which can be described by abstract syntactic rules, but abstract syntactic rules aren’t required for Gold’s argument to work. In other words, the learnability problem doesn’t follow solely from the idea that children learn abstract syntactic rules.

    You then go on to talk about how a reconceptualisation of what is learned solves the learnability problem (again, presumably referring back to Gold), in terms of whether rules develop from specific to abstract, and whether grammatical representations are probabilistic or not. Again, I agree with your general point that Chomsky doesn’t really seem to be particularly interested in the relevant literature here, and is quite dismissive of some of it. But I don’t think either of these points addresses the specific learnability problem identified by Gold, again because his theorem doesn’t make any assumptions about the representations underpinning languages – it just requires the Russian doll configuration I mentioned above.

    That’s not to say that Gold’s Theorem is particularly relevant. As explained in the excellent Johnson (2004), in constructing his proof, Gold defined some very strict conditions on learnability which don’t actually seem that relevant for real language learning. Gold assumes that learners may be required to learn any language from a class of languages (when in fact they may be unlikely to encounter some languages, e.g. as a result of cultural evolutionary processes disfavouring less learnable languages, a point which I think is made in Zuidema, 2003), and Gold assumes that learning has to be possible in the face of any possible data set generated from the target language (including pathologically unhelpful data sets: his proof says nothing about learnability from data that is more representative, or even data that is designed to be helpful, as might be the case for child-directed speech). Basically I think your point still stands – the learnability claims are probably over-emphasised in some sections of the literature. But I also think it’s important to point out that, at least as I understand it, (1) there is no mathematical result proving that language is unlearnable (in a sense that’s meaningful to developmentalists, rather than the sense Gold uses in his proof), and (2) recent advances e.g. in statistical learning research don’t in any way ‘disprove’ Gold’s original result.

    Gold, E. M. (1967). Language Identification in the Limit. Information and Control, 10, 447–474.
    Johnson, K. (2004). Gold’s Theorem and Cognitive Science. Philosophy of Science, 71, 571–592.
    Zuidema, W. (2003). How the poverty of the stimulus solves the poverty of the stimulus. In Becker, S., Thrun, S., and Obermayer, K., editors, Advances in Neural Information Processing Systems 15 (Proceedings of NIPS’02), Cambridge, MA. MIT Press.

  13. Now, can we finally get back to Skinner's Verbal Behavior? Thank you.

  14. I've thought about it and decided that children learn cliches, and then how to disassemble and reassemble them. The latter is syntax, and the former is responsible for bad writing.

  15. This is rather too easy a target, no? You comment on someone's work from the perspective of someone who has seen 40 years of research stimulated by Chomsky's ideas. It's a little unfortunate to dismiss many of his theories as not borne out by subsequent experiment etc. You'll be happy to see your own work evaluated this way in 2040 - if it has generated the extraordinary variety of work, research, interest, development as Chomsky's did?

  16. Hi Dorothy,
    Like you I find Chomsky’s work very difficult to read, and unlike you I’ve not attempted to work my way through “The Science of Language”. But in his paper “Three factors in language design” (2005, p. 6, available here), Chomsky writes:
    “Assuming that the faculty of language has the general properties of other biological systems,
    we should, therefore, be seeking three factors that enter into the growth of language in the individual:
    1. Genetic endowment, apparently nearly uniform for the species, which interprets part of
    the environment as linguistic experience, a nontrivial task that the infant carries out
    reflexively, and which determines the general course of the development of the language
    2. Experience, which leads to variation, within a fairly narrow range, as in the case of other
    subsystems of the human capacity and the organism generally.
    3. Principles not specific to the faculty of language.”
    I am far from being well versed in Chomsky’s work, but I interpret this paragraph as not being incompatible with there being a role for statistical learning. And as I understand it, he did consider the role of statistical learning in one of his earliest works, Logical Structure of Linguistic Theory (1955), but he considered that it did not have full explanatory power. As Franck Ramus commented a few days ago on this blog, the explanatory power of statistical learning is still up for debate.
    Jeff Lidz from the University of Maryland, who works within a Chomskyan framework but who writes much more lucidly and actually carries out experimental work with young children, argues that both domain-specific and domain-general mechanisms are needed to explain language acquisition. Lidz gave a very clear set of lectures at UCL in June this year, and the theme from his lectures was that statistical learning is licensed by Universal Grammar. I understood him to be making the point that Universal Grammar and statistics are not in opposition. Rather, they work together: Learners are able to use statistics because they already know what statistics they need to compute. In other words, Universal Grammar limits the hypothesis space. Lidz’s webpage has several downloadable papers presenting this argument.

  17. Just a quick message to all who have commented to say a big thank you. I am intrigued and fascinated by the diversity of views that have been expressed. I do plan to respond when I have had a chance to read and think a bit more, but this may take a week or two!

  18. I'd just like to second Chloe's comment about Lidz's work on child language acquisition, which provides a range of empirical studies arguing for an approach to language learning that incorporates both statistical biases and representational (UG) constraints. The work on how child speakers of Kannada interpret ditransitives is especially interesting. I must admit I think that many theoretical syntacticians (and I am a theoretical syntactician so mea culpa) have been very bad at making the results of our work accessible to people in the psychology of language learning, but I do think that many of those results are potentially really interesting to the broader psychological community: structural conditions on meaning construction (e.g. cases where you might expect to get a particular meaning but it just isn't there); structural conditions on resolving meaning dependencies (between referential antecedents like names and pronouns, between quantifiers like ‘every boy’ and pronouns, between question phrases like ‘which boy’ and the verb which assigns them their semantic role (as, say, agent or patient), ...); the relations between linear orders of types of phrases and their meanings across different languages; I could go on interminably. Almost all of this is fairly well agreed upon at least at the phenomenological level (if not in which theory of syntax can best account for them) and they are real properties of people (or people's behaviour in experimental settings anyway - google Jon Sprouse's recent work on how reliable grammaticality judgments are). These descriptive regularities across linguistic behaviour are what make language so interesting and challenging to build theories of, and they are the things that 95% of generative syntacticians spend their time on.
There are real live and interesting debates in the linguistic literature about whether what are called ‘island effects’ (where certain syntactic structures disrupt the relation between, say, a question phrase and the verb which assigns it its meaning role) are to be explained by how the syntactic processing mechanism deals with certain grammatical structures, or whether these effects, which are very robust, are a result of the way that the grammar is set up. Again, I could go on with many more cases. The basic point is that we theoretical syntacticians shouldn't dismiss the possibility that there are important effects of frequency in learning languages and hence in the final grammars, or that there are domain general biases (in fact, I think Chomsky himself has said that both are relevant) but equally, I think that one can't dismiss the challenges for psychological explanation raised by the enormous amount of solid theoretical and experimental results that have emerged from generative syntax over the years, and which are still appearing. I do acknowledge, however, that we theoretical syntacticians really need to do a better job of making our work more accessible.

  19. Hi Dorothy,
    Following on from David’s post and the sorts of grammatical constructions that generativists spend their time on, another researcher whose work is worth reading is Stephen Crain. In a paper with Takuya Goro and Rosalind Thornton, available here, he presents examples of children’s syntactic errors that really are troubling for statistical learning accounts. For example, English children sometimes insert extra wh-words in long-distance questions, saying ungrammatical things such as (1) *“What do you think what pigs eat?” and (2) *“Who did he say who is in the box?”, even though they do not get exposed to medial-wh constructions in the input. Furthermore, Crain et al. argue that these questions could not be produced by merging the template for two simple questions: such a strategy could work for (2), but not for (1), as *“what pigs eat” is not a well-formed question. In a neat analysis, they show that such constructions are grammatical in some dialects of German (where, for example, the equivalent of “Who do you think who goes home?” is perfectly OK). Even neater, they argue that there are no examples of English children using medial wh-words in, for example, sentences such as *“Who do you want who to win?”, where this would require extraction from an infinitival clause – and those same dialects of German don’t allow their speakers to do that either. Importantly, children do hear fragments like “who + to + verb” in, for example, embedded questions, e.g. “I know who to follow”. So they could potentially form those sorts of questions using the ‘cut-and-paste’ operations that Tomasello proposes. But they don’t. In other words, children sometimes deviate in certain ways from the grammar of the particular language that they’re exposed to – but they don’t deviate from what’s allowed in other natural human languages. How are their errors constrained? Crain et al. argue for a Universal Grammar and temporarily mis-set parameters.
I haven’t seen any other convincing explanation of their data.

    Crain, S., Goro, T. & Thornton, R. (2006). Language acquisition is language change. Journal of Psycholinguistic Research, 35, 31-49.

  20. As much as I enjoyed the article, one particular line really jumped out for me:

    "And there are still big questions, identified by Chomsky and others – Why don’t other species have syntax?"

    Bird song has syntactical order that is important to its perception by conspecifics. I wanted to know whether that was either purposefully not counted or just ignored?

  21. I enjoyed reading your review. You may be interested in a review of the same work I recently posted at It supports some of the points you make and gives some additional perspective about the sad state of affairs Chomsky's linguistics have become. In case you have questions/comments my contact info can be found at that link...

  22. Stephen Crain and Max Coltheart, 2 October 2012 02:46

    In the Arts and Humanities Citation Index (1980-1992), Noam Chomsky was the most cited living person, and the eighth most cited source overall. The other nine were: Marx, Lenin, Shakespeare, Aristotle, the Bible, Plato, Freud, Hegel and Cicero. Presumably, everyone agrees that these other nine have made significant contributions to the history of ideas. But Dorothy Bishop questions whether Noam Chomsky deserves the place he has been accorded.
    Chomsky’s failing, according to Bishop, is to theorize about adult languages without regard to the process of language acquisition. Bishop’s view – for which she offers no supporting evidence or argument, only an assertion – is that one cannot understand adult language without understanding the process of language acquisition. Chomsky denies this. He observes that children raised in extremely different linguistic environments are nevertheless able to communicate effortlessly; children born in New Zealand, in the UK, in the Australian bush, or on the streets of New York, all acquire equivalent grammars (aside from differences in vocabulary and pronunciation). Chomsky’s conclusion is that the process of language acquisition leaves no stamp on adult language. This contradicts Bishop’s assertion that investigations into the process of language acquisition are critical to the development of theories of adult languages.
    To support her claim that adult language is influenced by the process of language acquisition, Bishop cites a study by Dabrowska (2010) which found that linguists and nonlinguists differed in their judgments about the acceptability of complex sentences; linguists tended to be less influenced by the lexical content of the test sentences than nonlinguists were. Although the two groups were given different instructions, Dabrowska reports that “linguists and nonlinguists alike gave higher ratings to sentences that linguists would describe as ‘grammatical’ ”. On the basis of this study, Bishop reaches the opposite conclusion, namely that “agreement on well-formedness of complex sentences [is] far from universal in adults.”

    The finding that linguists and nonlinguists invoke different criteria in judging whether or not sentences are acceptable has in any case little or no bearing on the relationship between language acquisition and adult languages, since acceptability judgements do not directly tap an adult’s intrinsic grammatical knowledge. But let us suppose, for the sake of argument, that Chomsky is wrong to assert that the process of language acquisition fails to leave a stamp on adult languages. Does this suffice to diminish Chomsky’s contributions to our intellectual history? Only if we are influenced by people like Bishop who, by their own admission, fail to understand the contributions Chomsky has made, but which so many others have recognized.

  23. Worth noting that citations are time-limited and that frequency of citations doesn't tell us anything about the validity or reliability of the ideas involved. All ten of the sources listed have been widely critiqued - in many cases with some justification.

  24. My curiosity having been piqued by the opening paragraph of Stephen Crain & Max Coltheart's comment I was prompted to read the paper by Crain et al cited by Chloe Marshall. I wasn't convinced that the evidence presented could be used, as the authors suggest,

    'to adjudicate between a UG-based approach to child language and an experience-based approach.'

    Three reasons:

    1. The authors use 'linguistic error' in a very narrow sense - along the lines of a consistent grammatical error within a single sentence, rather than, say, the single clauses, or sentences that are unfinished, grammatically incorrect or start off with one structure and switch to another - familiar to anyone who has tried to do a verbatim transcript of unscripted speech.

    2. The authors appear to assume that an experience-based model of language acquisition limits children's exploration of the 'space of human languages' to speech patterns they hear around them (e.g. English doesn't have a medial 'wh-' so where does it come from in children's speech?). Of course children do mimic speech patterns but because of the way human memory works, they don't do it entirely accurately. They approximate, fill in, omit, innovate - in exactly the same way as adults handle incomplete information.

    3. The authors appear to assume that linguistic structures are clear-cut. It's true that the prototypical features of languages are usually amenable to classification, but the way people use a language isn't prototypical, it's an approximation to prototypicality.

    Language is primarily a vehicle for communicating information. As long as an utterance carries sufficient information to accurately convey its intended meaning, there is no compelling reason for the speaker to make it conform closely to the prototypical features of the language being spoken. If however the meaning isn't clear to listeners, speakers generally get frustrated and if possible try something else.

    Toddlers constructing a phrase in a way that a parent doesn't understand tend to get very cross at the uncomprehending response, but if the penny drops and the parent corrects the error, the child will often pay close attention to the correction and abandon the incorrect form. This means that if incorrect grammatical forms do the job, they are likely to persist for a while in children's speech, but forms that are ambiguous or don't convey sufficient information will be abandoned. One would expect to see the same phenomenon in languages. It's quite likely that persistent grammatical errors in one language will map on to accepted forms in another, because both are emergent properties of similar constraints and affordances.

    The Universal Grammar hypothesis appears to be based on two implicit assumptions. Firstly that language is a special cognitive domain not subject to the errors and biases that occur in other cognitive processes; and secondly that recurring patterns must be the result of some underlying design (in this case biological) and that they couldn't have arisen as emergent properties of interacting systems.

  25. Chomsky's late wife Carol wrote a book published in 1969, entitled "The Acquisition of Syntax in Children From 5 to 10". Might be worth a look if you're interested in his views on acquisition, as I imagine he had an influence on it.

  26. I realise those of you interested in this topic must have given up hope that I will reply to comments as promised. Just to say I will! It is just taking longer than anticipated to find time to read all I want to read. Definitely something by end of October and I hope sooner.
    Alex: I know about that book you mentioned - read it as a grad student! My recollection is that it focused primarily on a few constructions such as the distinction between "John is easy to please" vs "John is eager to please" but it's worth taking another look to refresh my memory.

  27. Geoffrey J. Huck, 26 November 2012 15:56

    NC is a brilliant scholar, a towering figure in linguistics as well as an extraordinary polemicist and an inspiring and charismatic leader and teacher. But one fact about NC that seems to me indisputable is that he is constitutionally incapable of playing any role other than king of the mountain, even as he likes to pose as a sort of impotent outsider. According to him, he is always right and always has been right; and so far as I know, he has never been able to find it within himself to admit to error. I see no point in debating against him, because it appears that, from his perspective, any argument you wish to make by positioning yourself in unfriendly disagreement with him must be the result of a pernicious misunderstanding on your part – i.e., your position is either a notational variant of his or is just plain wrong, and your failure to see or to acquiesce to this is evidence of your obtuseness or venality.
    One way that NC succeeds better than the rest of us at this not uncommon strategy is by making beautifully sweeping claims, supported by extremely impressive and detailed argument, that are in the end never very easy to pin down. For example, he never supplied his theories of the autonomy of syntax or the poverty of the stimulus with hypotheses that were actually empirically testable. Though his arguments are replete with linguistic examples, they prove (if anything) not the sweeping claims but lemmas that may or may not have anything to do with those claims. The lemmas may be interesting and challenging, but they are not conclusive about much, even including internal mechanisms of the theory (which is why NC can adjust those mechanisms so easily from year to year). The exceptions to this general rule are for the most part to be found in his work from the ‘50s - his MA and PhD theses, his 1953 Language paper, and his contributions to mathematical linguistics. Subsequently, a great deal of the empirically testable work that has informed his theories has been done by students and followers, and that work can be and has been accepted or disowned by NC from time to time as he sees fit.
    Admittedly, NC is the object of constant virulent attack just because he is king of the mountain. But he did not get to where he is by being a nice guy (which has no bearing on theory or empirics, but does have a bearing on how arguments are deployed and received and argued against). NC has usually set the tone of the conversation, and it has never been a genial one.
    My point is that there’s a theoretical side, a history-/philosophy-/sociology-of-science side, and an empirical side to NC’s work, and the trick, if you want to dispute with him, is to separate these and then find (if you can) the specifically empirical claims. The theoretical component (because it is, for NC, thoroughly invested with ideology and at the same time unattached to any falsifiable hypotheses) is not subject to refutation.
    Although I have focused on NC here because he was the subject of the original post, the problem is obviously much more general. What to do about sweeping claims (such as whether correction or some correlate thereof plays a role in first language acquisition, or whether adults without pathology can usefully be said to acquire “equivalent grammars,” independent of or unaffected by individual variation or levels of attainment, that we linguists can use our intuition to see into) generally believed by most linguists, whether Chomskyan or not? I don’t know – we just try to do our bit for Kuhnian normal science except when intertheoretic rumbles are afoot and the larger problems are highlighted. I readily confess that I am no less guilty here of making sweeping claims unsupported by argument (though supporting arguments by me and others can be found elsewhere).

  28. The core activity of science is to produce models of reality, and validate them by observation. It's been 50-some years: Where is the TG model of language, that can understand, produce, or learn realistic human language? I should say, where can I download it, because to not embody such a model in a computer program would be pathetically anachronistic. 50 YEARS people, and there's really nothing. Chomsky is a mathematician, not a scientist. If his math provides useful models, great, put it to work. If it doesn't it's just pretty math. Personally I think Chomsky's linguistic work will end up like the Ptolemaic system - Beautiful, rational, based on fundamentally flawed assumptions. Except that epicycles actually produced quite accurate models...