It would be rash to assume that all the problems of language acquisition can be solved by adopting a statistical learning approach. And there are still big questions, identified by Chomsky and others: Why don’t other species have syntax? How did language evolve? Is linguistic ability distinct from general intelligence? But we now have a theoretical perspective that makes sense in terms of what we know about cognitive development and neuropsychology, that applies to many different aspects of language acquisition, that forges links between language acquisition and other types of learning, and that leads to testable predictions. The beauty of this approach is that it is amenable both to experimental test and to simulations of learning, so we can identify the kinds of cues children rely on and the categories they learn to operate with.
So how does Chomsky respond to this body of work? To find out, I decided to take a look at The Science of Language, which is based on transcripts of conversations between Chomsky and James McGilvray between 2004 and 2009. It was encouraging to see from the preface that the book is intended for a general audience and that “Professor Chomsky’s contributions to the interview can be understood by all”.
A stylistic device commonly used by Chomsky is to set up a dichotomy between his position and an alternative, then represent the alternative in a way that makes it preposterous. For instance, his rationalist perspective on language acquisition, which presupposes innate grammar, is contrasted with an empiricist position in which “Language tends to be seen as a human invention, an institution to which the young are inducted by subjecting them to training procedures”. Since we all know that children learn language without explicit instruction, this parody of the empiricist position has to be wrong.
Correction: 4/9/2010. I had originally cited the wrong reference to Dabrowska (Dabrowska, E. 1997. The LAD goes to school: a cautionary tale for nativists. Linguistics, 35, 735-766). The 1997 paper is concerned with variation in adults' ability to interpret syntactically complex sentences. The 2010 paper cited above focuses on grammaticality judgements.
- The original work by Saffran et al (1996) demonstrated that infants are sensitive to transitional probabilities in syllable strings, and suggested that this sensitivity could provide a mechanism for segmenting words from the speech stream.
- Redington et al (1998) proposed that information about lexical categories could be extracted from language input by considering sequential co-occurrences of words.
- Edelman and Waterfall (2007) reviewed evidence that children attend to specific patterns of specific lexical items in their linguistic input, concluding that they first acquire the syntactic patterns of particular words and structures and later generalize information to entire word classes. They went on to describe heuristic methods for uncovering structure in input, using the example of the ADIOS (Automatic DIstillation Of Structure) algorithm. This uses distributional regularities in raw, unannotated corpus data to identify significant co-occurrences, which are used as the basis for distributional classes. Ultimately, ADIOS discovers recursive rule-like patterns that support generalization.
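The transitional-probability idea behind the Saffran et al work is easy to make concrete. The sketch below is my own toy illustration, not the authors' actual procedure or stimuli: it computes TP(A→B) = count(AB)/count(A) over an artificial syllable stream built from three made-up trisyllabic “words”, and posits a word boundary wherever the TP dips (within-word transitions are perfectly predictable; between-word transitions are not).

```python
from collections import Counter

def transitional_probabilities(syllables):
    """TP(A -> B) = count of pair AB / count of A as a first element."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

def segment(syllables, tps, threshold=1.0):
    """Posit a word boundary wherever the TP drops below threshold."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Toy "speech stream": three invented words (bidaku, padoti, golabu)
# concatenated with no pauses, in the spirit of a familiarization stream.
stream = ("bi da ku pa do ti go la bu pa do ti bi da ku "
          "go la bu bi da ku pa do ti go la bu").split()
tps = transitional_probabilities(stream)
print(segment(stream, tps))
```

In this stream every within-word transition has TP 1.0 while every between-word transition is below 1.0, so the simple dip heuristic recovers the three words exactly; real speech is of course far noisier, which is precisely the point at issue in the debate that follows.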
“In Logical Structure of Linguistic Theory (LSLT; p. 165), I adopted Zellig Harris’s (1955) proposal, in a different framework, for identifying morphemes in terms of transitional probabilities, though morphemes do not have the required beads-on-a-string property. The basic problem, as noted in LSLT, is to show that such statistical methods of chunking can work with a realistic corpus. That hope turns out to be illusory, as has recently been shown by Thomas Gambell and Charles Yang (2003), who go on to point out that the methods do, however, give reasonable results if applied to material that is preanalyzed in terms of the apparently language-specific principle that each word has a single primary stress. If so, then the early steps of compiling linguistic experience might be accounted for in terms of general principles of data analysis applied to representations preanalyzed in terms of principles specific to the language faculty....”
“Speaking broadly, this research generally finds that children’s representations do not differ in kind from those of adults and that in cases where children behave differently from adults, it is rarely because they have the wrong representations. Instead, differences between children and adults are often attributed to task demands (Crain & Thornton, 1998), computational limitations (Bloom, 1990; Grodzinsky & Reinhart, 1993), and the problems of pragmatic integration (Thornton & Wexler, 1999) but only rarely to representational differences between children and adults (Radford, 1995; see also Goodluck, this volume).” Lidz, 2008
“The apparent complexity of language and its uniqueness vis a vis other aspects of cognition, which are taken as major discoveries of the standard approach, may derive in part from the fact that these ‘performance’ factors are not available to enter into explanations of linguistic structure. Partitioning language into competence and performance and then treating the latter as a separate issue for psycholinguists to figure out has the effect of excluding many aspects of language structure and use from the data on which the competence theory is developed.” (p 572)
“If you’re trying to get Universal Grammar to be articulated and restricted enough so that an evaluation will only have to look at a few examples, given data, because that’s all that’s permitted, then it’s going to be very specific to language, and there aren’t going to be general principles at work. It really wasn’t until the principles and parameters conception came along that you could really see a way in which this could be divorced. If there’s anything that’s right about that, then the format for grammar is completely divorced from acquisition; acquisition will only be a matter of parameter setting. That leaves lots of questions open about what the parameters are; but it means that whatever is left are the properties of language.”
“an operation that enables you to take mental objects [or concepts of some sort], already constructed, and make bigger mental objects out of them. That’s Merge. As soon as you have that, you have an infinite variety of hierarchically structured expressions [and thoughts] available to you.”
“Developing suggestions of Thomas Roeper’s, Yang proposes that UG provides the neonate with the full array of possible languages, with all parameters valued, and that incoming experience shifts the probability distribution over languages in accord with a learning function that could be quite general. At every stage, all languages are in principle accessible, but only for a few are probabilities high enough so that they can actually be used.” (Chomsky, 2005, p. 9).
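The learning function Chomsky alludes to here can indeed be quite general: Yang's variational model is standardly described with a linear reward-penalty update of exactly the kind used in classic stochastic learning theory. The sketch below is my own toy illustration of that update rule, not Yang's code; the candidate “grammars” are stand-in predicates I invented for the example.

```python
import random

def variational_learner(data, grammars, rate=0.1, steps=5000, seed=0):
    """Toy variational learner (linear reward-penalty scheme):
    keep a probability distribution over candidate grammars, sample one
    per incoming sentence, reward it if it parses the sentence and
    penalize it otherwise."""
    rng = random.Random(seed)
    n = len(grammars)
    p = [1.0 / n] * n
    for _ in range(steps):
        sentence = rng.choice(data)
        i = rng.choices(range(n), weights=p)[0]
        if grammars[i](sentence):
            # Reward grammar i: p_i += rate * (1 - p_i); others shrink.
            p = [(1 - rate) * q for q in p]
            p[i] += rate
        else:
            # Penalize grammar i; redistribute mass to the competitors.
            p = [(1 - rate) * q if j == i else (1 - rate) * q + rate / (n - 1)
                 for j, q in enumerate(p)]
    return p

# Stand-in "grammars": one parses verb-object orders, one object-verb.
grammars = [lambda s: s == "SVO", lambda s: s == "SOV"]
# Input that is mostly (but not uniformly) consistent with the first grammar.
data = ["SVO"] * 9 + ["SOV"]
p = variational_learner(data, grammars)
print(p)  # p[0] ends up well above p[1]
```

The key property, matching the quoted passage, is that both grammars remain accessible throughout (neither probability ever reaches zero); experience merely shifts the distribution until one grammar dominates.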
P.S. 15th October 2012
I have added some links to the response of 12th October. In addition, I have discovered this book, which gives an excellent account of generative vs. constructivist approaches to language acquisition:
Ambridge, B., & Lieven, E. V. M. (2011). Child Language Acquisition: Contrasting Theoretical Approaches. Cambridge: Cambridge University Press.