Monday, 4 May 2015

Great Expectations: Our early assessments of schoolchildren are misleading and damaging

The Early Years Foundation Stage Profile was developed by the government's Standards and Testing Agency "to support practitioners in making accurate judgements about each child's attainment". More specifically:
The EYFS Profile summarises and describes children’s attainment at the end of the EYFS. It is based on ongoing observation and assessment in the three prime and four specific areas of learning, and the three characteristics of effective learning:
• Prime areas: communication and language; physical development; personal, social and emotional development
• Specific areas: literacy; mathematics; understanding the world; expressive arts and design
• Characteristics of effective learning: playing and exploring; active learning; creating and thinking critically
For each ELG, practitioners must judge whether a child is meeting the level of development expected at the end of the Reception year (expected), exceeding this level (exceeding), or not yet reaching this level (emerging).
The manual gives concrete examples of the kinds of behaviour that meet the expected level for a given Early Learning Goal. For instance:
Understanding: Children follow instructions involving several ideas or actions. They answer ‘how’ and ‘why’ questions about their experiences and in response to stories or events.
Speaking: Children express themselves effectively, showing awareness of listeners’ needs. They use past, present and future forms accurately when talking about events that have happened or are to happen in the future. They develop their own narratives and explanations by connecting ideas or events.
Strikingly absent from these descriptions is any allowance for the child's age, even though the assessment is specified to take place when children are aged anywhere from 4 yrs 10 months to 5 yrs 9 months.
Children's language skills (and indeed other skills) develop rapidly in the preschool and early school years. I first became aware of this many years ago when I was developing a children's comprehension assessment (TROG). The goal was to establish the typical range of performance at different ages and subsequently use TROG to identify cases of poor comprehension in clinical settings. The assessment involved showing children sets of four pictures and asking them to point to the one that matched a spoken phrase or sentence. I knew very little about developmental psychology at the time, so I just decided to try the materials with children of different ages to see how they reacted. It soon became apparent that there were substantial age-related changes, and I realised that I would need to use four age-bands for 4-year-olds and two age-bands for 5-year-olds. Some illustrative data are shown in Figure 1.

Figure 1: Percentage of children getting 4/4 items correct on blocks testing specific constructions.
From the original Test for Reception of Grammar (1983).

Findings like this are not specific to this test. I've developed several language assessments over the years and I've used those developed by others: they all show rapid change from 4 to 6 years.
Concerned by this, I wrote for information to the government's Children and Early Years Data Unit, who referred me to this report.  This gives percentages of children reaching a Good Level of Development, defined as achieving "at least the expected level in the early learning goals in the prime areas of learning (personal, social and emotional development; physical development; and communication and language) and in the specific areas of mathematics and literacy." A Good Level of Development was obtained by 69% of autumn-born children, 59% of spring-born children and 47% of summer-born children, confirming that the standards used to evaluate children are sensitive to age.
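To see why a fixed criterion applied to a rapidly developing skill produces this kind of gradient, here is a minimal sketch in Python. It assumes that skill is normally distributed among children of the same age and that its mean grows linearly with age in months. The growth rate, threshold, spread and mid-point ages are all hypothetical values chosen for illustration; they are not estimated from the EYFS figures.

```python
from statistics import NormalDist

# Hypothetical illustration: none of these numbers are estimates from EYFS data.
GROWTH_PER_MONTH = 0.05   # assumed gain in mean skill (in SD units) per month of age
THRESHOLD = 2.9           # assumed fixed "expected" cut-off on the same scale
SKILL_SD = 1.0            # assumed spread of skill among children of the same age


def skill_mean(age_months):
    """Assumed linear growth in mean skill with age."""
    return GROWTH_PER_MONTH * age_months


def proportion_meeting(age_months):
    """Proportion of children of a given age at or above the fixed threshold."""
    dist = NormalDist(mu=skill_mean(age_months), sigma=SKILL_SD)
    return 1.0 - dist.cdf(THRESHOLD)


# Assumed approximate mid-point ages (in months) at the time of assessment.
ages = {"autumn-born": 67, "spring-born": 63, "summer-born": 59}
rates = {season: proportion_meeting(age) for season, age in ages.items()}

for season, rate in rates.items():
    print(f"{season}: {rate:.0%} meet the fixed threshold")
```

Under these assumptions the proportion meeting the threshold falls steadily from the oldest to the youngest group, even though every group has the same spread of ability relative to its own age.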
This is seriously problematic for at least two reasons. First, it means we are using flawed assessments that will over-identify problems in younger children. It is already established that in the USA attentional deficits are over-diagnosed in summer-born children (Elder, 2010) – a problem that has long-term consequences when children are subsequently prescribed medication for what may actually be normal behaviour in an immature child. Making children feel that they are falling short of an expected standard before they are 5 years old cannot be good for their development. In this regard it is noteworthy that there is evidence that being summer-born continues to be associated with educational disadvantage in English children through the later school years (Crawford et al., 2013).
A second problem is that the use of inappropriate criteria for 'expected' levels of development will give a false impression of the numbers of children with developmental difficulties. Consider this article describing an 'early learning crisis' with '20 percent of children unable to communicate properly at age 5'. I have a particular interest in children who have language difficulties, but nobody is helped by over-identifying problems in children who are just the youngest in their class. I've seen enough 4- and 5-year-olds to know that the 'early learning goals' for understanding and speaking are not realistic 'expectations' for 4-year-olds and for those who have only just turned 5 years. Indeed, the fact that almost one third of the oldest children are not regarded as having a good level of development suggests to me that the expectations are inappropriately high even for the oldest 5-year-olds.
My colleague Courtenay Norbury, Professor in the Psychology Dept at Royal Holloway, will shortly be publishing data from a large survey of language development in reception class children in Surrey*. She tells me that month of birth is once again emerging as an important factor.
I'm not someone who is opposed to assessment in principle, but if you are going to do it, it's important to do it in an informed manner. Surely it is time for the policy-makers in this area to recognise that their current practices of early assessment are misleading, and have the potential to cause damage when children are evaluated against standards that are overly stringent and do not take age into account.

*Update 5th June 2015: This is now published as an open access 'early view' paper in Journal of Child Psychology and Psychiatry:


  1. I understand what you mean and I agree, but there is a risk that you will be misunderstood.
    A test is not flawed just because it shows an important age (or month of birth) effect. Indeed, there will always be such an effect, in a system where there is one grade per year, so children in the same class have (at least) a 12-month age range.
    The main conclusion that should be drawn from this observation is that test scores can only be interpreted relative to very precise norms that provide centiles trimester by trimester, if not month by month (and, ideally, taking into account both age and grade). This should be feasible if the EYFS has been administered nationwide. And of course, standards of achievement should be defined realistically, based on these very norms, rather than a priori.

  2. It makes sense that scores are age-dependent, but it is then essential to use age-linked norms for decision making. In research, include age in days as a covariate, probably along a growth curve (age^2, age^3), and nest children within classes to account for a young child in an advanced class versus an older child in a less-advanced classroom.

  3. Dear Franck and Tim
    Thanks for clarification.
    As you note, yes, of course, we'd expect tests to reflect age changes. Indeed, if they didn't we'd worry whether they were sensitive and valid measures. The point I was making was that those devising EYFS don't seem to have realised that!
    They therefore have a measure that is confounded with age.

  4. The "misleading and damaging" consequences related to "age" are only part of the "expectations" etiology. Not only is the "measurement" information faulty, it also misdirects attention away from the instruction/schooling needed to accomplish the "expectation", if any formal effort is necessary at all. "Norming" and other statistical adjustments skate over this fatal flaw in the logic, rather than removing it.

    follow instructions involving several ideas or actions ... express themselves effectively, showing awareness of listeners’ needs

    On the one hand, all children have been doing these things from birth. On the other hand, every adult is "emerging" in the expectations.

    The antidote is simple in principle: Specify an observable instructional consequence of interest. Specify minimal prerequisites for instruction to accomplish the intent. Devise products and protocols for reliably attaining the expectation. That's standard science/technology practice, but it has yet to be applied to schooling.
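
The age-linked norms suggested in the first two comments can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are synthetic (scores rise by one point per month of age), the three-month band width and the 58-month starting age are my choices, and the function names are hypothetical rather than part of any existing assessment tool.

```python
from bisect import bisect_right

YOUNGEST_MONTHS = 58      # 4 yrs 10 months, the youngest age at assessment
BAND_WIDTH_MONTHS = 3     # assumed width of each norming band


def band_of(age_months):
    """Index of the age band that a child of this age falls into."""
    return (age_months - YOUNGEST_MONTHS) // BAND_WIDTH_MONTHS


def build_norms(records):
    """Group (age_months, score) pairs by age band and sort each band's scores."""
    norms = {}
    for age_months, score in records:
        norms.setdefault(band_of(age_months), []).append(score)
    for scores in norms.values():
        scores.sort()
    return norms


def centile(norms, age_months, score):
    """Percentage of children in the same age band scoring at or below `score`."""
    scores = norms[band_of(age_months)]
    return 100.0 * bisect_right(scores, score) / len(scores)


# Synthetic norming sample: 20 children at each age from 58 to 69 months,
# with scores that increase by one point per month of age (an assumption).
records = [(age, (age - YOUNGEST_MONTHS) + k)
           for age in range(58, 70)
           for k in range(20)]
norms = build_norms(records)

# The same raw score is interpreted differently depending on the child's age.
young = centile(norms, 59, 15)
old = centile(norms, 68, 15)
print(f"Raw score 15 at 59 months: centile {young:.0f}")
print(f"Raw score 15 at 68 months: centile {old:.0f}")
```

With real data the bands would come from a large representative sample, and a regression on age in days, as the second comment proposes, would avoid the arbitrary steps at band boundaries.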