Saturday 27 October 2012

Auditory processing disorder (APD): Schisms and skirmishes

Photo credit: Ben Earwicker, Garrison Photography, Boise, ID
A remarkable schism is developing between audiologists in the UK and the USA on the topic of auditory processing disorder (APD) in children. In 2010, the American Academy of Audiology published clinical practice guidelines for auditory processing disorder. In 2011, the British Society of Audiology published a position statement on the same topic, which came to rather different conclusions. This month, a White Paper by the British Society of Audiology appeared, reaffirming their position, alongside invited commentaries.
So what is all the fuss about? The argument centres on how to diagnose APD in children. Most of the tests used in the USA to identify APD involve responding to speech. One of the most widely-used assessments is the SCAN-C battery which has four subtests:
  • Filtered words: Repeat words that have been low-pass filtered, so they sound muffled
  • Auditory figure-ground: Repeat words that are presented against background noise (multi-talker babble)
  • Competing words: Repeat words that are presented simultaneously, one to each ear (dichotically)
  • Competing sentences: Repeat sentences presented to one ear while ignoring those presented simultaneously to the other ear
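To get a feel for the ‘filtered words’ manipulation, here is a minimal sketch of low-pass filtering. It uses a simple moving-average filter and an invented two-tone test signal; it is not the filter or material actually used in SCAN-C, just an illustration of why removing high frequencies makes speech sound muffled while leaving its slow envelope intact.

```python
import math

def moving_average(signal, window):
    """A crude low-pass filter: each output sample is the mean of the
    surrounding `window` input samples. Averaging cancels out rapid
    (high-frequency) fluctuations while leaving slow ones largely intact."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

# Invented test signal: a slow 50 Hz component plus a fast 3000 Hz
# component, sampled at 16 kHz (a typical speech sampling rate).
RATE = 16000
t = [i / RATE for i in range(1600)]
sig = [math.sin(2 * math.pi * 50 * x) + math.sin(2 * math.pi * 3000 * x)
       for x in t]

smoothed = moving_average(sig, 15)

def amplitude(s):
    return (max(s) - min(s)) / 2

# After filtering, the fast 3000 Hz component is strongly attenuated,
# while the slow 50 Hz component passes through almost unchanged.
```

In real speech the high frequencies carry much of the information that distinguishes consonants, which is why a low-pass-filtered word is intelligible only with effort.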
In 2006, David Moore, Director of the Medical Research Council’s Institute of Hearing Research in Nottingham, created a stir when he published a paper arguing that APD diagnosis should be based on performance on non-linguistic tests of auditory perception. Moore’s concern was that tests such as SCAN-C, which use speech stimuli, can’t distinguish an auditory problem from a language problem. I made similar arguments in a blog post written last year. Consider the task of doing a speech perception test in a foreign language: if you don’t know the language very well, then you may fail the test because you are poor at distinguishing unfamiliar speech sounds or recognising specific words. This wouldn’t mean you had an auditory disorder.
A recent paper by Loo et al (2012) provided concrete evidence for this concern. They compared multilingual and monolingual children on performance on an APD battery. All children were schooled in English, but a high proportion spoke another language at home.  The child’s language background did not affect performance on non-linguistic APD tests, but had a significant effect on most of the speech-based tests.
Results from a population study by Moore and colleagues, reported in 2010, presented a challenge for the concept of APD. Specifically, Moore et al concluded that, when the effect of task demands had been subtracted out, non-linguistic measures of auditory processing “bore little relationship to measures of speech perception or to cognitive, communication, and listening skills that are considered the hallmarks of APD in children. This finding provides little support for the hypothesis that APD involves impaired processing of basic sounds by the brain, as currently embodied in definitions of APD.”
Overall, Moore et al found that auditory measures carefully controlled to minimise the effects of task demands and language ability do not identify the children about whom there is clinical concern. Nevertheless, children about whom there is clinical concern do exist, insofar as they report difficulty in perceiving speech in noise. So how on earth are we to proceed?
In the White Paper, the BSA special interest group suggest that the focus should be on developing standardized methods for identifying clinical characteristics of APD, particularly through the use of parental questionnaires.
The experts who responded to Moore and colleagues took a very different line. The specific points they raised varied, but they were not happy with the idea of relying on parental report as the basis for APD diagnosis. In general, they argued for more refined measures of auditory function. Jerger and Martin (USA) expressed substantial agreement with Moore et al about the nature of the problem confronting the APD concept: “There can be no doubt that attention, memory, and language disorder are the elephants in the room. One can view them either as confounds in traditional behavioral tests of an assumed sensory disorder or, indeed, as key factors underlying the very nature of a ‘more general neurodevelopmental delay’”. They rejected, however, the idea of questionnaires for diagnosis, and suggested that methods such as electroencephalography and brain imaging could give more reliable and valid measures of APD.
Dillon and Cameron (Australia) queried the usefulness of a general term such as APD, when the reality was that there may be many different types of auditory difficulty, each requiring its own specific test. They described their own work on ‘spatial listening disorder’, arguing that this did relate to clinical presentation.
The commentators most critical of Moore et al’s arguments were Bellis and colleagues (USA). They implied that a good clinician can get around the confound between language and auditory assessments: “Additional controls in cases in which the possible presence of a linguistic or memory confound exists may include assessing performance in the non-manipulated condition (e.g. monaural versus dichotic, nonfiltered versus filtered, etc.) to ensure that performance deficits seen on CAPD tests are due to the acoustic manipulations rather than to lack of familiarity with the language and/or significantly reduced memory skills.” Furthermore, according to Bellis et al, the fact that speech tasks don’t correlate with non-speech tasks is all the more reason for using speech tasks in an assessment, because “in some cases central auditory processing deficits may only be revealed using speech tasks”.
Moore et al were not swayed by these arguments. They argued, first, that neurobiological measures, such as electroencephalography, are no easier to interpret than behavioural measures. I’d agree that it would be a mistake to assume such measures are immune from top-down influences (cf. Bishop et al, 2012), and reliability of measurement can be a serious problem (Bishop & Hardiman, 2010). Moore et al were also critical of the idea that language factors can be controlled for by within-task manipulations when speech tasks are used. This is because the use of top-down information (e.g. using knowledge of vocabulary to guess what a word is) becomes more important as a task gets harder, so a child whose poor language has little impact on performance in an easy condition (e.g. listening in quiet) may be much more affected when conditions get hard (e.g. listening in noise). In addition, I would argue that the account by Bellis et al implies that they know just how much allowance to make for a child’s language level when giving a clinical interpretation of test findings. That is a dangerous assumption in the absence of hard evidence from empirical studies.
So are we stuck with the idea of diagnosing APD from parental questionnaires? Moore et al argue this is preferable to other methods because it would at least reflect the child’s symptoms, in a way that auditory tests don’t. I share the reservations of the commentators about this, but for different reasons. To my mind this approach would be justified only if we also changed the label that was used to refer to these children.  The research to date suggests that children who report listening difficulties typically have deficits in language, literacy, attention and/or social cognition (Dawes & Bishop, 2010; Ferguson et al, 2011). There’s not much evidence that these problems are usually caused by low-level auditory disorder. It is therefore misleading to diagnose children with APD on the basis of parental report alone, as this label implies a primary auditory deficit.
In my view, we should reserve APD as a term for  low-level auditory perceptual problems in children with normal hearing, which are not secondary consequences of language or attentional deficits. The problem is that we can’t make this diagnosis without more information about the ways in which top-down influences impact on auditory measures, be they behavioural or neurobiological. The population study by Moore et al (2010) made a start on assessing how far non-linguistic auditory deficits related (or failed to relate) to cognitive deficits and clinical symptoms in the general population. The study by Loo et al (2012) adopts a novel approach to understanding how language limitations can affect auditory test results, when those limitations are due to the child’s language background, rather than any inherent language disorder. The onus is now on those who advocate diagnosing APD on the basis of existing tests to demonstrate that they are not only reliable but also valid according to these kinds of criteria. Until they do so, the diagnosis of APD will remain questionable.

P.S. 12th November 2012
Brief video by me on "Auditory processing disorder and language impairment" available here: (with links to supporting slideshow and references)
Loo, J., Bamiou, D., & Rosen, S. (2012). The impacts of language background and language-related disorders in auditory processing assessment. Journal of Speech, Language, and Hearing Research. DOI: 10.1044/1092-4388(2012/11-0068)
Moore, D., Rosen, S., Bamiou, D., Campbell, N., & Sirimanna, T. (2012). Evolving concepts of developmental auditory processing disorder (APD): A British Society of Audiology APD Special Interest Group ‘white paper’. International Journal of Audiology, 1-11. DOI: 10.3109/14992027.2012.723143

See also previous blogpost: "When commercial and clinical interests collide" (6th March 2011)

Monday 1 October 2012

Data from the phonics screen: a worryingly abnormal distribution

The new phonics screening test for children has been highly controversial.  I’ve been surprised at the amount of hostility engendered by the idea of testing children’s knowledge of how letters and sounds go together. There’s plenty of evidence that this is a foundational skill for reading, and poor ability to do phonics is a good predictor of later reading problems. So while I can see there are aspects of the implementation of the phonics screen that could be improved,  I don’t buy arguments that it will ‘confuse’ children, or prevent them reading for meaning.

I discovered today that some early data on the phonics screen had recently been published by the Department for Education, and my inner nerd was immediately stimulated to visit the website and download the tables.  What I found was both surprising and disturbing.

Most of the results are presented in terms of proportions of children ‘passing’ the screen, i.e. scoring 32 or more. There are tables showing how this proportion varies with gender, ethnic background, language background, and provision of free school meals. But I was more interested in raw scores: after all, a cutoff of 32 is pretty arbitrary. I wanted to see the range and distribution of scores.  I found just one table showing the relevant data, subdivided by gender, and I have plotted the results here.
Data from Table 4, Additional Tables 2, SFR21/2012
Department for Education (weblink above)

Those of you who are also statistics nerds will immediately see something very odd, but other readers may need a bit more explanation. When you have a test like the phonics test, where each item is scored right or wrong and the number of correct items is totalled up, you’d normally expect a continuous distribution of scores. That is to say, the number of children obtaining a given score should increase gradually up to some point corresponding to the most typical score (the mode), and then gradually decline again. If the test is pretty easy, you may get a ceiling effect, i.e. the mode may be at or close to the maximum score, so you will see a peak at the right-hand side of the plot, with a long straggly tail of lower scores. There may also be a ‘bump’ at the left-hand edge of the distribution, corresponding to those children who can’t read at all – a so-called ‘floor’ effect. That’s evident in the scores for boys.

But there’s also something else: a sudden upswing in the distribution, just at the ‘pass’ mark. Okay, you might think, that’s because the clever people at the DfE have devised the phonics test that way, so that 31 of the items are really easy and most children can read them, but then they suddenly get much harder. That seems unlikely, and it would be a rather odd way to develop a test, but it’s not impossible.

The really unbelievable bit is the distribution of scores just above and below the cutoff. For both boys and girls, fewer children score 31 than 30, in contrast to the general upward trend seen at lower scores. Then there’s a sudden leap, so that about five times as many children score 32 as score 31. But then there’s another dip: fewer children score 33 than 32. Overall, there’s a kind of ‘scalloped’ pattern to the distribution of scores above 32, which is exactly the kind of distribution you’d expect if a score of 32 were producing a kind of ‘floor effect’.
But, of course, 32 is not the test floor.

This is so striking, and so abnormal, that I fear it provides clear-cut evidence that the data have been manipulated, so that children whose scores would put them just one or two points below the magic cutoff of 32 have been given the benefit of the doubt, and had their scores nudged up above cutoff.
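The shape is easy to reproduce in a toy simulation. The sketch below invents an ability distribution and item count (nothing here is calibrated to the real phonics screen or the DfE tables): it generates honest scores on a 40-item test, then nudges every score just below the pass mark up to it. In this toy version every borderline child is nudged, so the dip below the cutoff is total, whereas in the real data the effect would be partial; but the qualitative dip-then-spike shape is the same.

```python
import random
from collections import Counter

random.seed(1)

# Toy model of a 40-item right/wrong test: each child answers every
# item with an ability-dependent probability of success.
def raw_score():
    ability = random.betavariate(8, 2)   # most children find the items easy
    return sum(random.random() < ability for _ in range(40))

honest = [raw_score() for _ in range(10000)]

# "Benefit of the doubt": every score just below the pass mark of 32
# gets nudged up to exactly 32.
PASS = 32
nudged = [PASS if PASS - 2 <= s < PASS else s for s in honest]

h, n = Counter(honest), Counter(nudged)
# The nudged distribution now shows a hole at 30-31 and a spike at 32:
# the counts that should have fallen just below the cutoff have all
# been piled onto the pass mark.
```

No change to the test itself is needed to produce this pattern; it arises purely from how borderline scores are recorded.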

This is most unlikely to indicate a problem inherent in the test itself. It looks like human bias that arises when people know there is a cutoff and, for whatever reason, are reluctant to have children score below that cutoff.  As one who is basically in favour of phonics testing, I’m sorry to put another cat among the educational pigeons, but on the basis of this evidence, I do query whether these data can be trusted.