Thursday 23 December 2010

A day working from home

8.00 a.m. Resolve that today is the day I will seriously engage with referee comments on a paper. Best to go in to office at work, where there are no distractions right now. Get up and don tights, vests, jumper, warm trousers, boots, cardigan, gloves and quilted Canadian coat with hood. Look more like Michelin Man than usual.
8.30 a.m. Set off to walk to work. Wenceslas-like situation in front garden, and difficulty opening gate against foot-high accumulation of snow.
8.35 a.m. See neighbour slide on ice and tumble on back. Fortunately no bones broken. Persevere, optimistic that the main thoroughfares will be cleared of snow.
8.40 a.m. They aren’t. Snow appears to have melted briefly and then refrozen. See another person slide on ice and land on back.
8.45 a.m. Return home.
9.00 a.m. Turn on gas fire and clear desk in preparation for serious academic activity. Get sidetracked by gas bill and unfiled bank statements. Once all the surface debris cleared, decide desk has too many crumbs on it to be compatible with serious work. Engage in desk-cleaning.
9.15 a.m. Open email. Granting agency has sent me material for a proposal I’d agreed to review consisting of six pdfs. Departmental administrator has sent stern message that we all have to come into work or have our pay docked, unless we have explicit agreement to work from home.
9.20 a.m Email the few postdocs who haven’t already gone on holiday to tell them to ignore message from administrator.
9.30 a.m Have a quick look at Twitter. Everyone tweeting about snow or science funding.
9.35 a.m Bite the bullet and open the referee comments. Save as a new file called ‘response to referees’.
9.40 a.m Make a cup of coffee to steel myself
9.45 a.m Read the referee comments. Aargh, aargh, aargh.
9.50 a.m Remember my Australian collaborator has already sent some new analyses and suggestions for responding to referees. Download these.
9.55 a.m Re-read referee comments. Aargh, aargh, aargh, aargh.
10.00 a.m Print out referee comments. Printer not working. Demanding toner. It has insatiable appetite for toner. Take out toner and shake it about and put it back in. Printer sneers: You’ve tried that before and I’m not playing.  Go into cellar (brrr) where I have providently kept spare toner.
10.05 a.m Unable to get into toner box. Into kitchen for a knife. Side-tracked by sight of coffee jar. Make another cup of coffee.  Open toner box. Find piece of paper with instructions for toner replacement in 12 languages. Go through ritual of shaking toner, removing yellow bit, sliding blue bit up and down, replacing toner in printer. 
10.10 a.m Discover I can save children’s lives by sending old printer cartridge back to special address. Want to save children’s lives, so repackage old toner cartridge in box. Go on hunt for sellotape. Find sellotape. Seal up box. Discover label is inside box. Refind knife. Open box. Extract label. Reseal box. Stick label on box.
10.15 a.m Resend printer command. Nothing happens. Take toner cartridge out again, shake it about, and put back in.
10.20 a.m Belatedly realise I have sent document to be printed on default printer, which is at work. Cancel printer command and resend to home printer.
10.25 a.m. Get printout of reviewer comments and re-read. Aargh, aargh, aargh, aargh, aargh. Reviewer 1 argues there are six major flaws with the paper and data need total reanalysis.
10.30 a.m  Distract myself from grief with a quick check of email. Two messages from Microban international warning me about bacteria in my kitchen at Christmas (how do they know about me and my kitchen?), one announcement that I have won 1.5 million Euros in a lottery I didn’t enter (could come in handy), and a request to review a manuscript.
10.35 a.m Twitter proves more interesting: everyone has either got a broken-down heating system, or is stuck in an airport somewhere. Feel inappropriately smug. Schadenfreude is a real phenomenon.
10.40 a.m Glance out of window and notice sadly that birds have failed to find piece of stale bread that I had hung up on tree, or nuts stuck in a bush. Deliberate on whether I should do more to encourage birds. Decide this is no good, and must return my attention to the comments.
10.45 a.m Download relevant-looking manuscript that I hope will provide some salvation. Read it and make notes.
11.15 a.m Well, that was quite interesting but no relevance at all for current paper.
11.20 a.m. OK, ready to start thinking about reply to reviewer 1. He's one of those people who accuses you of saying something you haven’t and then reprimands you for it. Grrr. Quick look at BBC News, to distract myself, and counteract irritation with more airport grief.
11.25 a.m Print out the original manuscript so I can see exactly what we did say. Do first pass at responses to reviewers.
11.50 a.m. Postman rings with parcel that won’t fit in letterbox. Need to negotiate snowy path to front gate. Find wellies. Struggle into wellies. Collect parcel from remarkably cheery postman. Return and spend 5 mins getting out of wellies.
12.00 p.m. Heavy snow has started falling! Husband has started bustling in kitchen to make seafood risotto for lunch. Yum!
12.05 p.m. Husband suggests that if I want risotto to be amazing rather than just fabulous, I should go and buy white wine. And that some red wine for mulling this evening would also be a good idea. Since he is (a) in tracksuit & slippers and (b) busy with his culinary art, and I am fully dressed, there is justice in this. Muse on advantages of living with licensed 9-to-9 store a stone’s throw away.
12.10 p.m. Don duvet-like coat and boots again and venture into snow, returning with 2 bottles wine and other sundry essentials, on assumption we may be snowed in for days. The street looks magical, like something out of Dickens.
12.15 p.m. Sad blackbird lands on window ledge and looks at me through window. Rummage in fridge for blackbird food and some water.  Only bird-friendly food I can find is olive bread and couscous. Very North Oxford.
12.25 p.m Check email. Boring, boring, boring, but takes 20 mins!
12.45 p.m. Risotto. Yum yum yum! Radio 4 full of tales of airport woe.
13.15 p.m. Quick twitter-scan and tweet to plug my latest blog.
13.25 p.m. Email check. Administrator now telling people to go home early as the Oxford buses are stopping early and they could get stranded. Have visions of a Heathrow-style psychology dept with people bedding down in the coffee area. Now! back to work.
13.30 p.m. Well, I’ve dealt with two comments from ref 2, both of which involve adding the reference number in two places where I’d left it out. Mild sense of achievement. I like ref 2.
13.40 p.m  One of our analyses involves computing intraclass correlation between an individual’s waveform and that of the group average. Referee wants us to leave out the individual when computing grand average.  I know that with sample size of 40+ this makes no difference, but I can’t find out where I checked this out, and so am now going to need to redo the analysis to make this point. But now I need to find the xls formula for the intraclass correlation, which has to be re-entered every time you want to use it in a workbook, and I can’t really remember how to do that either, though I’ve done it loads of times before. I even think I somewhere stored instructions of how to do this, but I can’t find them. So now I am having to run a Search. All to demonstrate to a reviewer something I know is the case and which is not going to make any difference whatsoever to the results.
14.25 p.m. OK, done that. Found formula, worked out how to reinstate it, tried it out on demonstration data and got satisfactory result. Whew.
Looked at email: Message to say my email account is being migrated from one account to another. Had been told this would happen but had forgotten. Scary. But good incentive to stop looking at email for a bit
14.45 p.m. Coffee break. Reading newspapers.
15.00 p.m  Must read various articles sent by collaborator to contest arguments being made by reviewer. OK, brain in gear and articles printed out.
15.30 p.m. Husband decides more ingredients needed for culinary art, and is off to Sainsbury’s. Returns 1 min later saying car is entombed in snow and he needs bucket of hot water.
15.35 p.m. Two articles done, and two to go. Husband returns saying he has removed snowy carapace from car, but now can’t get into it, as doors are frozen shut. He sneers at idea we should check for advice on internet re this situation, and stomps off with more buckets of hot water. Internet advice is to use windscreen wiper fluid. Small problem. Our windscreen wiper fluid is in the car, which we can’t get into.
16.45 p.m Have read 4 articles, two of which were mostly incomprehensible and of dubious relevance. Still uncertain if I really need to do more analysis. Send email to collaborator in Australia. However, email system is in process of migrating and it seems uncertain as to whether it is working or not. Time for a small break and a twitter session.
17.00 p.m Need to read more so I can write explanatory section requested by reviewers. Three important articles to download and summarise.
18.00 p.m. Read two of them. V. good and helpful. Now on to inspect whether my email migrated OK.
18.30 p.m. Think it did, as I have messages telling me I have been bequeathed large sums of money, as well as two requests to review manuscripts. Time for some mulled wine.
Tomorrow is another day. Resolve, no more tweeting, email, shopping, bird tending, until paper is revised. But maybe just a little blog about my day.....

Wednesday 22 December 2010

“Neuroprognosis” in dyslexia

Every week seems to bring a new finding about brains of people with various neurodevelopmental disorders. The problem is that the methods get ever more technical, and so the scientific papers get harder and harder to follow, even if, like me, you have some background in neuropsychology. I don’t normally blog about specific papers, but various people have asked what I think of an article that was published this week by Hoeft et al in Proceedings of the National Academy of Sciences, and so I thought I’d have a shot at (a) summarising what they found in understandable language, and (b) giving my personal evaluation of the study. The bottom line is that the paper seems methodologically sound, but I’m nevertheless sceptical because (a) I’m always sceptical, and (b) there are some aspects of the results that just seem a bit hard to make sense of.

What question did the researchers ask?

Fig 1: location of IFG
The team, headed by researchers from Stanford University School of Medicine, took as their starting point two prior observations.
First, previous studies have found that children with dyslexia look different from normal readers when they do reading tasks in a brain scanner. As well as showing under-activation of regions normally involved in reading, dyslexics often show over-activation in the frontal lobes, specifically in the inferior frontal gyri (IFG) (Figure 1).
The researchers were interested in the idea that this IFG activation could be a sign that dyslexics were using different brain regions to compensate for their dyslexia. They reasoned that, if so, then the amount of IFG activity observed on one occasion might predict the amount of reading improvement at a later occasion.
So the specific question was: Does greater involvement of the IFG in reading predict future long-term gains in reading for children with dyslexia?

Who did they study?
The main focus was on 25 teenagers with dyslexia, whose average age was 14 years at the start of the study. The standard deviation was 1.96 years, indicating that most would have been 12 to 16 years old.  There were 12 boys and 13 girls. They were followed up 2.5 years later, at which point they were subdivided, according to how much progress they’d made on one of the reading measures, into a group of 12 with ‘no-gain’ and 13 with ‘reading-gain’.

The criteria for dyslexia were that, at time 1, (a) performance was in the bottom 25% for age on a composite measure based on two subtests of the Test of Word Reading Efficiency (TOWRE), and (b) performance on a nonverbal IQ subtest (WASI Matrices) was in normal limits (within 1 SD of average).  The TOWRE is a speeded reading test with two parts: reading of real words, and reading of made-up words – the latter subtest picks up difficulties with converting letters into sounds. The average scores are given in Supporting Materials, and confirm that these children had a mean nonverbal ability score of 103 and a mean TOWRE score of 80. This is well in line with how dyslexia is often defined, and confirms they were children of normal ability with significant reading difficulties.

In addition, a ‘control’ group of normal readers was recruited, but they don’t feature much in the paper. The aim of including them was to see whether the same brain measures that predicted reading improvement in dyslexics would also predict reading improvement in normal readers.  However, the control children were rather above average in reading to start with, and, perhaps not surprisingly, they did not show any improvement over time beyond that which you’d expect for their age.

How was reading measured?

Children were given a fairly large battery of tests of reading, spelling and related skills, in addition to the TOWRE, which had been used to diagnose dyslexia. These included measures of reading accuracy (how many words are read correctly from a word list), reading speed, and reading comprehension (how far the child understands what is read). The key measure used to evaluate reading improvement over time was the Word Identification Subtest from the Woodcock Reading Mastery Test (WID).

It is important to realise that all test scores are shown as age-scaled scores. This allows us to ignore the child’s age, as the score just indicates how good or bad the child is relative to others of the same age. For most measures, the scaling is set so that 100 indicates an average score, with standard deviation (SD) of 15. You can tell how abnormal a score is by seeing how many SDs it is from the mean; around 16% of children get a score of 85 or less (1 SD below average), but only about 2% score 70 or less (2 SD below average). At Time 1, the average scores of the dyslexics were mostly in the high 70s to mid 80s, confirming that these children are doing poorly for their age.
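
For anyone who wants to check those proportions, here is a minimal sketch. It assumes that scaled scores follow a normal curve with mean 100 and SD 15, which is how such tests are usually standardised but is my assumption rather than anything stated in the paper.

```python
from statistics import NormalDist

# Age-scaled scores: mean 100, SD 15, assumed to follow a normal curve.
scaled = NormalDist(mu=100, sigma=15)

for cutoff in (85, 70):
    share = scaled.cdf(cutoff)  # proportion of children scoring at or below the cutoff
    print(f"score of {cutoff} or less: {share:.1%} of children")
```

This gives roughly 16% for a score of 85 and a little over 2% for a score of 70.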

When using age-scaled scores, the expectation is that, if a child doesn’t get better or worse relative to other children over time, then the score will stay the same. So a static score does not mean the child has learned nothing: rather they have just not changed their position relative to other children in terms of reading ability.

Another small point: scaled scores can be transformed so that, for instance, instead of being based on a population average of 100 and SD of 15, the average is specified at 10 and SD as 3. The measures used in this study varied in the scaling they used, but I transformed them so they are all on the same scale: average 100, SD 15. This makes it easier to compare effect sizes across different measures (see below).
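
The transformation is just a matter of converting to SD units and back again. A small sketch follows; the function name `rescale` and the example scores are mine, chosen for illustration.

```python
def rescale(score, old_mean, old_sd, new_mean=100.0, new_sd=15.0):
    """Move a scaled score from one mean/SD convention to another."""
    z = (score - old_mean) / old_sd   # distance from the average, in SD units
    return new_mean + new_sd * z

# A score of 7 on a mean-10, SD-3 scale is 1 SD below average,
# so it lands at 85 on the mean-100, SD-15 scale.
print(rescale(7, old_mean=10, old_sd=3))
```

Because the transformation is linear, it changes the units but not the child’s standing relative to other children.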

How was brain activity measured?
Functional magnetic resonance imaging (fMRI) was used to measure brain activity while the child did a reading task in a brain scanner at Time 1. The Wikipedia account of fMRI gives a pretty good introduction for the non-specialist, though my readers may be able to recommend better sources.  The reading task involved judging whether two written words rhymed, e.g. bait-gate (YES) or price-miss (NO). Brain activity was also measured during rest periods, when no task was presented, and this was subtracted from the activity during the rhyme task. This is a standard procedure in fMRI that allows one to see what activation is specifically associated with task performance. Activity is measured across the whole brain, which is subdivided into cubes measuring 2 x 2 x 2 mm (voxels). For each voxel, a measure indicates the amount of activation in that region. There are thousands of voxels, and so a huge amount of data is generated for each person.
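
In its simplest form, the subtraction step amounts to the sketch below. The activation values are invented, and real fMRI analysis involves a great deal more (modelling of the signal over time, correction for head movement, and so on), so this shows only the logic of the contrast.

```python
def task_minus_rest(task, rest):
    """Voxel-by-voxel difference between task activity and rest activity."""
    return [t - r for t, r in zip(task, rest)]

# Hypothetical activation values for four voxels (arbitrary units).
task = [2.1, 0.9, 1.5, 3.0]
rest = [1.0, 1.0, 1.4, 1.2]

contrast = task_minus_rest(task, rest)
print(contrast)  # positive values: more active during the rhyme task than at rest
```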

The researchers also did another kind of brain imaging measurement, diffusion tensor imaging (DTI). This measures connectivity between different brain regions, and reflects aspects of underlying brain structure. The DTI results are of some interest, but not a critical part of the paper and I won’t say more about them here.

No brain imaging was done at Time 2 (or if it was it was not reported). This was because the goal of the study was to see whether imaging at one point in time could predict outcome later on.

How were the data analysed?

The dyslexics were subdivided into two groups, using a median split based on improvement on the WID test. In other words, those showing the least improvement formed one group (with 12 children) and those with most improvement formed the other (13 children).
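
A median split of this kind can be sketched as follows. The gain values are made up for illustration, and I am assuming that with an odd number of children the extra child goes into the upper group, which matches the 12/13 division reported.

```python
def median_split(gains):
    """Return (no_gain_ids, gain_ids): the lower and upper halves by reading gain."""
    order = sorted(gains, key=gains.get)   # child ids, smallest gain first
    half = len(order) // 2                 # with 25 children, 12 go in the lower group
    return order[:half], order[half:]

# Hypothetical gains (in scaled-score points) for 25 children.
gains = {f"child{i:02d}": g for i, g in enumerate(range(-8, 17))}

no_gain, gain = median_split(gains)
print(len(no_gain), len(gain))
```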

The aim, then, was to see how far (a) behavioural measures, such as initial reading test scores, or (b) fMRI results were able to predict which group children came from.

Readers  may have come across a method that is often used to do this kind of classification, known as discriminant function analysis. The basic logic is that you take a bunch of measures, and allocate a weighting to each measure according to how well it distinguishes the two groups. So if the measure had the same average score for both groups, the weighting would be zero, but if it was excellent at distinguishing them, the weighting might be 1.0. You then add together all the measures, multiplied by their weightings, with the aim of getting a total score that will do the best possible job at distinguishing groups.  You can then use this total score to predict, for each person, which group they belong to. This way you can tell how good the prediction is, e.g. what percentage of people are accurately classified.
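
To make the logic concrete, here is a deliberately simplified sketch. It sets each weight to the difference between the two group averages on that measure; a real discriminant function analysis also takes account of how variable the measures are and how they correlate with one another, so treat this as an illustration of the idea rather than the method itself. The data and function names are invented.

```python
from statistics import mean

def fit_weights(group_a, group_b):
    """One weight per measure: how far apart the two group averages are."""
    return [mean(b) - mean(a) for a, b in zip(zip(*group_a), zip(*group_b))]

def total_score(row, weights):
    """Weighted sum of one person's measures."""
    return sum(w * x for w, x in zip(weights, row))

# Hypothetical scores on two measures for three people per group.
no_gain = [[1, 5], [2, 6], [1, 5]]
gain = [[4, 5], [5, 5], [4, 6]]

weights = fit_weights(no_gain, gain)

# Put the cutoff halfway between the two groups' average total scores.
cutoff = (mean(total_score(r, weights) for r in no_gain)
          + mean(total_score(r, weights) for r in gain)) / 2

hits = sum(total_score(r, weights) <= cutoff for r in no_gain)
hits += sum(total_score(r, weights) > cutoff for r in gain)
accuracy = hits / (len(no_gain) + len(gain))
print(weights, accuracy)
```

In this toy example the second measure has the same average in both groups, so its weight comes out at zero and the classification rests entirely on the first measure.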

The extension of this kind of logic to brain imaging is known as multivariate pattern analysis (MVPA). It is nicely explained, with diagrams, on Neuroskeptic’s blog. For a more formal tutorial, see

It has long been recognised that there’s a potential problem with this approach, as it can give you spuriously good predictions, because the method will capitalise on chance fluctuations in the data that are not really meaningful. This is known as ‘over-fitting’. One way of getting around this is to use the leave-one-out method.  You repeatedly run the analysis, leaving out data from one participant, and then see if you could predict that person’s group status from the function derived from all the other participants. This is what was done in this study, and it is an accepted method for protecting against spurious findings.

Another way of checking that the results aren’t invalid is to directly estimate how likely it would be to get this result if you just had random data. To do this, you assign all your participants a new group code that is entirely arbitrary, using random numbers. So every person in the study has a 50% chance of being in group A or group B. You then re-run the analysis and see whether you can predict whether a person is an A or B on the basis of the same brain data. If you can, this would indicate you are in trouble, as the groups you have put in to the analysis are arbitrary. Typically, one re-runs this kind of arbitrary analysis many times, in what is called a permutation analysis; if you do it enough times, occasionally you will get a good classification result by chance, but that does not matter, so long as the likelihood of it occurring is very rare, say less than 1 in 1000 runs.  For readers with statistical training, we can say that the permutation analysis is a nice way of getting a direct estimate of the p-value associated with the analysis done with the original groups.
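
Here is a rough sketch of a permutation analysis, again with a toy nearest-average classifier on a single invented measure rather than anything from the paper. Note one design choice: rather than tossing a coin for each person, I shuffle the existing group labels, which keeps the two group sizes fixed; this is a common way of doing it, but it is my assumption about the details.

```python
import random
from statistics import mean

def loo_accuracy(values, labels):
    """Leave-one-out accuracy of a nearest-group-average rule on one measure."""
    hits = 0
    for i, (value, label) in enumerate(zip(values, labels)):
        others = [(v, l) for j, (v, l) in enumerate(zip(values, labels)) if j != i]
        groups = sorted({l for _, l in others})
        means = {g: mean(v for v, l in others if l == g) for g in groups}
        guess = min(groups, key=lambda g: abs(value - means[g]))
        hits += guess == label
    return hits / len(values)

def permutation_p_value(values, labels, n_runs=1000, seed=0):
    """Share of runs with arbitrary labels that classify as well as the real ones."""
    rng = random.Random(seed)
    observed = loo_accuracy(values, labels)
    shuffled = list(labels)
    as_good = 0
    for _ in range(n_runs):
        rng.shuffle(shuffled)                      # arbitrary group codes
        if loo_accuracy(values, shuffled) >= observed:
            as_good += 1
    # Add 1 to both counts so the estimate can never be exactly zero.
    return observed, (as_good + 1) / (n_runs + 1)

# Hypothetical scores: group A clusters low, group B clusters high.
values = [1, 2, 3, 2, 1, 8, 9, 7, 8, 9]
labels = ["A"] * 5 + ["B"] * 5

observed, p_value = permutation_p_value(values, labels)
print(observed, p_value)
```

With groups as well separated as these, very few of the arbitrary labellings classify as well as the real one, so the estimated p-value is small.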

So what did they find?

Fig 2: discriminant function (y-axis) vs reading gain

The classification accuracy of the method using the whole-brain fMRI data was reported as an impressive 92%, which was well above chance.  Also, the score on the function used to separate groups was correlated .73 with the amount of reading improvement. The brain regions that contributed most to the classification included the right IFG, and left prefrontal cortex, where greater improvement was associated with higher activation. Also the left parietotemporal region showed the opposite pattern, with greater improvement in those who showed less activation.

So could the researchers have saved themselves a lot of time and got the same result if they’d just used the time 1 behavioural data as predictors? They argue not. The prediction from the behavioural measures was highly significant, but not as strong, with accuracy reported (figure S1 of Supporting Materials) as less than 60%.  Also, once the brain measures had been entered into the equation, adding behavioural measures did not improve the prediction.

And what conclusions did they draw?

  • Variation in brain function predicts reading improvement in children with dyslexia. In particular, activation of the right IFG during a reading task predicts improvement. However, taking a single brain region alone does not give as good a prediction as combining information across the whole brain.
  • Brain measures are better than behavioural measures at predicting future gains in reading.
  • This suggests that children with dyslexia can use the right IFG to compensate for their reading difficulties.
  • Dyslexics learn to read by using different neural mechanisms than those used by normal readers.

Did they note any limitations of the study?
  • It’s possible that different behavioural measures might have done a better job in predicting outcomes.
  • It’s also possible that a different kind of brain activation task could have given different results.
  • Some children had received remediation during the course of the study: but this didn’t affect their outcomes. (Bad news for those doing the remediation!).
  • Children varied in IQ, age, etc, but this didn’t differentiate those who improved and those who didn’t.

Questions I have about the study

Just how good is the prediction from the brain classifier?

Figure 2 (above) shows on the y-axis the discriminant function (the hyperplane), which is the weighted sum of voxels that does the best job of distinguishing groups. The x-axis shows the reading gain. As you can see clearly, there are two individuals who fall in the lower right quadrant, i.e. they have low hyperplane scores, and so would be predicted to be no-gain cases, but actually they make positive gains. The figure of 92% appears to come by treating these as cases where prediction failed, i.e. accurate prediction for the remainder gives 23/25 = 92% correct.
Fig 3: Vertical line dividing groups moved

However, this is not quite what the authors said they did. They divided the sample into two equal-sized groups (or as equal as you can get with an odd number) in order to do the analysis, which means that the ‘no improvement’ group contains four additional cases, and that the dividing line for quadrants needs to be moved to the right, as shown in Figure 3. Once again, accurate prediction occurs for those who fall in the top right quadrant, or the bottom left. Prediction is now rather less good, with 4 cases misclassified (three in the top left quadrant, one in the bottom right, i.e. 84% correct). However, it must be accepted that this is still good prediction.
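
The arithmetic behind the two accuracy figures, and the way the answer depends on where the vertical line is drawn, can be sketched as follows. The seven data points are invented purely to show the effect, and I am assuming that a positive hyperplane score counts as a prediction of reading gain.

```python
def quadrant_accuracy(cases, gain_cutoff):
    """cases: (reading_gain, hyperplane_score) pairs.

    A case counts as correctly predicted if a positive score goes with a gain
    above the cutoff (top right) or a non-positive score goes with a gain at
    or below the cutoff (bottom left).
    """
    correct = sum((score > 0) == (gain > gain_cutoff) for gain, score in cases)
    return correct / len(cases)

# The two figures discussed above: 2 or 4 misclassified out of 25.
print((25 - 2) / 25, (25 - 4) / 25)

# Hypothetical cases showing how moving the dividing line changes accuracy.
cases = [(-3, -1.0), (-1, -0.5), (0.5, 0.3), (1.0, 0.4), (2, 0.6), (5, 1.0), (3, -0.2)]
at_zero = quadrant_accuracy(cases, gain_cutoff=0)
moved_right = quadrant_accuracy(cases, gain_cutoff=1.5)
print(at_zero, moved_right)
```

In the toy data, the cases with small positive gains are counted as hits when the line sits at zero but become misses once the line is moved to the right.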

Why do the reading gain group improve on only some measures?
One odd feature of the data is the rather selective nature of the reading improvement seen in the reading-gain group. Table 1 shows the data, after standardising all measures to a mean of 100, SD 15. The analysis used the WRMT-WID test, which is shown in pink. On this test, and on the other WRMT tests, the reading-gain group do make impressively bigger gains than the no-gain group. But the two groups look very similar on the TOWRE measures, which were used to diagnose dyslexia, and also on the Gray Oral Reading Test (GORT). Of course, it’s possible that there is something critical about the content of the different tests – the GORT involves passage-reading, and the TOWRE involves speeded reading of lists of items. But I’d have been a bit more convinced of the generalisability of the findings, if the reading improvement in the reading-gain group had been evident across a wider range of measures. (Note that we also should not assume all gain is meaningful: see my earlier blog for explanation of why).

Why do the control group show similar levels of right IFG activation to dyslexics?
The authors conclude that the involvement of certain brain regions, notably the right IFG, is indicative of an alternative reading strategy to that adopted by typical readers. Yet the control group appear to show as wide a range of activation of this area as the dyslexics, as shown in Figure 4. The authors don’t present statistics on this, but eyeballing the data doesn’t suggest much group difference.
Figure 4: activation in dyslexics (red) and controls (blue)

If involvement of the right IFG improves reading, why don’t dyslexic groups differ at time 1?

This is more of a logical issue than anything else, but it goes like this. Children who improve in reading by time 2 showed a different pattern of brain activation at time 1. The authors argue that right IFG activation predicts better reading. But at time 1, the groups did not differ on the reading measures – or indeed on their performance of the reading task in the scanner. This would be compatible with some kind of ‘sleeper’ effect, whereby the benefits of using the right IFG take time to trickle through. But what makes me uneasy is that this implies the researchers had been lucky enough to just catch the children at the point where they’d started to use the right IFG, but before this had had any beneficial effect.  So I find myself asking what would have happened if they’d started with younger children? 

Overall evaluation
This is an interesting attempt to use neuroimaging to throw light on mechanisms behind compensatory changes in brains of people with dyslexia.  The methodology appears very sound and clearly described (albeit highly technical in places). The idea that the IFG is involved in compensation fits with some other studies in the field.

There are, however, a few features of the data that I find a bit difficult to make sense of, and that makes me wonder about generalisability of this result.

Having said that, this kind of study is challenging. It is not easy to do scanning with children, and just collecting and assessing a well-documented sample can take many months. One then has to wait to follow them up more than two years later. The analyses are highly demanding.  I think we should see this as an important step in the direction of understanding brain mechanisms in dyslexia, but it’s far from being conclusive.

Hoeft F, McCandliss BD, Black JM, Gantman A, Zakerani N, Hulme C, Lyytinen H, Whitfield-Gabrieli S, Glover GH, Reiss AL, & Gabrieli JD (2011). Neural systems predicting long-term outcome in dyslexia. Proceedings of the National Academy of Sciences of the United States of America, 108 (1), 361-6 PMID: 21173250

Saturday 18 December 2010

What's in a name?

In a recent blog post in the Guardian, Maxine Frances Roper discussed how her dyspraxia made it hard for her to get a job. She had major problems with maths and poor physical co-ordination and was concerned that employers were reluctant to make accommodations for these. The comments that followed the blog fell mostly in one of two categories: a) people who described their own (or their child’s) similar experiences; b) people who thought of dyspraxia as an invented disorder with no validity.

Although the article was about dyspraxia, it could equally well have been about developmental dyslexia, dyscalculia or dysphasia. These neurological labels are applied to children whose development is uneven, with selective deficits in the domains of literacy, mathematical skills, and oral language development respectively. They are often described as neurodevelopmental disorders, a category which can be extended to encompass attention deficit hyperactivity disorder (ADHD), and autistic disorder. Unlike conditions such as Down syndrome or Fragile X syndrome, these are all behaviourally defined conditions that can seldom be pinned down to a single cause. They are subject to frequent challenges as to their validity. ADHD, for instance, is sometimes described as a medical label for naughty children, and dyslexia as a middle-class excuse for a child’s stupidity. Autism is a particularly interesting case, where the challenges are most commonly made by individuals with autism themselves, who argue they are different rather than disordered.

So, what does the science say? Are these valid disorders?  I shall argue that these medical-sounding labels are in many respects misleading, but they nevertheless have served a purpose because they get  developmental difficulties taken seriously. I’ll then discuss alternatives to medical labels and end with suggestions for a way forward.

Disadvantages of medical labels

1. Medical labels don't correspond to syndromes

Parents often have a sense of relief at being told their child is dyslexic, as they feel it provides an explanation for the reading difficulties. Most people assume that dyslexia is a clearcut syndrome with a known medical cause, and that affected individuals can be clearly differentiated from other poor readers whose problems are due to poor teaching or low intelligence.

In fact, that is not the case.  Dyslexia, and the other conditions listed above, are all diagnosed on the basis of behavioural rather than neurological criteria. A typical definition of developmental dyslexia specifies that there is a mismatch between reading ability and other aspects of cognitive development, which can’t be explained by any physical cause (e.g. bad eyesight) or poor teaching.  It follows that if you have a diagnosis of dyslexia, this is not an explanation for poor reading; rather it is a way of stating in summary form that your reading difficulties have no obvious explanation. 

But medicine progresses by first recognising clusters of symptoms and then identifying underlying causes for individuals with common patterns of deficits. So even if we don’t yet understand what the causes are, could there be value in singling out individuals who meet criteria for dyslexia, and distinguishing them from other poor readers? To date, this approach has not been very effective. Forty years ago, an epidemiological study was conducted on the Isle of Wight: children were screened on an extensive battery of psychological and neurological measures. The researchers were particularly interested in whether poor readers who had a large discrepancy between IQ and reading ability had a distinctive clinical profile. Overall, there was no support for dyslexia as a distinct syndrome, and in 1976, Bill Yule concluded: “The era of applying the label 'dyslexic' is rapidly drawing to a close. The label has served its function in drawing attention to children who have great difficulty in mastering the arts of reading, writing and spelling, but its continued use invokes emotions which often prevent rational discussion and scientific investigation” (p. 166). Subsequent research has focused on specifying what it is about reading that is so difficult for children who struggle with literacy, and it’s been shown that for most of them, a stumbling block is in the process of breaking words into sounds, so-called phonological awareness. However, poor phonological awareness is seen in poor readers of low IQ as well as in those with a mismatch between IQ and reading skill.

2. Medical labels don’t identify conditions with distinct causes

What about if we look at underlying causes? It's an exciting period for research as new methods make it possible to study the neurological and genetic bases of these conditions.  Many researchers in this field anticipated that once we could look at brain structure using magnetic resonance imaging, we would be able to identify ‘neural signatures’ for the different neurodevelopmental disorders. Despite frequent over-hyped reports of findings of ‘a brain scan to diagnose autism’ and so on, the reality is complicated.

I'm not attacking researchers who look for brain correlates of these conditions: we know far more now than we did 20 years ago about how typical and atypical brains develop, and basic neuroscience may help us understand the underlying processes involved, which in turn could lead to better diagnosis and intervention. But before concluding that a brain scan can be a feasible diagnostic test, we need studies that go beyond showing that an impaired group differs from an unimpaired group. In a recent review of pediatric neuroimaging and neurodevelopmental disorders, Giedd and Rapoport concluded: “The high variability and substantial overlap of most measures for most groups being compared has profound implications for the diagnostic utility of psychiatric neuroimaging” (p. 731) (my italics).

Similar arguments apply in the domain of genetics. If you are interested, I have a blog post explaining this in more detail, but in brief, there are very few instances where a single genetic mutation can explain dyslexia, ADHD, autism and the rest. Genes play a role, and often an important one, in determining who is at risk for disorder, but it seems increasingly likely that the risk is determined by many genes acting together, each of which has a small effect in nudging the risk up or down. Furthermore, the effect of a given gene will depend on environmental factors, and the same gene may be implicated in more than one disorder. What this means is that research showing genetic influences on neurodevelopmental disorders does not translate into nice simple diagnostic genetic tests.

3. No clear boundaries between individuals with different diagnostic labels

To most people, medical labels imply distinct disorders with clear boundaries, but in practice, many individuals have multiple difficulties. Maxine Frances Roper’s blogpost on dyspraxia illustrates this well: dyspraxia affects motor co-ordination, yet she described major problems with maths, which would indicate dyscalculia. Some of her commenters described cases where a diagnosis of dyspraxia was accompanied by a diagnosis of Asperger syndrome, a subtype of autistic disorder. In a textbook chapter on neurodevelopmental disorders, Michael Rutter and I argued that pure disorders, where just one domain of functioning is affected, are the exception rather than the rule. This is problematic for a diagnostic system that has distinct categories, because people will end up with multiple diagnoses. Even worse, the diagnosis may depend on which professional they see. I know of cases where the same child has been diagnosed as having dyslexia, dyspraxia, ADHD, and “autistic spectrum disorder” (a milder form of autism), depending on whether the child is seen by a psychologist, an occupational therapist, a paediatrician or a child psychiatrist.

4. No clearcut distinction between normality and abnormality

There has been much debate as to whether the causes of severe difficulties are different from causes of normal variation. The jury is still out, but we can say that if there are qualitative differences between children with these neurodevelopmental disorders and typically developing children, we have yet to find them.  Twenty years ago, many of us expected that we might find single genes that caused SLI or autism, for instance, but although this sometimes occurs, it is quite exceptional.  As noted above, we are usually instead dealing with complex causation from a mixture of multiple genetic and environmental causes.  Robert Plomin and colleagues have argued, on the basis of such evidence, that ‘the abnormal is normal’ and that there are no disorders.

Consequences of abandoning medical labels 

Many people worry that if we say that a label like dyslexia is invalid, then we are denying that their child has real difficulties. This was brought home to me vividly when I was an editor of the Journal of Child Psychology and Psychiatry. Keith Stanovich wrote a short piece for the journal putting forward arguments to the effect that there were no qualitative differences between poor readers of average or below average IQ, and therefore the construct of ‘dyslexia’ was invalid. This attracted a barrage of criticism from people who wrote in to complain that dyslexia was real, they worked with dyslexic children, and it was disgraceful for anyone to suggest that these children’s difficulties were fictional. Of course, that was not what Stanovich had said. Indeed, he was very explicit: “Whether or not there is such a thing as 'dyslexia', there most certainly are children who read markedly below their peers on appropriately comprehensive and standardized tests. In this most prosaic sense, poor readers obviously exist.” (p. 580). He was questioning whether we should distinguish dyslexic children from other poor readers, but not denying that there are children for whom reading is a major struggle. Exactly the same cycle of events followed a Channel 4 TV documentary, The Dyslexia Myth, which raised similar questions about the validity of singling out one subset of poor readers, the dyslexics, and giving them extra help and attention, when other poor readers, with very similar problems but lower IQs, were ignored. A huge amount of debate was generated, some of which featured in The Psychologist. Here again, those who had tried to make this case were attacked vehemently by people who thought they were denying the reality of children’s reading difficulties.

Among those taking part in such debates are affected adults, many of whom will say “People said I was stupid, but in reality I had undiagnosed dyslexia”. This is illuminating, as it stresses how the label has a big effect on people’s self-esteem. It seems that a label such as dyslexia is not viewed by most people as just a redescription of a person’s problems. It is seen as making them more real, as emphasising that affected people are not unintelligent, and as leading the condition to be taken more seriously than if we just say they have reading difficulties.

Should we abandon medical labels?

So what would the consequences be if we rejected medical labels? Here, it is fascinating to chart what has happened for different conditions, because different solutions have been adopted and we can compare and contrast the impact this has had. Let’s start with dyslexia. On the basis of the Isle of Wight study, Bill Yule and colleagues argued that we should abandon the term ‘developmental dyslexia’ and use instead the less loaded and more descriptive term ‘specific reading retardation’. Because of the negative connotations of ‘retardation’ their proposal did not take off, but the term ‘specific reading disability’ was adopted in some quarters. But, actually, neither term has really caught on. When I did a bibliometric survey of studies on neurodevelopmental disorders, I tried to include all possible diagnostic labels as search terms. I've just looked at the frequency with which different terms were used to describe studies on developmental reading difficulties. Dyslexia won by a long margin, with over 97% of articles using this term.

Quite the opposite happened, though, with ‘developmental dysphasia’, which was used in the 1960s to refer to difficulties in producing and understanding spoken language in a child of otherwise normal ability. This term was already going out of fashion in the UK and the USA in the 1970s, when I was doing my doctoral studies, and in my thesis I used ‘specific developmental language disorder’. Subsequently, ‘specific language impairment’ (SLI) became popular in the US research literature, but there is current concern that it implies that language is the only area of difficulty, when children often have additional problems. Among practitioners, there is even less agreement, largely because of an explicit rejection of a ‘medical model’ by the profession of speech and language therapy (speech-language pathology in the US and Australia). So instead of diagnostic labels, practitioners use a variety of descriptive terminology, including ‘language difficulties’, ‘communication problems’, and, most recently in the UK, ‘speech, language and communication needs’ (SLCN). [If you've never heard of any of these and want to see how they affect children's lives, see].

There do seem to be important negative consequences, however. As Gina Conti-Ramsden has argued, specific language impairment (or whatever else you want to call it) is a Cinderella subject. The amount of research funding directed to it is well below what you’d expect, given its frequency and severity, and it would seem that most members of the public have no idea what it is. Furthermore, if you say a child has ‘developmental dysphasia’, that sounds more serious and real than if you say they have ‘specific language impairment’. And to say they have language ‘difficulties’ or ‘needs’ implies to many people that those difficulties are fairly trivial. Interestingly, there also seems to be an implicit assumption that, if you don’t have a medical label, then biological factors are unimportant, and you are dealing with problems with purely social origins, such as poor parenting or teaching.

An article by Alan Kamhi had a novel take on this issue. He argued that a good label had to have the properties of a meme. The concept of a meme was introduced by Richard Dawkins in The Selfish Gene, and subsequently developed by Susan Blackmore in her book The Meme Machine. A meme is an element of culture that is transmitted from person to person, and a successful meme has to be easy to understand, remember and communicate to others. Importantly, it does not necessarily have to be accurate or useful. Kamhi asked “Why is it more desirable to have dyslexia than to have a reading disability? Why does no one other than speech-language pathologists and related professionals seem to know what a language disorder is? Why is Asperger’s syndrome, a relatively new disorder, already familiar to many people?” (p. 105). Kamhi’s answer is that terms with ‘language’ in them are problematic because everyone thinks they know what language is, but their interpretations differ from those of the professionals. I think there is some truth in this, but there is more to it than that. In general, I’d argue, the medical-sounding terms are more successful memes than the descriptive terms because they convey a spurious sense of explanation, with foreign and medical-sounding labels lending some gravity to the situation.

What to do?

We are stuck between the proverbial rock and hard place. It seems that if we stick with medical-sounding labels for neurodevelopmental disorders, they are treated seriously and gain public recognition and research funding. Furthermore, they seem to be generally preferred by those who are affected by these conditions. However, we know these labels are misleading in implying that we are dealing with clearcut syndromes with a single known cause.

So here’s a proposal that attempts to steer a course through this morass. We should use the term ‘neurodevelopmental disability’ as a generic term, and then add a descriptor to indicate the areas of major difficulty. Let me explain why each part of the term is useful. “Neurodevelopmental” indicates that the child’s difficulties have a constitutional basis. This is not the same as saying they can’t be changed, but it does move us away from the idea that these are some kind of social constructs with no biological basis. The evidence for biological contributory causes is considerable for those conditions where there have been significant neurological and genetic investigations: dyslexia, SLI, autism and ADHD.

I suggest ‘disability’ rather than ‘disorder’ in the hope this may be more acceptable to those who dislike dividing humanity into the disordered and normal. Disability has a specific meaning in the World Health Organization classification, which focuses on the functional consequences of an impairment for everyday life. People who are the focus of our interest are having difficulties functioning at home, work or school, and so ‘disability’ seems a reasonable term to use.

It follows from what I’ve said above that the boundary between disability and no disability is bound to be fuzzy: most problems fall on a scale of severity, and where you put the cutoff is arbitrary. But in this regard, neurodevelopmental disability is no different from many medical conditions. Take, for instance, a condition such as high blood pressure: there are some people whose blood pressure is so high that it is causing them major symptoms, and everyone would agree they have a disease. Other people may have elevated blood pressure, and doctors will be concerned that this is putting health at risk, but where you actually draw the line and decide that treatment is needed is a difficult judgement, and may depend on the presence of other risk factors. It’s common to define conditions such as dyslexia or SLI in terms of statistical cutoffs: the child is identified as having the condition if a score on a reading or language test is in the bottom 16% for their age. This is essentially arbitrary, but it is at least an objective and measurable criterion. However, test scores are just one component of diagnosis: a key factor is whether or not the individual is having difficulty in coping at home, work or school.
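As an aside on where a figure like 16% comes from: if test scores are roughly normally distributed, the bottom 16% corresponds to scoring about one standard deviation below the mean for the age group. Here is a minimal sketch of that arithmetic. It assumes scores standardised to a mean of 100 and a standard deviation of 15, which is my illustrative choice here rather than something any particular test is guaranteed to use, and the function names are made up for the example.

```python
from statistics import NormalDist

# Assumption: age-standardised scores follow a normal distribution
# with mean 100 and SD 15 (illustrative scaling only).
scores = NormalDist(mu=100, sigma=15)


def proportion_below(score):
    """Proportion of the age group expected to score below `score`."""
    return scores.cdf(score)


def cutoff_for(proportion):
    """Score below which the given proportion of the age group falls."""
    return scores.inv_cdf(proportion)


# One SD below the mean (a score of 85) leaves roughly 16% of children below it.
print(f"Proportion scoring below 85: {proportion_below(85):.3f}")
# Going the other way, the bottom 16% ends at a score of roughly 85.
print(f"Cutoff for the bottom 16%: {cutoff_for(0.16):.1f}")
```

Moving the cutoff to 1.5 or 2 standard deviations below the mean would simply pick out a smaller proportion of children, which is the sense in which the choice of line is arbitrary.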

‘Neurodevelopmental disability’ alone could be used to indicate that the person has real difficulties that merit attention and support, but it lumps together a wide range of difficulties. That is no bad thing, however, given that many individuals have problems in several domains. The term would actively discourage the compartmentalised view of these different conditions, which leads to an unsatisfactory situation where, for instance, researchers in the US have difficulty doing research on the relationship between reading and language disabilities because these are seen as falling under the remit of different funding streams (NICHD and NIDCD respectively), or where a researcher who is studying language difficulties in autism will have much greater chance of obtaining funding (from NIMH) than one who is studying language difficulties in non-autistic children (which are far more common).

Having defined our generic category, we need to add descriptors that specify weaknesses and strengths. Identification of areas of weakness is crucial both for ensuring access to appropriate services, and to make it possible to do research on individuals with common characteristics. Table 1 shows how traditional medical categories would map on to this system, with a downward arrow denoting a problem area, and = denoting no impairment. But this is just to illustrate how the system corresponds to what we already have: my radical proposal is that we could do away with the labels in the top row.

Table 1: Traditional categories (top row) vs new system

A major advantage of this approach is that it would not force us to slot a person into one diagnostic category; rather, it would encourage us to consider the whole gamut of developmental difficulties and document which apply in a given case. We know that many people with reading difficulties also have impairments in maths, oral language and/or attention: rather than giving the person a dyslexia label, which focuses on the reading difficulties, the full range of problem areas could be listed. Intelligence does not feature in the diagnostic definition of autism, yet it makes a big difference to a person’s functioning if intelligence is in the normal range, or above average. Further, some people with autism have major problems with literacy, motor skills or attention, whereas others do not. This framework would allow us to specify areas of weakness explicitly, rather than implying that everyone with a common diagnostic label is the same. Further, it would make it easier to document change in functioning over time, as different areas of difficulty emerge or resolve with age.

In addition, a key feature of my proposed approach would be that assessment should also aim to discover any areas that parents or children themselves identify as areas of strength (up arrows), as fostering these can be as important as attempting to remediate areas of difficulty. If we take Maxine Frances Roper as an example, she evidently has good language and intelligence, so her profile would indicate this, together with weaknesses in maths and motor skills.
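For anyone who thinks better in data structures, the descriptor idea can be sketched as a simple profile: a generic label plus a rating for each domain that has been assessed. This is only an illustration. The domain list below is my own shorthand drawn from the areas mentioned in this post, not a definitive version of Table 1, and the names `Level`, `DOMAINS` and `describe` are hypothetical.

```python
from enum import Enum


class Level(Enum):
    WEAKNESS = "↓"  # problem area (downward arrow)
    TYPICAL = "="   # no impairment
    STRENGTH = "↑"  # area of strength (upward arrow)


# Hypothetical domain list for illustration; a real system might use more.
DOMAINS = (
    "language", "reading", "maths", "motor skills",
    "attention", "social", "intelligence",
)


def describe(profile):
    """Summarise a profile as the generic term plus descriptors."""
    unknown = set(profile) - set(DOMAINS)
    if unknown:
        raise ValueError(f"Unknown domains: {sorted(unknown)}")
    weak = [d for d in DOMAINS if profile.get(d) is Level.WEAKNESS]
    strong = [d for d in DOMAINS if profile.get(d) is Level.STRENGTH]
    parts = ["neurodevelopmental disability"]
    if weak:
        parts.append("weaknesses: " + ", ".join(weak))
    if strong:
        parts.append("strengths: " + ", ".join(strong))
    return "; ".join(parts)


# The example above: good language and intelligence,
# weaknesses in maths and motor skills.
example_profile = {
    "language": Level.STRENGTH,
    "intelligence": Level.STRENGTH,
    "maths": Level.WEAKNESS,
    "motor skills": Level.WEAKNESS,
}
print(describe(example_profile))
```

The point of the sketch is that nothing in the structure forces a single category: the same profile can list several weaknesses alongside strengths, and domains that were not assessed are simply left out.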

In the past, the only area of strength that anyone seemed interested in was IQ test performance.  Although this can be an important predictor of outcome, it is not all that matters, and to my mind should be treated just like the other domains of functioning: i.e., we note whether it is a weakness or strength, but do not rely on it to determine whether a child with a difficulty gains access to services.

When we consider people’s strengths, these may not be in cognitive or academic skills. Consider, for example, Temple Grandin. She is a woman with autism who has become a highly respected consultant in animal husbandry because of her unusual ability to put herself in the mind of the animals she works with. Obviously, not every person will have an amazing talent, but most will have some activities that they enjoy and can succeed in. We should try and find out what these are, and ensure they are fostered.

Will it happen?

Although I see this approach as logical and able to overcome many of the problems associated with our current diagnostic systems, I’d be frankly amazed if it were adopted.

For a start, it is complex and has resource implications. Few practitioners or researchers would have the time to do a comprehensive assessment of all the areas of functioning shown in Table 1. Nevertheless, many people would complain that this list is not long enough! What about memory, speech, spelling, executive function, or visuospatial skills, which are currently not represented but are studied by those interested in specific learning disabilities? The potential list of strengths is even more open-ended, and could encompass areas such as sports, music, craft and cookery activities, drama, ability to work with animals, mechanical aptitude and so on.  I’d suggest, though, that the approach would be tractable if we think about this as a two-stage procedure. Initial screening would rely on parent and/or teacher and/or self report to identify areas of concern. Suitable well-validated screening instruments are already available in the domains of language, attention, and social impairment, and this approach could be extended. Areas identified as specific weaknesses could then be the focus of more detailed assessment by a relevant professional.

The main reason I doubt my system would work is that too many people are attached to the existing labels. I’m sure many will feel that terms such as autism, ADHD, and dyslexia have served us well and there’s no need to abandon them. Professional groups may indeed be threatened by the idea of removing barriers between different developmental disorders. And could we lose more than we gain by ditching terminology that has served us well, at least for some disorders?

Please add your comments

I certainly don’t have all the answers, but I am hoping that by raising this issue, I’ll stimulate some debate. Various academics in the US and UK have been talking about the particularly dire situation of terminology surrounding speech and language disorders, but the issues are broader than this, and we need to hear the voices of those affected by different kinds of neurodevelopmental disabilities, as well as practitioners and researchers.

With thanks to Courtenay Frazier Norbury and Gina Conti-Ramsden for comments on a draft of this post.

PS. 27th December 2010
A couple of relevant links:

More on failure of speech-language pathologists to agree on terminology for developmental language disorders.

Kamhi, A. G. (2007). Thoughts and reflections on developmental language disorders. In A. G. Kamhi, J. J. Masterson & K. Apel (Eds.), Clinical Decision Making in Developmental Language Disorders: Brookes.

A recent Ofsted report, concluding that many children with 'special educational needs' are just poorly taught. 

PPS. 19th June 2011
Problems with the term 'speech, language and communication needs':
Lindsay, G. (2011). The collection and analysis of data on children with speech, language and communication needs: The challenge to education and health services. Child Language Teaching & Therapy, 27(2), 135-150.

This article (Figshare version) can be cited as:
Bishop, Dorothy V M (2014): What's in a name?. figshare.

Tuesday 14 December 2010

When ethics regulations have unethical consequences

I've been blogging away from home lately. On 1st December, Guardian Science published a guest blog from me relating to a recent PLOS One article, in which I examined the amount of research, and research funding, for different neurodevelopmental disorders. There are some worrying disparities between disorders, which need explanation.  I've already had lots of interesting emails and comments on the post, but I'm planning to revisit this topic shortly, and would welcome more input, so please feel free to add comments here if you wish.

My latest blog is a guest post for Science 3.0, on ethical (and unethical) issues in data-sharing.
How (some) researchers view ethics committees (IRBs)

Disclaimer: the author is vice-chair of the Medical Sciences Interdisciplinary Research Ethics Committee at the University of Oxford, and notes that most committee members are on the side of the angels.