Friday 29 August 2014

Replication and reputation: Whose career matters?

Some people are really uncomfortable with the idea that psychology studies should be replicated. The most striking example is Jason Mitchell, Professor at Harvard University, who famously remarked in an essay that "unsuccessful experiments have no meaningful scientific value".

Hard on his heels now comes UCLA's Matthew Lieberman, who has published a piece in Edge on the replication crisis. Lieberman is careful to point out that he thinks we need replication. Indeed, he thinks no initial study should be taken at face value - it is, according to him, just a scientific anecdote, and we'll always need more data. He emphasises: "Anyone who says that replication isn't absolutely essential to the success of science is pretty crazy on that issue, as far as I'm concerned."

It seems that what he doesn't like, though, is how people are reporting their replication attempts, especially when they fail to confirm the initial finding. "There's a lot of stuff going on", he complains, "where there's now people making their careers out of trying to take down other people's careers". He goes on to say that replications aren't unbiased, that people often go into them trying to shoot down the original findings, and that this can lead to bad science:
"Making a public process of replication, and a group deciding who replicates what they replicate, only replicating the most counterintuitive findings, only replicating things that tend to be cheap and easy to replicate, tends to put a target on certain people's heads and not others. I don't think that's very good science that we, as a group, should sanction."
It's perhaps not surprising that a social neuroscientist should be interested in the social consequences of replication, but I would take issue with Lieberman's analysis. His depiction of the power of the non-replicators seems misguided. You do a replication to move up in your career? Seriously? Has Lieberman ever come across anyone who was offered a job because they failed to replicate someone else's work? Has he ever tried to publish a replication in a high-impact outlet? Give it a try and you'll soon be told it is not novel enough. Many of the most famous journals are notorious for turning down failures to replicate studies that they themselves published. Lieberman is correct in noting that failures to replicate can get a lot of attention on Twitter, but a strong Twitter following is not going to recommend you to a hiring committee (and, btw, that Kardashian index paper was a parody).

Lieberman makes much of the career penalty for those whose work is not replicated. But anyone who has been following the literature on replication will be aware of just how common non-replication is (see e.g. Ioannidis, 2005). There are various possible reasons for this, and nobody with any sense would count it against someone if they do a well-conducted and adequately powered study that does not replicate. What does count against them is if they start putting forward implausible reasons why the replication must be wrong and they must be right. If they can show the replicators did a bad job, their reputation can only be enhanced. But they'll be in a weak position if their original study was not methodologically strong and should not have been submitted for publication without further evidence to support it. In other words, reputation and career prospects will, at the end of the day, come down to the scientific rigour of a person's research, not to whether a particular result did or did not cross a threshold of p < .05.

The problem with failures to replicate is that they can arise for at least four reasons, and it can be hard to know which applies in an individual case. One reason, emphasized by Lieberman, is that the replicator may be incompetent or biased. But a positive feature of the group replication efforts that Lieberman so dislikes is that the methods and data are entirely open, allowing anyone who wants to evaluate them – see for instance this example. Others have challenged replication failures on the grounds that there are crucial aspects of the methodology that only the original experimenter knows about. To those I recommend making all aspects of methods explicit.

A second possibility is that a scientist does a well-designed study whose results don't replicate because all results are influenced by randomness – this could mean that an original effect was a false positive, or the replication was a false negative. The truth of the matter will only be settled by more, rather than less, replication, but there's research showing that the odds are that an initial large effect will be smaller on replication, and may disappear altogether - the so-called Winner's Curse (Button et al, 2013).
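
To make the Winner's Curse concrete, here is a minimal simulation sketch. It is not taken from Button et al; the true effect of 0.2 standard deviations, the 20 participants per group, the rough z > 1.96 cut-off and the function names are all illustrative assumptions. It imagines many small studies of the same modest effect, "publishes" only those that reach significance, and then runs an exact replication of each published study.

```python
import random
import statistics


def simulate_study(true_effect, n, rng):
    """Simulate one two-group study with unit-variance outcomes.

    Returns the observed mean difference and whether it passes a rough
    significance check (normal approximation, positive direction only).
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    # Standard error of the difference between two independent means
    se = (statistics.variance(control) / n
          + statistics.variance(treated) / n) ** 0.5
    return diff, diff / se > 1.96


def winners_curse(true_effect=0.2, n=20, n_studies=5000, seed=1):
    """Compare effect sizes in 'published' originals and their replications."""
    rng = random.Random(seed)
    originals, replications = [], []
    for _ in range(n_studies):
        diff, significant = simulate_study(true_effect, n, rng)
        if significant:
            # Assumption: only significant originals get published
            originals.append(diff)
            # Every published study gets one same-sized replication
            rep_diff, _ = simulate_study(true_effect, n, rng)
            replications.append(rep_diff)
    return (statistics.mean(originals),
            statistics.mean(replications),
            len(originals))


if __name__ == "__main__":
    orig_mean, rep_mean, n_published = winners_curse()
    print(f"published originals: {n_published}")
    print(f"mean effect in originals:    {orig_mean:.2f}")
    print(f"mean effect in replications: {rep_mean:.2f}")
```

Under these assumptions, the published originals overestimate the true effect because they were selected for being large enough to reach significance, whereas the replications, which are not selected, average out close to the true value. Nobody in this simulation is incompetent or dishonest, yet the effect still shrinks on replication.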

The third reason why someone's work doesn't replicate is if they are a charlatan or fraudster, who has learned that they can have a very successful career by telling lies. We all hope they are very rare and we all agree they should be stopped. Nobody would make the assumption that someone must be in this category just because a study fails to replicate.

The fourth reason for lack of replication arises when researchers are badly trained and simply don't understand probability theory, and so engage in various questionable research practices to tweak their data to arrive at something 'significant'. Although they are innocent of bad intentions, they stifle scientific progress by cluttering the field with nonreplicable results. Unfortunately, such practices are common and often not recognised as a problem, though there is growing awareness of the need to tackle them.

There are repeated references in Lieberman's article to people's careers: not just the people who do the replications ("trying to create a career out of a failure to replicate someone") but also the careers of those who aren't replicated ("When I got into the field it didn't seem like there were any career-threatening giant debates going on"). There is, however, another group whose careers we should consider: graduate students and postdocs who may try to build on published work only to find that the original results don't stand up. Publication of non-replicable findings leads to enormous waste in science and demoralization of the next generation. One reason why I take reproducibility initiatives seriously is that I've seen too many young people demoralized after finding that the exciting effect they want to investigate is actually an illusion.

While I can sympathize with Lieberman's plea for a more friendly and cooperative tone to the debate, at the end of the day, replication is now on the agenda and it is inevitable that there will be increasing numbers of cases of replication failure.

So suppose I conduct a methodologically sound study that fails to replicate a colleague's work. Should I hide my study away for fear of rocking the boat or damaging someone's career? Have a quiet word with the author of the original piece? Rather than holding back for fear of giving offence, it is vital that we make our data and methods public. For a great example of how to do this in a rigorous yet civilized fashion I recommend this blogpost by Betsy Levy Paluck.

In short, we need to develop a more mature understanding that the move towards more replication is not about making or breaking careers: it is about providing an opportunity to move science forward, improve our methodology and establish which results are reliable (Ioannidis, 2012). And this can only help the careers of those who come behind us.

Button, K., Ioannidis, J., Mokrysz, C., Nosek, B., Flint, J., Robinson, E., & Munafò, M. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14 (5), 365-376. DOI: 10.1038/nrn3475

Ioannidis, J. (2005). Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. JAMA, 294 (2). DOI: 10.1001/jama.294.2.218

Ioannidis, J. (2012). Why Science Is Not Necessarily Self-Correcting. Perspectives on Psychological Science, 7 (6), 645-654. DOI: 10.1177/1745691612464056

Saturday 23 August 2014

Labels for unexplained language difficulties in children: We need to talk

The view from the Tower of Babel
This week saw the publication of a special issue of the International Journal of Language and Communication Disorders, focusing on labels for children with unexplained language difficulties. Two target articles, one by Sheena Reilly and colleagues, and one by me, are accompanied by an editorial by Susan Ebbels, twenty commentaries, and a final paper where Sheena and I join forces with Bruce Tomblin to try to synthesise the different viewpoints. These articles are free for anyone to access.

Terminological battles are often boring and seldom come to any consensus, so why are we putting time into this thorny issue? Quite simply, because it really matters. As we argue in the articles, having a label affects how children are perceived, what help they are offered, and how seriously their problems are taken. 'Specific Language Impairment' has very poor name recognition compared to dyslexia and autism, despite being at least as common. Furthermore, unless we can agree on some common language, it's difficult to make progress in research, and to discover, for instance, the underlying causes of language difficulties, how common they are in different parts of the world, or what interventions work.

I was first confronted with the full extent of the problem when I tried to analyse the amount of research and research funding associated with different developmental disorders (Bishop, 2010). There are other conditions, notably autism and dyslexia, where there is plenty of debate about diagnostic criteria, or even about whether the condition exists. But even so, the terminology is reasonably consistent. For children's language difficulties, this is not the case - they can be described as cases of language difficulty, disorder, impairment, disability, needs or delay, with various prefixes such as 'developmental', 'specific' or 'primary'. Some researchers will use such labels with precise meanings, often excluding children who have co-existing conditions, whereas others use them more descriptively. This made it extremely difficult to do a sensible internet search to estimate the amount of research funding associated with children's language difficulties.  

The confusion over labels has, I think, also contributed to the lack of public recognition of language difficulties in children. A couple of years ago, I joined together with Courtenay Norbury, Maggie Snowling, Gina Conti-Ramsden and Becky Clark with the goal of remedying this situation. We started a campaign for Raising Awareness of Language Learning Impairments (RALLI) (Bishop et al., 2012), and set up a YouTube channel to provide basic information. We spent some time debating what terminology to use: "Language learning impairment" was our preferred choice, but many of our videos talk of Specific Language Impairment, simply because that is a more familiar label. The lack of an agreed label proved a real stumbling block for our attempts at public engagement, and we decided that, as well as producing videos, one of our goals would be to get the terminology issue discussed more widely, in the hope of achieving some consensus. It was a very happy coincidence that Sheena Reilly and colleagues were crystallizing their own position on this question in an article in IJLCD, and that they, and the Editors, were willing to include my article, and the commentaries of other RALLI founders, in the published debate.

One thing that came across when reading commentaries on our articles was the disconnect between research and practice. One point on which I agree with Sheena and colleagues is that there is no justification for drawing a distinction between children whose language problems are commensurate with below-average nonverbal ability, and those who have a mismatch between good nonverbal skills and low language. Research has failed to find any difference between children with uneven or even nonverbal-verbal profiles in terms of responsiveness to intervention or underlying causes. Such a distinction is, however, widely used in educational and clinical settings to decide which children gain access to extra support in school. Another issue raised by the Reilly et al. paper is whether it is logical to use other exclusionary criteria, and to distinguish, for instance, between children who do and don't have autistic features in association with a language problem.

As Susan Ebbels noted in her editorial, in everyday settings "diagnostic labels and criteria were being used creatively in disputes over access to services both by those seeking to obtain services for children (often parents and their lawyers) who could be accused of ‘diagnostic shopping’ and also by those seeking to deny services (often due to financial constraints) who may use particularly restrictive criteria in order to reduce the number of children qualifying for services". 

We can't afford to ignore this confused situation any longer. The time has come to have a wider debate on these issues, with the aim of reaching a consensus about how terms are used. The Royal College of Speech and Language Therapists has set up a moderated discussion forum where people can give their views on the best way forward. Please do consider adding your voice: it is important that all those affected by this issue have a say, whether you are a speech-language therapist/pathologist, psychologist, teacher, health professional, legal expert, policymaker, a parent of a child with language difficulties, or someone who has experienced language difficulties. We'd also love to hear from those outside the UK - whether English-speaking or not. You can access the discussion forum here.

Finally, to raise awareness of this debate, during the week of 24th-31st August I will be taking over the @WeSpeechies Twitter handle as guest curator. On Tuesday 26th at 8 a.m. BST there will be a live Twitter debate on this topic. Feel free to join in, even if you aren't a regular tweeter.

Bishop, D. (2010). Which Neurodevelopmental Disorders Get Researched and Why? PLoS ONE, 5 (11) DOI: 10.1371/journal.pone.0015112  
Bishop, D., Clark, B., Conti-Ramsden, G., Norbury, C., & Snowling, M. (2012). RALLI: An internet campaign for raising awareness of language learning impairments Child Language Teaching and Therapy, 28 (3), 259-262 DOI: 10.1177/0265659012459467

Slides on this topic are available here.

Addendum Friday 29th August 2014

We've had a great week of interactions on Twitter. A transcript for the week is available here.
I'll look through this and aim to organise the material in due course, but meanwhile would encourage anyone who is interested to continue the discussion on Twitter. I'm appending below some tweets that I posted throughout the week to stimulate debate.

As noted above, the chat links in to a special issue of the International Journal of Language and Communication Disorders, which is free to access here. NB it is not all that obvious, but there are 10 commentaries after each target article.

If you want to join the discussion on Twitter, feel free to comment at any time, but please include the #WeSpeechies hashtag, so we can aggregate comments easily. Also, if your comment relates to a numbered question, please add Q1, etc., so we can relate them.

Monday started with my attempt to summarise each of the twenty commentaries in a Tweet-length message.

Summaries from commentaries

Paediatricn Gillian Baird: ICD & DSM classifications talk of 'language disorder'; implies distinct from normal variation. ‘Disorder’ used for conditions without obvious aetiology; functional effect described separately in ICFDH.

Lauchlan/Boyle, ed psych view. Must ask: ‘Will label change the child's life for the better?’ Aetiology often irrelevant.

Bellair et al: community SALTs. No one label works for both research & clinical. SLI has problems but we can manage them.

Mabel Rice: "SLI has yet to receive widespread adoption in clinical practice, in spite of the great need for it." Critical of DSM5: excluded "well-researched category of SLI", included SCD, "with a minimal research base".

Kate Taylor SLP. SLI underidentified. Changing the term won't resolve the issue, which is one of measurement rather than label.

Conti-Ramsden: Any Consensus Panel on terminology must be international and include voices from different languages.

Hansson et al: ICD10 labels don't map on to use by researchers in Sweden. Sweden: phonological & grammatical difficulties seen as part of language impairment. Soc comm probs separate.

Clark & Carter: Survey: Scottish SALTs unclear re terms & diagnostic criteria. Move from exclusionary to inclusionary criteria.

Hüneke & Lascelles: Concern that watering down terminology will mean kids lose scarce resources. Prefer medical term 'developmental dysphasia' that gets problems taken seriously.

Strudwick/Bauer: Concern that labels don't capture comorbidities; most ch with 'SLI' have other problems.

Michael Rutter, psychiatrist "both clinical & research classifications needed but they require a different approach"

Rutter: ‘Specific’ implies ‘pure’ language impairment; "not supported by any of the available evidence"

Larry Leonard: Many researchers already use broader definition of SLI: do not use term to mean children have a pure profile. Communicatn with the public/other disciplines will be even harder if we adopt generic label ‘language impairment’.

Snowling: DSM5 treats Communication Disorders separately from Specific Learning Disorders, yet they often co-occur

Aoife Gallagher, SALT; ethical issue: "who owns diagnosis once it has been given.. who ultimately has the right to take it away"

Andrew Whitehouse: ‘SLI’ provides neat criteria for researchers but label hides behavioural & aetiological heterogeneity

Dockrell/Lindsay: Educational perspective re SLI is missing, yet day-to-day support of learning/development provided by teachers. In England ‘speech, language & communication needs’ (SLCN) indicates primary need is with language & communication.

Grist & Hartshorne: Children & young people we work with rarely describe selves as having SLI or SLCN

Norbury @lilaccourt: Relaxing diag criteria will increase demand for services. SALTs shld focus on severe & persistent impairmts.

Parsons et al @wordaware: Shockwaves through SALT profession if nonverbal IQ criteria and delay/disorder distinction removed. Use of marketing approaches to development of a new term, including consultation with parents & young people.

Wright: legal perspective. Much time spent in tribunal appeals arguing re labels: eg is it delay or disorder, is it specific?

Questions for debate

On Tuesday we had a live Twitter chat with four question topics, and later in the week, I added further numbered questions. Here is the total list – we'd love to hear your thoughts on any or all of these:

Q1 What is your view on use of the diagnostic label SLI? Does it reflect a medical model and is this appropriate?

Q2 What are appropriate criteria for identifying children's language problems?

Q3 Should IQ, ASD features, hearing loss determine whether language-impaired children can access services?

Q4 What terminology is most appropriate for children who have unexplained language problems?

Q5 ICD11 will use 'Developmental Language Disorder' and DSM5 uses 'Language Disorder'. What do people think of these terms?

Q6 In research SLI still widely used but without requiring IQ discrepancy. Should we retain SLI but with this broader meaning, or is it just confusing?

Q7 In UK education, Speech, Language and Communication Needs (SLCN) is popular term. Is it used outside UK? Is it useful?

Q8 In UK clinical practice, distinction between language 'delay' & 'disorder' is used, but it has no research support.  Where does delay/disorder distinction come from? How defined?

Q9 Is there any support for a return to the more medical term 'developmental dysphasia'?

Q10 Reilly et al and several commentators suggest we drop 'Specific' and use the term 'Language Impairment' instead. What wld be advantages (e.g. avoids unfair exclusion) and disadvantages (e.g. too broad)?

Q11 What do people think of terms 'Language Learning Impairment' or 'Primary language impairment'?

Q12 Do diagnostic labels actually help children and families?

Q13 Shld terminology/diagnostic criteria be responsibility of speechies, or shld other professions & families have a say? Assumptions/practices seem v. different in education/medicine/psychology vs speech-language therapy/pathology

Q14 In yr area, who does intervention with kids whose language problems are associated with autism?

Q15 Some people take pride in identifying themselves as dyslexic. Does this ever happen for kids with language problems? If not, why not?

Q16 Has anyone encountered situation where child not offered intervention bcs language problems attributed to social deprivation?

Q17 Insurance considerations seldom important in UK, but affect label use elsewhere. Do US insurers just require DSM?