Saturday, 9 June 2018

Developmental language disorder: the need for a clinically relevant definition

There's been debate over the new terminology for Developmental Language Disorder (DLD) at a meeting (SRCLD) in the USA. I've not got any of the nuance here, but I feel I should make a quick comment on one issue I was specifically asked about, viz:

As background: the field of children's language disorders has been a terminological minefield. The term Specific Language Impairment (SLI) began to be used widely in the 1980s as a diagnosis for children who had problems acquiring language for no apparent reason. One criterion for the diagnosis was that the child's language problems should be out of line with other aspects of development, and hence 'specific', and this was interpreted as requiring normal range nonverbal IQ (NVIQ).

The term SLI was never adopted by the two main diagnostic systems - WHO's International Classification of Diseases (ICD) and the American Psychiatric Association's Diagnostic and Statistical Manual (DSM) - but the notion that IQ should play a part in the diagnosis became prevalent.

In 2016-7 I headed up the CATALISE project with the specific goal of achieving some consensus about the diagnostic criteria and terminology for children's language disorders: the published papers about this are openly available for all to read (see below). The consensus of a group of experts from a range of professions and countries was to reject SLI in favour of the term DLD.

Any child who meets criteria for SLI will meet criteria for DLD: the main difference is that the use of an IQ cutoff is no longer part of the definition. This does not mean that all children with language difficulties are regarded as having DLD: those who meet criteria for intellectual disability, known syndromes or biomedical conditions are treated separately (see these slides for summary).

The tweet seems to suggest we should retain the term SLI, with its IQ cutoff, because it allows us to do neatly controlled research studies. I realise a brief, second-hand tweet about Rice's views may not be a fair portrayal of what she said, but it does emphasise a bone of contention that was thoroughly gnawed in the discussions of the CATALISE panel, namely, what is the purpose of diagnostic terminology? I would argue its primary purpose is clinical, and clinical considerations are not well-served by research criteria.

The traditional approach to selecting groups for research is to find 'pure' cases - quite simply, if you include children who have other problems beyond language (including other neurodevelopmental difficulties) then it is much harder to know how far you are assessing correlates or causes of language problems: things get messy and associations get hard to interpret. The importance of controlling for nonverbal IQ has been particularly emphasised over many years: if you compare language-impaired vs comparison (typically-developing, or TD) children on a language or cognitive measure, and the language-impaired group has lower nonverbal ability, then it may be that you are looking at a correlate of nonverbal ability rather than language. Restricting consideration to those who meet stringent IQ criteria to equalise the groups is one way of addressing the issue.

However, there are three big problems with this approach:

1. A child's nonverbal IQ can vary from time to time and it will depend on the test that is used. However, although this is problematic, it's not the main reason for dropping IQ cutoffs; the strongest arguments concern validity rather than reliability of an IQ-based approach.

2. The use of IQ-cutoffs ignores the fact that pure cases of language impairment are the exception rather than the rule. In CATALISE we looked at the evidence and concluded that if we were going to insist that you could only get a diagnosis of DLD if you had no developmental problems beyond language, then we'd exclude many children with language problems (see also this old blogpost). If our main purpose is to get a diagnostic system that is clinically workable, it should be applicable to the children who turn up in our clinics - not just a rarefied few who meet research criteria. An analogy can be drawn with medicine: imagine if your doctor identified you with high blood pressure but refused to treat you unless you were in every other regard fit and healthy. That would seem both unfair and ill-judged. Presence of co-occurring conditions might be important for tracking down underlying causes and determining a treatment path, but it's not a reason for excluding someone from receiving services.

3. Even for research purposes, it is not clear that a focus on highly specific disorders makes sense. An underlying assumption, which I remember starting out with, was the idea that the specific cases were in some important sense different from those who had additional problems. Yet, as noted in the CATALISE papers, the evidence for this assumption is missing: nonverbal IQ has very little bearing on a child's clinical profile, response to intervention, or aetiology. For me, what really knocked my belief in the reality of SLI as a category was doing twin studies: typically, I'd find that identical twins were very similar in their language abilities, but they sometimes differed in nonverbal ability, to the extent that one met criteria for SLI and the other did not. Researchers who treat SLI as a distinct category are at risk of doing research that has no application to the real world.

There is nothing to stop researchers focusing on 'pure' cases of language disorder to answer research questions of theoretical interest, such as questions about the modularity of language. This kind of research uses children with a language disorder as a kind of 'natural experiment' that may inform our understanding of broader issues. It is, however, important not to confuse such research with work whose goal is to discover clinically relevant information.

If practitioners let the theoretical interests of researchers dictate their diagnostic criteria, then they are doing a huge disservice to the many children who end up in a no-man's-land, without either diagnosis or access to intervention. 


Bishop, D. V. M. (2017). Why is it so hard to reach agreement on terminology? The case of developmental language disorder (DLD). International Journal of Language & Communication Disorders, 52(6), 671-680. doi:10.1111/1460-6984.12335

Bishop, D. V. M., Snowling, M. J., Thompson, P. A., Greenhalgh, T., & CATALISE Consortium. (2016). CATALISE: a multinational and multidisciplinary Delphi consensus study. Identifying language impairments in children. PLOS One, 11(7), e0158753. doi:10.1371/journal.pone.0158753

Bishop, D. V. M., Snowling, M. J., Thompson, P. A., Greenhalgh, T., & CATALISE Consortium. (2017). Phase 2 of CATALISE: a multinational and multidisciplinary Delphi consensus study of problems with language development: Terminology. Journal of Child Psychology and Psychiatry, 58(10), 1068-1080. doi:10.1111/jcpp.12721

Sunday, 27 May 2018

Sowing seeds of doubt: how Gilbert et al’s critique of the reproducibility project has played out

In Merchants of Doubt, Erik Conway and Naomi Oreskes describe how raising doubt can be used as an effective weapon against inconvenient science. On topics such as the effects of tobacco on health, climate change and causes of acid rain, it has been possible to delay or curb action to tackle problems by simply emphasising the lack of scientific consensus. This is always an option, because science is characterised by uncertainty, and indeed, we move forward by challenging one another’s findings: only a dead science would have no disagreements. But those raising concerns wield a two-edged sword: spurious and discredited criticisms can disrupt scientific progress, especially if the arguments are complex and technical: people will be left with a sense that they cannot trust the findings, even if they don’t fully understand the matters under debate.

The parallels with Merchants of Doubt occurred to me as I re-read the critique by Gilbert et al of the classic paper by the Open Science Collaboration (OSC) on ‘Estimating the reproducibility of psychological science’. I was prompted to do so because we were discussing the OSC paper in a journal club* and inevitably the question arose as to whether we needed to worry about reproducibility, in the light of the remarkable claim by Gilbert et al: ‘We show that OSC's article contains three major statistical errors and, when corrected, provides no evidence of a replication crisis. Indeed, the evidence is also consistent with the opposite conclusion -- that the reproducibility of psychological science is quite high and, in fact, statistically indistinguishable from 100%.’

The Gilbert et al critique has, in turn, been the subject of considerable criticism, as well as a response by a subset of the OSC group. I summarise the main points of contention in Table 1: at times they seem to be making a defeatist argument that we don’t need to worry because replication in psychology is bound to be poor: something I have disputed.

But my main focus in this post is simply to consider the impact of the critique on the reproducibility debate by looking at citations of the original article and the critique. A quick check on Web of Science found 797 citations of the OSC paper, 67 citations of Gilbert et al, and 33 citations of the response by Anderson et al.

The next thing I did, admittedly in a very informal fashion, was to download the details of the articles citing Gilbert et al and code them according to the content of what they said, as either supporting Gilbert et al’s view, rejecting the criticism, or being neutral. I discovered I needed a fourth category for papers where the citation seemed wrong or so vague as to be unclassifiable. I discarded any papers where the relevant information could not be readily accessed – I can access most journals via Oxford University but a few were behind paywalls, others were not in English, or did not appear to cite Gilbert et al. This left 44 citing papers that focused on the commentary on the OSC study. Nine of these were supportive of Gilbert et al, two noted problems with their analysis, but 33 were categorised as ‘neutral’, because the citation read something like this: 

“Because of the current replicability crisis in psychological science (e.g., Open Science Collaboration, 2015; but see Gilbert, King, Pettigrew, & Wilson, 2016)….”

The strong impression was that the authors of these papers lacked either the appetite or the ability to engage with the detailed arguments in the critique, but had a sense that there was a debate and felt that they should flag this up. That’s when I started to think about Merchants of Doubt: whether intentionally or not, Gilbert et al had created an atmosphere of uncertainty to suggest there is no consensus on whether or not psychology has a reproducibility problem - people are left thinking that it's all very complicated and depends on arguments that are only of interest to statisticians. This makes it easier for those who are reluctant to take action to deal with the issue.

Fortunately, it looks as if Gilbert et al’s critique has been less successful than might have been expected, given the eminence of the authors. This may in part be because the arguments in favour of change are founded not just on demonstrations such as the OSC project, but also on logical analyses of statistical practices and publication biases that have been known about for years (see slides 15-20 of my presentation here). Furthermore, as evidenced in the footnotes to Table 1, social media allows a rapid evaluation of claims and counter-claims that hitherto was not possible when debate was restricted to and controlled by journals. The publication this week of three more big replication studies  just heaps on further empirical evidence that we have a problem that needs addressing. Those who are saying ‘nothing to see here, move along’ cannot retain any credibility.

Table 1
Claims by Gilbert et al (in quotation marks), each followed by the responses to it (bulleted)

‘many of OSC’s replication studies drew their samples from different populations than the original studies did’
·     ‘Many’ implies the majority. No attempt to quantify – just gives examples
·     Did not show that this feature affected replication rate

‘many of OSC’s replication studies used procedures that differed from the original study’s procedures in substantial ways.’
·     ‘Many’ implies the majority. No attempt to quantify – just gives examples
·     OSC showed that this did not affect replication rate
·     Most striking example used by Gilbert et al is given detailed explanation by Nosek (1)

‘How many of their replication studies should we expect to have failed by chance alone? Making this estimate requires having data from multiple replications of the same original study.’
Used data from pairwise comparisons of studies from the Many Labs project to argue a low rate of agreement is to be expected.
·     Ignores publication bias impact on original studies (2, 3)
·     G et al misinterpret confidence intervals (3, 4)
·     G et al fail to take sample size/power into account, though this is crucial determinant of confidence interval (3, 4)
·     ‘Gilbert et al.’s focus on the CI measure of reproducibility neither addresses nor can account for the facts that the OSC2015 replication effect sizes were about half the size of the original studies on average, and 83% of replications elicited smaller effect sizes than the original studies.’ (2)

Results depended on whether original authors endorsed the protocol for the replication: ‘This strongly suggests that the infidelities did not just introduce random error but instead biased the replication studies toward failure.’
·     Use of term ‘the infidelities’ assumes the only reason for lack of endorsement is departure from original protocol. (2)
·     Lack of endorsement included non-response from original authors (3)

Anderson, C. J., Bahnik, S., Barnett-Cowan, M., et al. (2016). Response to Comment on "Estimating the reproducibility of psychological science". Science, 351(6277).
Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on "Estimating the reproducibility of psychological science". Science, 351(6277).
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251). doi:10.1126/science.aac4716

*Thanks to the enthusiastic efforts of some of our grad students, and the support of Reproducible Research Oxford, we’ve had a series of Reproducibilitea journal clubs in our department this term.  I can recommend this as a great – and relatively cheap and easy - way of raising awareness of issues around reproducibility in a department: something that is sorely needed if a recent Twitter survey by Dan Lakens is anything to go by.

Sunday, 13 May 2018

How to survive on Twitter – a simple rule to reduce stress

In recent weeks, I’ve seen tweets from a handful of people I follow saying they are thinking of giving up Twitter because it has become so negative. Of course they are entitled to do so, and they may find that it frees up time and mental space that could be better used for other things. The problem, though, is that I detect a sense of regret. And this is appropriate because Twitter, used judiciously, has great potential for good.

For me as an academic, the benefits include:
·      Finding out about latest papers and other developments relevant to my work
·      Discovering new people with interesting points of view – often these aren’t eminent or well-known and I’d never have come across them if I hadn’t been on social media
·      Being able to ask for advice from experts – sometimes getting a remarkably quick and relevant response
·      Being able to interact with non-academics who are interested in the same stuff as me
·      Getting a much better sense of the diversity of views in the broader community about topics I take for granted – this often influences how I go about public engagement
·      Having fun – there are lots of witty people who brighten my day with their tweets

The bad side, of course, is that some people say things on Twitter that they would not dream of saying to your face. They can be rude, abusive, and cruel, and sometimes mind-bogglingly impervious to reason. We now know that some of them are not even real people – they are just bots set up by those who want to sow discord among those with different political views. So how do we deal with that?

Well, I have a pretty simple rule that works for me, which is that if I find someone rude, obnoxious, irritating or tedious, I mute them. Muting differs from blocking in that the person doesn’t know they are muted. So they may continue hurling abuse or provocations at you, unaware that they are now screaming into the void.

A few years ago, when I first got into a situation where I was attacked by a group of unpleasant alt-right people (who I now realise were probably mostly bots), it didn’t feel right to ignore them, for three reasons:
·      First, they were publicly maligning me, and I felt I should defend myself.
·      Second, we’ve been told to beware the Twitter bubble. If we only interact on social media with those who are like-minded, it can create a totally false impression of what the world is like.
·      Third, walking away from an argument is not a thing a good academic does: we are trained experts in reasoned debate, and our whole instinct is to engage with those who disagree with us, examine what they say and make a counterargument.

But I soon learned that some people on social media don’t play by the rules of academic engagement. They are not sincere in their desire to discuss topics: they have a viewpoint that nothing will change, and they will use any method they can find to discredit an opponent. This includes ad hominem attacks, lying and wilful misrepresentation of what you say.  It's not cowardly to avoid these people: it's just a sensible reaction. So I now just mute anyone where I get a whiff of such behaviour – directed either towards me or anyone else.

The thing is, social media is so different from normal face-to-face interaction, that it needs different rules. Just imagine if you were sitting with friends at the pub, having a chat, and someone barged in and started shouting at you aggressively. Or someone sat down next to you, uninvited, and proceeded to drone on about a very boring topic, impervious to the impact they are having. People may have different ways of extricating themselves from these situations, but one thing you can be sure of: when you next go to the pub, you would not seek these individuals out and try to engage them in discussion.

So my rule boils down to this: Ask yourself, if I was talking to this person in the pub, would I want to prolong the interaction? Or, if there was a button that I could press to make them disappear, would I use it?  Well, on social media, there is such a button, and I recommend taking advantage of it.*

*I should make it clear that there are situations when a person is subject to such a volume of abuse that this isn’t going to be effective. Avoidance of Twitter for a while may be the only sensible option in such cases. My advice is intended for those who aren’t the centre of a vitriolic campaign, but who are turned off Twitter because of the stress it causes to observe or participate in hostile Twitter exchanges.

Wednesday, 9 May 2018

My response to the EPA's 'Strengthening Transparency in Regulatory Science'

Incredible things have happened at the US Environmental Protection Agency since Donald Trump was elected. The agency is responsible for creating standards and laws that promote the health of individuals and the environment. During previous administrations it has overseen laws concerned with controlling pollution and regulating carbon emissions. Now, under Administrator Scott Pruitt, the voice of industry and climate scepticism is in the ascendant. 

A new rule that purports to 'Strengthen Transparency in Regulatory Science' has now been proposed - ironically, at a time when the EPA is being accused of a culture of secrecy regarding its own inner workings. Anyone can comment on the rule here: I have done so, but my comment appears to be in moderation, so I am posting it here.

Dear Mr Pruitt

re: Regulatory Science- Docket ID No. EPA-HQ-OA-2018-0259

The proposed rule, ‘Strengthening transparency in regulatory science’ brings together two strands of contemporary scientific activity. On the one hand, there is a trend to make policy more evidence-based and transparent. On the other hand, there has, over the past decade, been growing awareness of problems with how science is being done, leading to research that is not always reproducible (the same results achieved by re-analysis of the data) or replicable (similar results when an experiment is repeated). The proposed rule by the Environmental Protection Agency (EPA) brings these two strands together by proposing that policy should only be based on research that has openly available public data. While this may on the surface sound like a progressive way of integrating these two strands, it rests on an over-simplified view of how science works and has considerable potential for doing harm.

I am writing in a personal capacity, as someone at the forefront of moves to improve reproducibility and replication of science in the UK. I chaired a symposium at the Academy of Medical Sciences on this topic in 2015; this was jointly organised with major UK funders: Wellcome Trust, Medical Research Council and Biotechnology and Biological Sciences Research Council. I am involved in training early career researchers in methods to improve reproducibility, and I am a co-author of Munafò, M. R., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 0021. doi:10.1038/s41562-016-0021. I would welcome any move by the US Government that would strengthen research by encouraging adoption of methods to improve science, including making analysis scripts and data open when this is not in conflict with legal/ethical issues. Unfortunately, this proposal will not do that. Instead, it will weaken science by drawing a line in the sand that effectively disregards scientific discoveries prior to the first part of the 21st century when the importance of open data started to be increasingly recognised.

The proposal ignores a key point about how scientific research works: the time scale. Most studies that would be relevant to EPA take years to do, and even longer to filter through to affect policy. Consequences for people and the planet that are relevant to environmental protection are often not immediately obvious: if they were, we would not need research. Recognition, for instance, of the dangers of asbestos, took years because the impacts on health were not immediate. Work demonstrating the connection occurred many years ago and I doubt that the data are anywhere openly available, yet the EPA’s proposed rule would imply that it could be disregarded. Similarly, I doubt there is open data demonstrating the impact of lead in paint or exhaust fumes, or of pesticides such as DDT: does this mean that manufacturers would be free to reintroduce these?

A second point is that scientific advances never depend on a single study: having open scripts and data is one way of improving our ability to check findings, but it is a relatively recent development, and it is certainly not the only way to validate science. The growth of knowledge has always depended on converging evidence from different sources, replications by different scientists and theoretical understanding of mechanisms. Scientific facts become established when the evidence is overwhelming. The EPA proposal would throw out the mass of accumulated scientific evidence from the past, when open practices were not customary – and indeed often not practical before computers for big data were available.

Contemporary scientific research is far from perfect, but the solution is not to ignore it, but to take steps to improve it and to educate policy-makers in how to identify strong science; government needs advisors who have scientific expertise and no conflict of interest, who can integrate existing evidence with policy implications. The ‘Strengthening Transparency’ proposal is short-sighted and dangerous and appears to have been developed by people with little understanding of science. It puts citizens at risk of significant damage – both to health and prosperity -  and it will make the US look scientifically illiterate to the rest of the world.

Yours sincerely

D. V. M. Bishop FMedSci, FBA, FRS

Thursday, 3 May 2018

Power, responsibility and role models in academia

Last week, Robert J. Sternberg resigned as Editor of Perspectives on Psychological Science after a series of criticisms on social media of his behaviour. I first became aware of this issue when Bobbie Spellman wrote a blogpost explaining why she was not renewing her membership of the Association for Psychological Science, noting concerns about Sternberg’s editorial bias and high rate of self-citation, among other issues.

Then a grad student at the University of Leicester, Brendan O’Connor, noted that Sternberg not only had a tendency to cite his own work; he also recycled large portions of written text in his publications. Nick Brown publicised some striking examples on his blog, and Retraction Watch subsequently published an interview with O’Connor explaining the origins of the story.

In discussing his resignation, Sternberg admitted to ‘lapses in judgement and mistakes’ but also reprimanded those who had outed him for putting their concerns online, rather than contacting him directly. A loyal colleague, James C. Kaufman, then came to his defence, tweeting:

The term ‘witch-hunt’ is routinely trotted out whenever a senior person is criticised. (Indeed, it has become one of Donald Trump’s favourite terms to describe attempts to call him out for various misbehaviours). It implies that those who are protesting at wrongdoing are self-important people who are trying to gain attention by whipping up a sense of moral panic about relatively trivial matters.

I find this both irritating and symptomatic of a deep problem in academic life. I do not regard Sternberg’s transgressions as particularly serious: He used his ready access to a publishing platform for self-promotion and self-plagiarism, was discovered, and resigned his editorial position with a rather grumbly semi-apology. If that was all there was to it, I would agree that everyone should move on.  

The problem is with the attitude of senior people such as Kaufman. A key point is missed by those who want to minimise Sternberg’s misbehaviour: He is one of the most successful psychologists in the world, and so to the next generation, he is a living embodiment of what you need to do to become a leader in the field.  So early-career scientists will look at him and conclude that to get to the top you need to bend the rules.

In terms of abuse of editorial power, Sternberg’s behaviour is relatively tame. Consider the case of Johnny Matson, Jeff Sigafoos, Giuliano Lancioni and Mark O’Reilly, who formed a coterie of editors and editorial board members who enhanced their publications and citations by ditching usual practices such as peer review when handling one another’s papers. I documented the evidence for this back in 2015, and there appear to have been no consequences for any of these individuals. You might think it isn’t so important if a load of dodgy papers make it into a few journals, but in this case, there was potential for damage beyond academia: the subject matter concerned developmental disorders, and methods of assessment and intervention were given unjustified credibility by being published in journals that were thought to be peer-reviewed. In addition, the corrosive influence on the next generation of psychologists was all too evident: When I first wrote about this, I was contacted by several early-career people who had worked with the dodgy editors: they confirmed that they were encouraged to adopt similar practices if they wanted to get ahead.

When we turn to abuse of personal power, there have been instances in academia that are much, much worse than editorial misdemeanours – clearly documented cases of senior academics acting as sexual predators on junior staff – see, for instance, here and here. With the #MeToo campaign (another ‘witch-hunt’), things are starting to change, but the recurring theme is that if you are sufficiently powerful you can get away with almost anything.

Institutions that hire top academics seem desperate to cling on to them because they bring in grants and fame.  Of course, accusations need to be fully investigated in a fair and impartial fashion, but in matters such as editorial transgressions, the evidence is there for all to see, and a prompt response is required.

The problem with the academic hierarchy is that at the top there is a great deal of power and precious little responsibility. Those who make it to positions of authority should uphold high professional standards and act as academic role models. At a time when many early-career researchers are complaining that their PIs are encouraging them to adopt bad scientific practices, it’s all the more important that we don’t send the message that you need to act selfishly and cut corners in order to succeed.

I don’t want to see Sternberg vilified, but I do think the onus is now on the academic establishment to follow Bobbie Spellman’s lead and state publicly that his behaviour fell below what we would expect from an academic role model – rather than sweeping it under the carpet or, even worse, portraying him as a victim. 

Saturday, 7 April 2018

Should research funding be allocated at random?

Earlier this week, a group of early-career scientists had an opportunity to quiz Jim Smith, Director of Science at the Wellcome Trust. The ECRs were attending a course on Advanced Methods for Reproducible Science that I ran with Chris Chambers and Marcus Munafo, and Jim kindly agreed to come along for an after-dinner session which started in a lecture room and ended in the bar.

Among other things, he was asked about the demand for small-scale funding. In some areas of science, a grant of £20-30K could be very useful in enabling a scientist to employ an assistant to gather data, or to buy a key piece of equipment. Jim pointed out that from a funder’s perspective, small grants are not an attractive proposition, because the costs of administering them (finding reviewers, running grant panels, etc.) are high relative to the benefits they achieve. And it’s likely that there will be far more applicants for small grants.

This made me wonder whether we might retain the benefits of small grants by dispensing with the bureaucracy. A committee would still have to scrutinise proposals to make sure that they met the funder’s remit and were of high methodological quality; provided that was so, each proposal could be entered into a pool, with winners selected at random.
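To make the idea concrete, here is a minimal sketch of the two-step process in Python: a screening step followed by a random draw. It is only an illustration under my own assumptions. The function name run_lottery, the fields in_remit and sound_methods, and the example proposals are all made up, and a real funder would need to decide how screening judgements are recorded and how many awards the budget allows.

```python
import random


def run_lottery(proposals, n_awards, seed=None):
    """Pick winners at random from the proposals that passed screening."""
    # Screening step: keep only proposals the committee judged to be within
    # remit and methodologically sound (both flags are hypothetical fields).
    eligible = [p for p in proposals if p["in_remit"] and p["sound_methods"]]
    # A seeded generator means the draw can be repeated and audited.
    rng = random.Random(seed)
    # Never try to make more awards than there are eligible proposals.
    n_winners = min(n_awards, len(eligible))
    return rng.sample(eligible, n_winners)


# Made-up proposals, for illustration only.
proposals = [
    {"id": "P1", "in_remit": True, "sound_methods": True},
    {"id": "P2", "in_remit": True, "sound_methods": False},
    {"id": "P3", "in_remit": False, "sound_methods": True},
    {"id": "P4", "in_remit": True, "sound_methods": True},
    {"id": "P5", "in_remit": True, "sound_methods": True},
]

winners = run_lottery(proposals, n_awards=2, seed=2018)
print([p["id"] for p in winners])
```

The point of the sketch is that the committee's only job is the yes/no screening decision; no ranking is needed, which is where the administrative saving would come from.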

Implicit in this proposal is the idea that it isn’t possible to rank applications reliably. If a lottery approach meant we ended up funding weak research and denying funds to excellent projects, this would clearly be a bad thing. But ranking of research by committee and/or peer review is notoriously unreliable, and it is hard to compare proposals that span a range of disciplines. Many people feel that funding is already a lottery, albeit an unintentional one, because the same grant that succeeds in one round may be rejected in the next. Interviews are problematic because they mean that a major decision – fund or not – is decided on the basis of a short sample of a candidate’s behaviour, and that people with great proposals but poor social skills may be turned down in favour of glib individuals who can sell themselves more effectively.

I thought it would be interesting to float this idea in a Twitter poll.  I anticipated that enthusiasm for the lottery approach might be higher among those who had been unsuccessful in getting funding, but in fact, the final result was pretty similar, regardless of funding status of the respondent: most approved of a lottery approach, with 66% in favour and 34% against.

As is often the way with Twitter, the poll encouraged people to point me to an existing literature I had not been aware of. In particular, last year, Mark Humphries (@markdhumphries) made a compelling argument for randomness in funding allocations, focusing on the expense and unreliability of current peer review systems. Hilda Bastian and others pointed me to work by Shahar Avin, who has done a detailed scholarly analysis of policy implications for random funding – in the course of which he mentions three funding systems where this has been tried. In another manuscript, Avin presented a computer simulation to compare explicit random allocation with peer review. The code is openly available, and the results from the scenarios modelled by Avin are provocative in supporting the case for including an element of randomness in funding. (Readers may also be interested in this simulation of the effect of luck on a meritocracy, which is not specific to research funding but has some relevance.) Others pointed to even more radical proposals, such as collective allocation of science funding, giving all researchers a limited amount of funding, or yoking risk to reward.

Having considered these sources and a range of additional comments on the proposal, I think it does look as if it would be worth a funder such as Wellcome Trust doing a trial of random allocation of funding for proposals meeting a quality criterion. As noted by Dylan Wiliam, the key question is whether peer review does indeed select the best proposals. To test this, those who applied for Seed Funding could be randomly directed to either stream A, where proposals undergo conventional evaluation by committee, or stream B, where the committee engages in a relatively light touch process to decide whether to enter the proposal into a lottery, which then decides its fate. Streams A and B could each have the same budget, and their outcomes could be compared a few years later.
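The random direction of applicants to the two streams is simple to implement. The sketch below is purely illustrative: the function name `assign_streams` and the use of applicant IDs are my assumptions, and a real trial would need a properly documented randomisation protocol.

```python
import random


def assign_streams(applicant_ids, seed=None):
    """Randomly split applicants into two streams of (near-)equal size.

    Stream "A": conventional evaluation by committee.
    Stream "B": light-touch triage followed by a lottery.
    """
    rng = random.Random(seed)
    shuffled = list(applicant_ids)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}
```

Because assignment is random, any later difference in outcomes between the streams can be attributed to the selection method rather than to differences between the applicants.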

One reason I’d recommend this approach specifically for Seed Funding is the disproportionate administrative burden for small grants. There would, in principle, be no reason for not extending the idea to larger grants, but I suspect that the more money is at stake, the greater will be the reluctance to include an explicit element of chance in the funding decision. And, as Shahar Avin noted, very expensive projects need long-term support, which makes a lottery approach unsuitable.

Some of those responding to the poll noted potential drawbacks. Hazel Phillips suggested that random assignment would make it harder to include strategic concerns, such as career stage or importance of topic. But if the funder had particular priorities of this kind, they could create a separate pool for a subset of proposals that met additional criteria and that would be given a higher chance of funding. Another concern was gaming by institutions or individuals submitting numerous proposals in scattergun fashion. Again, I don’t see this as a serious objection, as (a) use of an initial quality triage would weed out proposals that were poorly motivated and (b) applicants could be limited to one proposal per round. Most of the other comments that were critical expressed concerns about the initial triage: how would the threshold for entry into the pool be set? A triage stage may look as if one is just pushing back the decision-making problem to an earlier step, but in practice, it would be feasible to develop transparent criteria for determining which proposals didn’t get into the pool: some have methodological limitations which mean they couldn’t give a coherent answer to the question they pose; some research questions are ill-formed; others have already been answered adequately. This blogpost by Paul Glasziou and Iain Chalmers makes a good start in identifying characteristics of research proposals that should not be considered for funding.

My view is that there are advantages to the lottery approach over and above the resource issues. First, Avin’s analysis concludes that reliance on peer review leads to a bias against risk-taking, which can mean that novelty and creativity are discouraged. Second, once a proposal was in the pool, there would be no scope for bias against researchers in terms of gender or race – something that can be a particular concern when interviews are used to assess. Third, the impact on the science community is also worth considering. Far less grief would be engendered by a grant rejection if you knew that you had simply been unlucky, rather than that you were judged to be wanting. Furthermore, as noted by Marina Papoutsi, some institutions evaluate their staff in terms of how much grant income they bring in – a process that ignores the strong element of chance that already affects funding decisions. A lottery approach, where the randomness is explicit, would put paid to such practices.


Friday, 9 February 2018

Improving reproducibility: the future is with the young

I've recently had the pleasure of reviewing the applications to a course on Advanced Methods for Reproducible Science that I'm running in April together with Marcus Munafo and Chris Chambers.  We take a broad definition of 'Reproducibility' and cover not only ways to ensure that code and data are available for those who wish to reproduce experimental results, but also focus on how to design, analyse and pre-register studies to give replicable and generalisable findings.

There is a strong sense of change in the air. Last year, most applicants were psychologists, even though we prioritised applications in biomedical sciences, as we are funded by the Biotechnology and Biological Sciences Research Council and European College of Neuropsychopharmacology. The sense was that issues of reproducibility were not so high on the radar of disciplines outside psychology. This year things are different. We again attracted a fair number of psychologists, but we also have applicants from fields as diverse as gene expression, immunology, stem cells, anthropology, pharmacology and bioinformatics.

One thing that came across loud and clear in the letters of application to the course was dissatisfaction with the status quo. I've argued before that we have a duty to sort out poor reproducibility because it leads to enormous waste of time and talent of those who try to build on a glitzy but non-replicable result. I've edited these quotes to avoid identifying the authors, but these comments – all from PhD students or postdocs in a range of disciplines – illustrate my point:
  • 'I wanted to replicate the results of an influential intervention that has been widely adopted. Remarkably, no systematic evidence has ever been published that the approach actually works. So far, it has been extremely difficult to establish contact with initial investigators or find out how to get hold of the original data for re-analysis.' 

  • 'I attempted a replication of a widely-cited study, which failed. Although I first attributed it to a difference between experimental materials in the two studies, I am no longer sure this is the explanation.' 

  • 'I planned to use the methods of a widely cited study for a novel piece of research. The results of this previous study were strong, published in a high impact journal, and the methods apparently straightforward to implement, so this seemed like the perfect approach to test our predictions. Unfortunately, I was never able to capture the previously observed effect.' 

  • 'After working for several years in this area, I have come to the conclusion that much of the research may not be reproducible. Much of it is conducted with extremely small sample sizes, reporting implausibly large effect sizes.' 

  • 'My field is plagued by irreproducibility. Even at this early point in my career, I have been affected in my own work by this issue and I believe it would be difficult to find someone who has not themselves had some relation to the topic.' 

  • 'At the faculty I work in, I have witnessed that many people are still confused about or unaware of the very basics of reproducible research.'

Clearly, we can't generalise to all early-career researchers: those who have applied for the course are a self-selected bunch. Indeed, some of them are already trying to adopt reproducible practices, and to bring about change to the local scientific environment. I hope, though, that what we are seeing is just the beginning of a groundswell of dissatisfaction with the status quo. As Chris Chambers suggested in this podcast, I think that change will come more from the grassroots than from established scientists.

We anticipate that the greater diversity of subjects covered this year will make the course far more challenging for the tutors, but we expect it will also make it even more stimulating and fun than last year (if that is possible!). The course lasts several days and interactions between people are as important as the course content in making it work. I'm pretty sure that the problems and solutions from my own field have relevance for other types of data and methods, but I anticipate I will learn a lot from considering the challenges encountered in other disciplines.

Training early career researchers in reproducible methods does not just benefit them: those who attended the course last year have become enthusiastic advocates for reproducibility, with impacts extending beyond their local labs. We are optimistic that as the benefits of reproducible working become more widely known, the face of science will change so that fewer young people will find their careers stalled because they trusted non-replicable results.

Friday, 12 January 2018

Do you really want another referendum? Be careful what you wish for

Many people in my Twitter timeline have been calling for another referendum on Brexit. Since most of the people I follow regard Brexit as an unmitigated disaster, one can see they are desperate to adopt any measure that might stop it.

Things have now got even more interesting with arch-Brexiteer, Nigel Farage, calling yesterday for another referendum. Unless he is playing a particularly complicated game, he presumably also thinks that his side will win – and with an increased majority that will ensure that Brexit is not disrupted.

Let me be clear. I think Brexit is a disaster. But I really do not think another referendum is a good idea. If there's one thing that the last referendum demonstrated, it is that this is a terrible method for making political decisions on complicated issues.

I'm well-educated and well-read, yet at the time of the referendum, I understood very little about how the EU worked. My main information came from newspapers and social media – including articles such as this nuanced and thoughtful speech on the advantages and disadvantages of EU membership by Theresa May. (The contrast between this and her current mindless and robotic pursuit of extreme Brexit is so marked that I do wonder if she has been kidnapped and brainwashed at some point.)

I was pretty sure that it would be bad for me as a scientist to lose opportunities to collaborate with European colleagues, and at a personal level I felt deeply European while also proud of the UK as a tolerant and fair-minded society. But I did not understand the complicated financial, legal, and trading arrangements between the UK and Europe, and I had no idea of possible implications for Northern Ireland – this topic was pretty much ignored by the media that I got my information from. As far as I remember, debates on the topic on the TV were few and far between, and were couched as slanging matches between opposite sides – with Nigel Farage continually popping up to tell us about the dangers of unfettered immigration. I remember arguing with a Brexiteer group in Oxford Cornmarket who were distributing leaflets about the millions that would flow to the NHS if we left the EU, but who had no evidence to back up this assertion. There were some challenges to these claims on radio and TV, but the voices of impartial experts were seldom heard.

After the referendum, there were some stunning interviews with the populace exploring their reasons for voting. News reporters were despatched to Brexit hotspots, where they interviewed jubilant supporters, many of whom stated that the UK would now be cleansed of foreigners and British sovereignty restored. Some of them also mentioned funding of the NHS: the general impression was that being in the EU meant that an emasculated Britain had to put up with foreigners on British soil while at the same time giving away money to foreigners in Europe. The EU was perceived as a big bully that took from us and never gave back, and where the UK had no voice. The reporters never challenged these views, or asked about other issues, such as financial or other benefits of EU membership.

Of course there were people who supported Brexit for sound, logical reasons, but they seemed to be pretty thin on the ground. A substantial proportion of those voting seemed swayed by arguments about decreasing the number of foreigners in the UK and/or spending money on the NHS rather than 'giving it to Europe'.

Remainers who want another referendum seem to think that, now we've seen the reality of the financial costs of Brexit, and the exodus of talented Europeans from our hospitals, schools, and universities, the populace will see through the deception foisted on them in 2016. I wonder. If Nigel Farage wants a referendum, this could simply mean that he is more confident than ever of his ability to manipulate mainstream and social media to play on people's fears of foreigners. We now know more about sophisticated new propaganda methods that can be used on social media, but that does not mean we have adequate defences against them.

The only thing that would make me feel positive about a referendum would be if you had to demonstrate that you understood what you were voting for. You'd need a handful of simple questions about factual aspects of EU membership – and a person's vote would only be counted if these questions were accurately answered. This would, however, disenfranchise a high proportion of voters, and would be portrayed as an attack on democracy. So that is not going to happen. I think there's a strong risk that if we have another referendum, it will either be too close to call, or give the same result as before, and we'll be no further ahead.

But the most serious objection to another referendum is that it is a flawed method for making political decisions. As noted in this blogpost:

(A referendum requires) a complex, and often emotionally charged issue, to be reduced to a binary yes/no question.  When considering a relationship the UK has been in for over 40 years a simple yes/no or “remain/leave” question raises many complex and inter-connected questions that even professional politicians could not fully answer during or after the campaign. The EU referendum required a largely uninformed electorate to make a choice between the status quo and an extremely unpredictable outcome.

Rather than a referendum, I'd like to see decisions about EU membership made by those with considerable expertise in EU affairs who will make an honest judgement about what is in the best interests of the UK. Sadly, that does not seem to be an option offered to us.