Friday 9 February 2018

Improving reproducibility: the future is with the young

I've recently had the pleasure of reviewing the applications to a course on Advanced Methods for Reproducible Science that I'm running in April together with Marcus Munafo and Chris Chambers. We take a broad definition of 'reproducibility'. We cover ways to ensure that code and data are available for those who wish to reproduce experimental results, and we also cover how to design, analyse and pre-register studies so that they give replicable and generalisable findings.

There is a strong sense of change in the air. Last year, most applicants were psychologists, even though we prioritised applications in the biomedical sciences, because we are funded by the Biotechnology and Biological Sciences Research Council and the European College of Neuropsychopharmacology. The sense was that issues of reproducibility were not so high on the radar of disciplines outside psychology. This year things are different. We again attracted a fair number of psychologists, but we also have applicants from fields as diverse as gene expression, immunology, stem cells, anthropology, pharmacology and bioinformatics.

One thing that came across loud and clear in the letters of application was dissatisfaction with the status quo. I've argued before that we have a duty to sort out poor reproducibility, because it leads to an enormous waste of the time and talent of those who try to build on a glitzy but non-replicable result. I've edited these quotes to avoid identifying the authors, but these comments (all from PhD students or postdocs in a range of disciplines) illustrate my point:
  • 'I wanted to replicate the results of an influential intervention that has been widely adopted. Remarkably, no systematic evidence has ever been published that the approach actually works. So far, it has been extremely difficult to establish contact with initial investigators or find out how to get hold of the original data for re-analysis.' 

  • 'I attempted a replication of a widely-cited study, which failed. Although I first attributed it to a difference between experimental materials in the two studies, I am no longer sure this is the explanation.' 

  • 'I planned to use the methods of a widely cited study for a novel piece of research. The results of this previous study were strong, published in a high impact journal, and the methods apparently straightforward to implement, so this seemed like the perfect approach to test our predictions. Unfortunately, I was never able to capture the previously observed effect.' 

  • 'After working for several years in this area, I have come to the conclusion that much of the research may not be reproducible. Much of it is conducted with extremely small sample sizes, reporting implausibly large effect sizes.' 

  • 'My field is plagued by irreproducibility. Even at this early point in my career, I have been affected in my own work by this issue and I believe it would be difficult to find someone who has not themselves had some relation to the topic.' 

  • 'At the faculty I work in, I have witnessed that many people are still confused about or unaware of the very basics of reproducible research.'

Clearly, we can't generalise to all early-career researchers: those who applied for the course are a self-selected bunch. Indeed, some of them are already trying to adopt reproducible practices and to bring about change in their local scientific environment. I hope, though, that what we are seeing is just the beginning of a groundswell of dissatisfaction with the status quo. As Chris Chambers suggested in a podcast, I think that change will come more from the grassroots than from established scientists.

We anticipate that the greater diversity of subjects covered this year will make the course far more challenging for the tutors, but we expect it will also make it even more stimulating and fun than last year (if that is possible!). The course lasts several days and interactions between people are as important as the course content in making it work. I'm pretty sure that the problems and solutions from my own field have relevance for other types of data and methods, but I anticipate I will learn a lot from considering the challenges encountered in other disciplines.

Training early career researchers in reproducible methods does not just benefit them: those who attended the course last year have become enthusiastic advocates for reproducibility, with impacts extending beyond their local labs. We are optimistic that as the benefits of reproducible working become more widely known, the face of science will change so that fewer young people will find their careers stalled because they trusted non-replicable results.


  1. A couple of questions:

    If you are running a course on this subject, does this mean that you know the cause of poor reproducibility?

    In the blog you refer to both "reproducibility" and "replicability". Is there a consensus on what these words mean? Do they refer to different or identical concepts?

  2. Thanks for the questions.

    1. There's no one cause. Lots of factors - some from scientists, others from journals, institutions and funders. Brief summary here:

    2. There is more than one consensus: different disciplines have used the terms differently. In fields where much of the work involves analysing large existing datasets (e.g. politics, economics, sociology), there is interest in reproducibility in the strictest sense: if you take the same dataset and run the same analysis, do you arrive at the same results? In psychology and biomedicine there is more focus on replicability and generalisability: if you repeat an experiment that someone else did, do you get broadly the same result? This is sometimes encompassed under a broader meaning of reproducibility. There are some nice slides by Kirstie Whitaker explaining these distinctions here:

    Bottom line is you can't be replicable if you aren't reproducible.

    1. So the answer to my first question is "yes".

      I have looked through your slides, and you have clearly put a lot of thought into this subject. However, there is not much discussion of the conduct of the experiment, i.e. the process of producing the data in the first place. In my experience (agriculture, pesticides) this is where things can go terribly wrong. If the data do not contain a measurable signal, or the signal is confounded with one of the treatments, then no amount of analysis and reanalysis will produce anything useful.

      A simple example: a field experiment on the effect of neonicotinoids on bee activity. Beehives were placed in cages, and the ecologists had worked out a way of measuring bee activity. Unfortunately, bees are like us in that they take time to wake up, and the ecologists decided to measure all of the neonicotinoid-treated cages first and the untreated cages afterwards. Result: the measured effect was confounded with time of day, and it looked as if the pesticide was having a massive effect on the bees.
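The usual design remedy for the confound in the bee example is to avoid measuring all treated cages in one block. Instead, the measurement order is randomised within balanced blocks so that time of day cannot line up with treatment. Below is a minimal Python sketch of that idea. It is not the ecologists' actual protocol. The function name `blocked_order` and the cage labels are hypothetical, and the sketch assumes equal numbers of treated and untreated cages.

```python
import random


def blocked_order(treated, untreated, seed=None):
    """Return a measurement schedule that interleaves treated and untreated cages.

    Hypothetical helper for illustration. Cages are grouped into blocks of one
    treated plus one untreated cage. The pairing of cages and the order within
    each block are randomised, so every consecutive pair of time slots contains
    one cage from each arm and treatment cannot drift with time of day.
    """
    if len(treated) != len(untreated):
        raise ValueError("this sketch assumes equal numbers of cages per arm")
    rng = random.Random(seed)  # a fixed seed lets the schedule itself be reproduced
    t = list(treated)
    u = list(untreated)
    rng.shuffle(t)  # randomise block order and pairing
    rng.shuffle(u)
    order = []
    for block in zip(t, u):
        block = list(block)
        rng.shuffle(block)  # randomise which arm is measured first in this block
        order.extend(block)
    return order


if __name__ == "__main__":
    # Hypothetical cage labels: T = neonicotinoid-treated, U = untreated.
    treated = ["T1", "T2", "T3", "T4"]
    untreated = ["U1", "U2", "U3", "U4"]
    print(blocked_order(treated, untreated, seed=1))
```

Under this schedule, any 'wake-up' trend in bee activity affects both arms roughly equally. Recording the time slot and including it as a covariate in the analysis would give a further safeguard.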