Monday, 1 May 2017

Reproducible practices are the future for early career researchers

This post was prompted by an interesting exchange on Twitter with Brent Roberts (@BrentWRoberts) yesterday. Brent had recently posted a piece about the difficulty of bringing about change to improve reproducibility in psychology, and this had led to some discussion about what could be done to move things forward. Matt Motyl (@mattmotyl) tweeted:

I had one colleague tell me that sharing data/scripts is "too high a bar" and that I am wrong for insisting all students who work w me do it

And Brent agreed:

We were recently told that teaching our students to pre-register, do power analysis, and replicate was "undermining" careers.

Now, as a co-author of a manifesto for reproducible science, this kind of thing makes me pretty cross, and so I weighed in, demanding to know who was issuing such rubbish advice. Brent patiently explained that most of his colleagues take this view and are skeptics, agnostics or just naïve about the need to tackle reproducibility. I said that was just shafting the next generation, but Brent replied:

Not as long as the incentive structure remains the same.  In these conditions they are helping their students.

So things have got to the point where I need more than 140 characters to make my case. I should stress that I recognise that Brent is one of the good guys, who is trying to make a difference. But I think he is way too pessimistic about the rate of progress, and far from 'helping' their students, the people who resist change are badly damaging them.  So here are my reasons.

1.     The incentive structure really is changing. The main drivers are funders, who are alarmed that they might be spending their precious funds on results that are not solid. In the UK, funders (Wellcome Trust and Research Councils) were behind a high profile symposium on Reproducibility, and subsequently have issued statements on the topic and started working to change policies and to ensure their panel members are aware of the issues. One council, the BBSRC, funded an Advanced Workshop on Reproducible Methods this April. In the US, NIH has been at the forefront of initiatives to improve reproducibility. In Germany, Open Science is high on the agenda.
2.     Some institutions are coming on board. They react more slowly than funders, but where funders lead, they will follow. Some nice examples of institution-wide initiatives toward open, reproducible science come from the Montreal Neurological Institute and the Cambridge MRC Cognition and Brain Sciences Unit. In my own department, Experimental Psychology at the University of Oxford, our Head of Department has encouraged me to hold a one-day workshop on reproducibility later this year, saying she wants our department to be at the forefront of improving psychological science.

3.     Some of the best arguments for working reproducibly have been made by Florian Markowetz. You can read about them on this blog, see him give a very entertaining talk on the topic here, or read the published paper here. So there is no escape. I won't repeat his arguments here, as he makes them better than I could, but his basic point is that you don't need to do reproducible research for ideological reasons: there are many selfish arguments for adopting this approach – in the long run it makes your life very much easier.


4.     One point Florian doesn't cover is pre-registration of studies. The idea of a 'registered report', where your paper is evaluated, and potentially accepted for publication, on basis of introduction and methods was introduced with the goal of improving science by removing publication bias, p-hacking and HARKing (hypothesising after results are known). You can read about it in these slides by Chris Chambers. But when I tried this with a graduate student, Hannah Hobson, I realised there were other huge benefits. Many people worry that pre-registration slows you down. It does at the planning stage, but you more than compensate for that by the time saved once you have completed the study. Plus you get reviewer comments at a point in the research process when they are actually useful – i.e. before you have embarked on data collection. See this blogpost for my personal experience of this.

5.     Another advantage of registered reports is that publication does not depend on getting a positive result. This starts to look very appealing to the hapless early career researcher who keeps running experiments that don't 'work'. Some people imagine that this means the literature will become full of boring registered reports with null findings that nobody is interested in. But because that would be a danger, journals who offer registered reports impose a high bar on papers they accept – basically, the usual requirement is that the study is powered at 90%, so that we can be reasonably confident that a negative result is really a null finding, and not just a type II error. But if you are willing to put in the work to do a well-powered study, and the protocol passes scrutiny of reviewers, you are virtually guaranteed a publication.

6.     If you don't have time or inclination to go the whole hog with a registered report, there are still advantages to pre-registering a study, i.e. depositing a detailed, time-stamped protocol in a public archive. You still get the benefits of establishing priority of an idea, as well as avoiding publication bias, p-hacking, etc. And you can even benefit financially: the Open Science Framework is running a pre-registration challenge – they are giving $1000 to the first 1000 entrants who succeed in publishing a pre-registered study in a peer-reviewed journal.

7.     The final advantage of adopting reproducible and open science practices is that it is good for science. Florian Markowetz does not dwell long on the argument that it is 'the right thing to do', because he can see that it has as much appeal as being told to give up drinking and stop eating Dunkin Donuts for the sake of your health. He wants to dispel the idea that those who embrace reproducibility are some kind of altruistic idealists who are prepared to sacrifice their careers to improve science. Given arguments 1-6, he is quite right. You don't need to be idealistic to be motivated to adopt reproducible practices. But it is nice when one's selfish ambitions can be aligned with the good of the field. Indeed, I'd go further and suggest that I've long suspected that this may relate to the growing rates of mental health problems among graduate students and postdocs: many people who go into science start out with high ideals, but are made to feel they have to choose between doing things properly vs. succeeding by cutting corners, over-hyping findings, or telling fairy tales in grant proposals. The reproducibility agenda provides a way of continuing to do science without feeling bad about yourself.

Brent and Matt are right that we have a problem with the current generation of established academic psychologists, who are either hostile to or unaware of the reproducibility agenda.  When I give talks on this topic, I get instant recognition of the issues by early career researchers in the audience, whereas older people can be less receptive. But what we are seeing here is 'survivor bias'. Those who are in jobs managed to succeed by sticking to the status quo, and so see no need for change. But the need for change is all too apparent to the early career researcher who has wasted two years of their life trying to build on a finding that turns out to be a type I error from an underpowered, p-hacked study. My advice to the latter is don't let yourself be scared by dire warnings of the perils of working reproducibly. Times really are changing and if you take heed now, you will be ahead of the curve.


6 comments:

  1. Hear hear! Later in the song:

    Come mothers and fathers
    Throughout the land
    And don't criticize
    What you can't understand
    Your sons and your daughters
    Are beyond your command
    Your old road is rapidly aging
    Please get out of the new one if you can't lend your hand
    Cause the times they are a-changing.

    Steve

    ReplyDelete
  2. I suggested we include 'data integrity' as a marked component of our final year students' dissertation module, but hit a brick wall. No-one saw the need - "they have to give us their data on a USB stick, isn't that enough?". Part of the problem is that most established academics don't know how to make their data or analyses open, so cannot train their students to do so. Next year my dissertees will be putting data and R scripts onto github. One dissertation supervisor at a time...

    ReplyDelete
    Replies
    1. While using Git is a huge step in the right direction, it isn't really indicative of the data integrity being maintained, Unless the entire data analysis process was scrupulously managed in a git repository, not just the 'final' datasets and script.

      Granted, your move to require that from dissertees will allow this conversation to start, and will help in the creation of a generation of data-integrity aware scientists, from within the less computationally literate population.

      Kudos to you and yours !

      Delete
  3. "... sticking to the status quo ..." nails a large part of the problem. I think this also applies to undergraduate teaching: at least in my experience, 'Methods' and 'Data Analysis' courses are seldom about methods or data analysis, but instead show students how to enter numbers into SPSS and scan the output. Why? Because it's easy, and we've always done it that way ...

    Geoff

    ReplyDelete
  4. Daniel Feuerriegel3 May 2017 at 01:48

    Oddly enough, there is a great cover version of that Bob Dylan song with lyrics about reproducible science:

    https://www.youtube.com/watch?v=2ze-_jA1X94&feature=youtu.be

    Or search for "The statistics, they are a changin' "

    ReplyDelete
  5. Good talk by Markowetz.

    I'm a firm believer in using Latex (well actually Lyx) with knitr and R for writing a paper.

    That way one does not forget to change a p-value or n-size if the data changes slightly.


    ReplyDelete