I had one colleague tell me that sharing data/scripts is
"too high a bar" and that I am wrong for insisting all students who
work w me do it
And Brent agreed:
We were recently told that teaching our students to
pre-register, do power analysis, and replicate was "undermining"
careers.
Now, as a co-author of a manifesto for
reproducible science, this kind of thing makes me pretty cross, and so I
weighed in, demanding to know who was issuing such rubbish advice. Brent
patiently explained that most of his colleagues take this view and are
skeptics, agnostics or just naïve about the need to tackle reproducibility. I
said that was just shafting the next generation, but Brent replied:
Not as long as the incentive structure remains the same. In these conditions they are helping their students.
So things have got to the point where I need more than 140
characters to make my case. I should stress that I recognise that Brent is one
of the good guys, who is trying to make a difference. But I think he is way too
pessimistic about the rate of progress, and far from 'helping' their students, the
people who resist change are badly damaging them. So here are my reasons.
1. The incentive structure really is changing. The main drivers are
funders, who are alarmed that they might be spending their precious funds on
results that are not solid. In the UK, funders (Wellcome Trust and Research
Councils) were behind a high-profile symposium on Reproducibility, and subsequently
have issued statements on the topic and started working to change policies and
to ensure their panel members are aware of the issues. One council, the BBSRC,
funded an Advanced Workshop on Reproducible Methods this April. In the US, NIH
has been at the forefront of initiatives to improve reproducibility. In Germany, Open Science is high on the agenda.
2. Some institutions are coming on board. They react
more slowly than funders, but where funders lead, they will follow. Some nice
examples of institution-wide initiatives toward open, reproducible science come
from the Montreal Neurological Institute and the Cambridge MRC Cognition and Brain Sciences Unit. In my own department, Experimental Psychology at the
University of Oxford, our Head of Department has encouraged me to hold a
one-day workshop on reproducibility later this year, saying she wants our
department to be at the forefront of improving psychological science.
3. Some of the best arguments for working
reproducibly have been made by Florian Markowetz. You can read about them on this
blog, see him give a very entertaining talk on the topic here,
or read the published paper here.
So there is no escape. I won't repeat his arguments here, as he makes them
better than I could, but his basic point is that you don't need to do
reproducible research for ideological reasons: there are many selfish arguments
for adopting this approach – in the long run it makes your life very much
easier.
4. One point Florian doesn't cover is
pre-registration of studies. The idea of a 'registered report', where your
paper is evaluated, and potentially accepted for publication, on the basis of
its introduction and methods, was introduced with the goal of improving science by
removing publication bias, p-hacking and HARKing (hypothesising after results
are known). You can read about it in these slides by Chris Chambers. But when I
tried this with a graduate student, Hannah Hobson, I realised there were other
huge benefits. Many people worry that pre-registration slows you down. It does at
the planning stage, but you more than compensate for that by the time saved
once you have completed the study. Plus you get reviewer comments at a point in
the research process when they are actually useful – i.e. before you have embarked
on data collection. See this blogpost
for my personal experience of this.
5. Another advantage of registered reports is that
publication does not depend on getting a positive result. This starts to look
very appealing to the hapless early career researcher who keeps running
experiments that don't 'work'. Some people imagine that this means the
literature will become full of boring registered reports with null findings
that nobody is interested in. But because that would be a danger, journals that
offer registered reports impose a high bar on papers they accept – basically,
the usual requirement is that the study is powered at 90%, so that we can be reasonably
confident that a negative result is really a null finding, and not just a type
II error. But if you are willing to put in the work to do a well-powered study,
and the protocol passes the scrutiny of reviewers, you are virtually guaranteed a
publication.
6. If you don't have time or inclination to go the
whole hog with a registered report, there are still advantages to
pre-registering a study, i.e. depositing a detailed, time-stamped protocol in a
public archive. You still get the benefits of establishing priority of an idea,
as well as avoiding publication bias, p-hacking, etc. And you can even benefit
financially: the Open Science Framework is running a pre-registration challenge – they are giving
$1000 to the first 1000 entrants who succeed in publishing a pre-registered
study in a peer-reviewed journal.
7. The final advantage of adopting reproducible and
open science practices is that it is good for science. Florian Markowetz does
not dwell long on the argument that it is 'the right thing to do', because he
can see that it has as much appeal as being told to give up drinking and stop eating
Dunkin Donuts for the sake of your health. He wants to dispel the idea that
those who embrace reproducibility are some kind of altruistic idealists who are
prepared to sacrifice their careers to improve science. Given arguments 1-6, he
is quite right. You don't need to be idealistic to be motivated to adopt reproducible
practices. But it is nice when one's selfish ambitions can be aligned with the
good of the field. Indeed, I'd go further: I've long suspected that
this may relate to the growing
rates of mental health problems among graduate students and postdocs: many
people who go into science start out with high ideals, but are made to feel
they have to choose between doing things properly vs. succeeding by cutting
corners, over-hyping findings, or telling fairy tales in grant proposals. The
reproducibility agenda provides a way of continuing to do science without
feeling bad about yourself.
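To make the 90% power requirement in point 5 concrete, here is a minimal sketch of the kind of sample-size calculation involved. It assumes a simple two-group comparison with a two-sided test at alpha = 0.05 and uses the standard normal approximation rather than an exact t-test calculation. The function name `n_per_group` and the effect size of d = 0.5 are illustrative assumptions, not taken from any journal's guidelines.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, power=0.90, alpha=0.05):
    """Approximate participants needed per group for a two-sided,
    two-sample comparison of means.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2.
    This slightly underestimates the exact t-test answer, so treat it
    as a rough planning figure only.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


if __name__ == "__main__":
    # Hypothetical medium effect (Cohen's d = 0.5) at two power levels.
    for power in (0.80, 0.90):
        print(f"power={power:.2f}: n per group = {n_per_group(0.5, power=power)}")
```

Under these assumptions, moving from 80% to 90% power raises the requirement from roughly 63 to roughly 85 participants per group, which gives a sense of the extra work the higher bar implies.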
Brent and Matt are right that we have a problem with the
current generation of established academic psychologists, who are either
hostile to or unaware of the reproducibility agenda. When I give talks on this topic, I get
instant recognition of the issues by early career researchers in the audience,
whereas older people can be less receptive. But what we are seeing here is
'survivor bias'. Those who are in jobs managed to succeed by sticking to the
status quo, and so see no need for change. But the need for change is all too
apparent to the early career researcher who has wasted two years of their life
trying to build on a finding that turns out to be a type I error from an
underpowered, p-hacked study. My advice to the latter is: don't let yourself be
scared by dire warnings of the perils of working reproducibly. Times really are
changing and if you take heed now, you will be ahead of the curve.
Hear hear! Later in the song:
Come mothers and fathers
Throughout the land
And don't criticize
What you can't understand
Your sons and your daughters
Are beyond your command
Your old road is rapidly aging
Please get out of the new one if you can't lend your hand
Cause the times they are a-changing.
Steve
I suggested we include 'data integrity' as a marked component of our final year students' dissertation module, but hit a brick wall. No-one saw the need - "they have to give us their data on a USB stick, isn't that enough?". Part of the problem is that most established academics don't know how to make their data or analyses open, so cannot train their students to do so. Next year my dissertees will be putting data and R scripts onto GitHub. One dissertation supervisor at a time...
While using Git is a huge step in the right direction, it doesn't by itself show that data integrity has been maintained, unless the entire data analysis process was scrupulously managed in a Git repository, not just the 'final' datasets and script.
Granted, your move to require that from dissertees will allow this conversation to start, and will help in the creation of a generation of data-integrity aware scientists, from within the less computationally literate population.
Kudos to you and yours!
"... sticking to the status quo ..." nails a large part of the problem. I think this also applies to undergraduate teaching: at least in my experience, 'Methods' and 'Data Analysis' courses are seldom about methods or data analysis, but instead show students how to enter numbers into SPSS and scan the output. Why? Because it's easy, and we've always done it that way ...
Geoff
Oddly enough, there is a great cover version of that Bob Dylan song with lyrics about reproducible science:
https://www.youtube.com/watch?v=2ze-_jA1X94&feature=youtu.be
Or search for "The statistics, they are a changin' "
Good talk by Markowetz.
I'm a firm believer in using LaTeX (well, actually LyX) with knitr and R for writing a paper.
That way one does not forget to change a p-value or n-size if the data changes slightly.