BishopBlog: Why we need pre-registration

Friday, 26 July 2013

Why we need pre-registration

There has been a chorus of disapproval this week at the suggestion that researchers should 'pre-register' their studies with journals and spell out in advance the methods and analyses that they plan to do. Those who wish to follow the debate should look at this critique by Sophie Scott, with associated comments, and the responses to it collated here by Pete Etchells. They should also read the explanation of the pre-registration proposals and FAQ by Chris Chambers - something that many participants in the debate appear not to have done.

Quite simply, pre-registration is designed to tackle two problems in scientific publishing:

1. Bias against publication of null results
2. A failure to distinguish hypothesis-generating (exploratory) from hypothesis-testing analyses

Either of these alone is bad for science: the combined effect of both of them is catastrophic, and has led to a situation where research is failing to do its job in terms of providing credible answers to scientific questions.

Null results

Let's start with the bias against null results. Much has been written about this, including by me. But the heavy guns in the argument have been wielded by Ben Goldacre, who has pointed out that, in the clinical trials field, if we only see the positive findings, then we get a completely distorted view of what works, and as a result, people may die. In my field of psychology, the stakes are not normally as high, but the fact remains that there can be massive distortion in our perception of evidence.

Pre-registration would fix this by guaranteeing publication of a paper regardless of how the results turn out. In fact, there is another, less bureaucratic, way the null result problem could be fixed, and that would be by having reviewers decide on a paper's publishability solely on the basis of the introduction and methods. But that would not fix the second problem.

Blurring the boundaries between exploratory and hypothesis-testing analyses

A big problem is that nearly all data analysis is presented as if it is hypothesis-testing when in fact much of it is exploratory.

In an exploratory analysis, you take a dataset and look at it flexibly to see what's there. Like many scientists, I love exploratory analyses, because you don't know what you will find, and it can be important and exciting. I suspect it is also something that you get better at as you get more experienced, and more able to see the possibilities in the numbers. But my love of exploratory analyses is coupled with a nervousness. With an exploratory analysis, whatever you find, you can never be sure it wasn't just a chance result. Perhaps I was lucky in having this brought home to me early in my career, when I had an alphabetically ordered list of stroke patients I was planning to study, and I happened to notice that those with names in the first half of the alphabet had left hemisphere lesions and those with names in the second half had right hemisphere lesions. I even did a chi square test and found it was highly significant. Clearly this was nonsense, and just one of those spurious things that can turn up by chance.

These days it is easy to see how often meaningless 'significant' results occur by running analyses on simulated data - see this blogpost for instance. In my view, all statistics classes should include such exercises.
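As a rough illustration of the kind of exercise I have in mind, here is a minimal sketch that is not taken from the linked blogpost. It assumes two groups of 20 scores drawn from the same normal population, so any 'effect' is pure noise, and it uses a simple z-test with known variance rather than the t-test you would use on real data. The function names, sample size and seed are all made up for the illustration.

```python
import random
from statistics import NormalDist, mean


def two_group_p_value(rng, n_per_group=20):
    """Two-sided p-value for a difference between two groups of pure noise."""
    # Both groups come from the SAME N(0, 1) population: there is no real effect.
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    # z-test with known unit variance: the difference in means has SD sqrt(2/n).
    z = (mean(group_a) - mean(group_b)) / (2.0 / n_per_group) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_studies=2000, alpha=0.05, seed=1):
    """Proportion of simulated null 'studies' that reach p < alpha."""
    rng = random.Random(seed)
    hits = sum(two_group_p_value(rng) < alpha for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    print(f"'Significant' results on pure noise: {false_positive_rate():.1%}")
```

Under these assumptions, roughly one simulated study in twenty should come out 'significant', even though there is nothing to find.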

So you've done your exploratory analysis, got an exciting finding, but are nervous as to whether it is real. What do you do? The answer is you need a confirmatory study. In the field of genetics, failure to realise this led to several years of stasis, cogently described by Flint et al. (2010). Genetics really highlights the problem, because of the huge numbers of possible analyses that can be conducted. What was quickly learned was that most exciting effects don't replicate. The bar has accordingly been set much higher, and most genetics journals won't consider publishing a genetic association unless replication has been demonstrated (Munafo & Flint, 2011). This is tough, but it has meant that we can now place confidence in genetics results. (It has also had the positive side-effect of encouraging more collaboration between research groups.) Unfortunately, those outside the field of genetics are unaware of these developments, and we are seeing increasing numbers of genetic association studies being published in the neuroscience literature, with tiny samples and no replication.

The important point to grasp is that the meaning of a p-value is completely different if it emerges when testing an a priori prediction, compared with when it is found in the course of conducting numerous analyses of a dataset. Here, for instance, are outputs from 15 runs of a 4-way Anova on random data, as described here:

Each row shows the p-values for the outputs (main effects, then interactions) from one run of a 4-way Anova on a new set of random data. For a slightly more legible version see here.

If I approached a dataset specifically testing the hypothesis that there would be an interaction between group and task, then, with random data like these, the chance of a p-value of .05 or less for that one effect would be 1 in 20 (as can be confirmed by repeating the simulation thousands of times - in a small number of runs it's less easy to see). But if I just looked for any significant finding among the main effects and interactions, it's not hard to find something on most of these runs. An exploratory analysis is not without value, but its value is in generating hypotheses that can then be tested in an a priori design.
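The arithmetic behind this can be sketched in a few lines. A 4-way Anova yields 15 effects (4 main effects, 6 two-way, 4 three-way and 1 four-way interaction). The sketch below is not the simulation described above: it simply assumes that, on random data, each of the 15 p-values is an independent draw from a uniform distribution. That is only an approximation for a real Anova, where the effects may share an error term. The function names and the seed are made up for the illustration.

```python
import random

N_EFFECTS = 15  # 4 main effects + 6 two-way + 4 three-way + 1 four-way interaction
ALPHA = 0.05


def analytic_any_significant(n_tests=N_EFFECTS, alpha=ALPHA):
    """Chance that at least one of n independent null tests gives p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_runs(n_runs=10000, n_tests=N_EFFECTS, alpha=ALPHA, seed=2013):
    """Return (rate for one pre-specified effect, rate for any effect at all)."""
    rng = random.Random(seed)
    predicted_hits = 0
    any_hits = 0
    for _ in range(n_runs):
        # Assumption: with no true effects, each p-value is uniform on [0, 1].
        p_values = [rng.random() for _ in range(n_tests)]
        # A priori: only the one effect named in advance counts.
        predicted_hits += p_values[0] < alpha
        # Exploratory: any effect that happens to cross the threshold counts.
        any_hits += min(p_values) < alpha
    return predicted_hits / n_runs, any_hits / n_runs


if __name__ == "__main__":
    predicted, anything = simulate_runs()
    print(f"Pre-specified effect significant: {predicted:.1%}")
    print(f"At least one effect significant:  {anything:.1%}")
    print(f"Analytic value for the latter:    {analytic_any_significant():.1%}")
```

Under the independence assumption, the chance of at least one 'significant' effect per run is 1 - 0.95^15, or about 54%, whereas the chance for the single effect named in advance stays at about 5%.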

So replication is needed to deal with the uncertainties around exploratory analysis. How does pre-registration fit in the picture? Quite simply, it makes explicit the distinction between hypothesis-generating (exploratory) and hypothesis-testing research, which is currently completely blurred. As in the example above, if you tell me in advance what hypothesis you are testing, then I can place confidence in the uncorrected statistical probabilities associated with the predicted effects. If you haven't predicted anything in advance, then I can't.

This doesn't mean that the results from exploratory analyses are necessarily uninteresting, untrue, or unpublishable, but it does mean we should interpret them as what they are: hypothesis-generating rather than hypothesis-testing.

I'm not surprised at the outcry against pre-registration. This is mega. It would require most of us to change our behaviour radically. It would turn on its head the criteria used to evaluate findings: well-conducted replication studies, currently often unpublishable, would be seen as important, regardless of their results. On the other hand, it would no longer be possible to report exploratory analyses as if they are hypothesis-testing. In my view, unless we do this we will continue to waste time and precious research funding chasing illusory truths.

References

Flint, J., Greenspan, R. J., & Kendler, K. S. (2010). How Genes Influence Behavior. Oxford University Press.

Munafo, M., & Flint, J. (2011). Dissecting the genetic architecture of human personality. Trends in Cognitive Sciences, 15(9), 395-400. DOI: 10.1016/j.tics.2011.07.007

38 comments:

Anonymous, 26 July 2013 at 11:16
"In my view, unless we do this we will continue to waste time and precious research funding chasing illusory truths."

Hear, hear !!

Stephen Senn, 26 July 2013 at 11:33
A problem that has to be addressed, however, is that of analysis and review. In the context of drug regulation a sponsor has to have the statistical analysis plan finalised before unblinding of the data. Whatever other analyses are subsequently presented this pre-specified one will be provided to the regulator who will pay almost all attention to this and almost no attention to anything else. On the basis of a series of such pre-specified analyses of a set of trials the claim is either accepted or not. Of course, if the regulator decides an analysis was silly, despite being pre-specified, the regulator may also reject the claim. Very, very rarely would the regulator decide that a claim that would fail on the basis of a pre-specified stupid analysis but would succeed on the basis of a sensible revised one should be accepted.

However, if we look at the journal review process we see that there is a problem. We are currently being told that all trials should be published. Now we also want pre-registration. So what does this imply about acceptance of a paper for publication? Consider the following cases.
1. Sensible pre-specification sensibly reported.
2. Sensible pre-specification, deviation in reporting.
3. Stupid pre-specification (for example technically incorrect statistical procedure) faithfully reported.
4. Stupid pre-specification corrected in manuscript.

Now what is the purpose of peer-review and when is it supposed to happen? Presumably in the world of pre-registration, peer-reviewers should approve category 1 papers and require that category 2 papers be amended to become category 1 papers. But what about category 3 papers? Is peer-review supposed to change them into category 4? Or is peer-review supposed to ensure (analogously to changing category 2 into 1) that 4 should be turned into 3?

The problem is that the regulatory process of claim accepted versus claim rejected, in which pre-specification plays an important role, is a different one from manuscript accepted versus manuscript rejected.

The former is one of deciding whether a claim should be accepted as proven or not. The latter is one of deciding whether an argument is sound or not. A sound argument may, of course, support the conclusion that a treatment is not effective.

I am not against pre-registration but I think there are many details to work out and one of the implications may be that the work of statistical (and other) reviewers will have to increase. There will have to be detailed pre-experimental initiation review of the protocol and subsequent manuscript review to check that the protocol has been adhered to.

Alison Cummins, 26 July 2013 at 14:53
The comics version of this article...
http://xkcd.com/882/

Anonymous, 26 July 2013 at 18:33
The opponents of pre-registration often appear to be overlooking the fact that well defined and well conducted replication/confirmatory experiments are required if science is to be self-correcting. The current scientific culture (journals, funding sources, education) focuses on and rewards exploratory studies, and inadequately addresses the fact that science must not stop there if it is to be an effective method to obtain truth.

Pre-registration of studies is an important and fundamental part of doing confirmatory experiments. Other sciences that are considering study registration can learn from medical research, which has the most experience with study registration. Drug research is clearly divided into exploratory (phase 2) and confirmatory (phase 3) studies. A major purpose of registration is to specify which analyses are confirmatory and which are exploratory. This is simply good experimental methodology and does not stifle scientific creativity or exploration. The regulatory process for drugs recognizes that good confirmatory studies are essential for convincing scientific evidence and requires such studies, including pre-registration.

However, linking study registration with a requirement for power analysis for exploratory studies, publication in a specific journal, and public distribution of data adds controversial baggage to the process. This baggage goes beyond what is done in medical research and will make acceptance and use of study registration difficult. Processes for study registration are needed that provide the benefits of basic registration of confirmatory experiments, and allow the other more controversial issues to be handled separately.

Jim Kennedy

Anonymous, 26 July 2013 at 19:38
It is not clear to me that the distinction between "exploratory" and "confirmatory" investigations helps. If you explore 100 possible effects and find about 5 of them getting below p=0.05, you didn't find anything worth trying to confirm. There is no reason to think that any of those results are something worth confirming if they wouldn't pass muster as a confirmatory experiment.

The only thing that is different in an exploratory investigation is that the experimenter isn't able to say in advance what is going to be tested, so you can't do anything so precise as a Bonferroni correction. There is more wiggle room, in other words. This makes it both easier to fudge, and harder to take seriously.

Unknown, 27 July 2013 at 09:26
Great post Dorothy.
Exploration and pre-registration are not opponents; pre-registration is an ally to exploration.

Greig de Zubicaray, 28 July 2013 at 13:21
Dorothy,

"Genetics really highlights the problem..." but the example doesn't at all support your conclusion that pre-registration was/is the solution. Neuroimaging genetics has the ENIGMA consortium, in which I and my colleagues participate, so replication samples and large-scale data sharing are being utilised in neuroscience too. This culture shift was enacted without pre-registration.

You've blogged before about the dubious natures of research quality assessment exercises and the studies reported in high impact factor journals, so perhaps it would be worth extending the debate about pre-registration as one potential solution to questionable research practices to another potential solution: holding journals - and editors - directly accountable for the findings they publish. We need a metric that assesses journals' quality according to the ratio of positive vs null findings they publish and the number of replication studies they publish. If the former is too high, the science in the journal is questionable. If the latter is too low, the journal editorial policy does not reflect adherence to good science. Aside from increasing the rate of replication studies, a journal quality metric of this sort could prove quite useful for reducing questionable research practices and reduce the reliance on impact factors, but would require endorsement from the field before it could be used in assessment exercises or by funding agencies and promotion panels.

Josh, 28 July 2013 at 16:16
My concerns about pre-registration are very simple. The goal is to improve replicability. But what is being measured and promoted is pre-registration. To the extent that the pre-registration cost-function departs from the replication cost-function, you end up in the perverse situation where demanding pre-registration *lowers* replicability.

The simplest example is that if you do your studies online (which more and more of us do), it actually takes more time/effort to pre-register your study than replicate it. Given the limited number of hours/day...

Another example: suppose you analyze a dataset one way (as pre-registered), and then a reviewer correctly points out that there is a better way to analyze it, which changes the results. If you judge manuscript quality based on pre-registration, the author would be better off sending the paper to a different journal hoping for new reviewers than analyzing the data correctly!

Pre-registration will do little to solve the problem of researchers running dozens of different experiments and only publishing the one that "worked" -- a documented problem.

The Chambers FAQ mostly discusses the plan to have journals review the methods and not the results. This might make sense in fields that still do one experiment/paper (which is itself a problem: psych journals requiring multiple experiments/paper was one of the original replicability reforms!), but I usually have around 10, each of which is contingent on the previous ones. Does the paper get reviewed 10 times? How many years will that take?

Most importantly, the focus on pre-registration pulls interest & energy away from efforts to deal with replicability straight-on. As I've pointed out elsewhere (http://bit.ly/14ouwCf), if we don't track replicability, we'd have no way of knowing whether any given reform (like pre-registration) had any effect.

Anonymous, 28 July 2013 at 20:57
* "My concerns about pre-registration are very simple. The goal is to improve replicability. But what is being measured and promoted is pre-registration."

&

"Most importantly, the focus on pre-registration pulls interest & energy away from efforts to deal with replicability straight-on."

If I am not mistaken, pre-registration (as in the Cortex model) helps with HARKing, the file-drawer problem, p-hacking, and low power of studies.

If I am not mistaken, 3 of those 4 issues can possibly be tied to replicability of findings (http://www.psychologie.hu-berlin.de/prof/per/pdf/2013/Replicability_target_Peer_commentary.pdf).

If that is correct, then maybe you could state that pre-registration (following the Cortex model) would possibly help with replicability issues.

* "Another example: suppose you analyze a dataset one way (as pre-registered), and then a reviewer correctly points out that there is a better way to analyze it, which changes the results."

I think the reviewer in the Cortex-model pre-registration would/could point this out as well, in stage 1 (see http://cdn.elsevier.com/promis_misc/PROMISpub_idt_Guidelines_cortex_RR_17_04_2013.pdf)

* "Pre-registration will do little to solve the problem of researchers running dozens of different experiments and only publishing the one that "worked" -- a documented problem."

If I am not mistaken, because the Cortex-model pre-registration accepts a study before the results are known, this would help to ensure that it is not only the studies that "worked" that get reported.

I think a scientist could maybe run dozens of studies but that will cost resources (perhaps especially with many participants). I don't think they will pre-register all those test-studies. I think they would only invest their resources in a study which they would have some confidence in, e.g. regarding finding a significant effect, or regarding the importance of the findings (be they sign. or non-sign. as for instance in therapy-evaluating research or something like that), and these could be the studies they could then pre-register.

* "if we don't track replicability, we'd have no way of knowing whether any given reform (like pre-registration) had any effect."

I think it would indeed be interesting to compare the replicability of pre-registered studies with that of non-pre-registered studies.

Greig de Zubicaray, 29 July 2013 at 02:27
@Chris Chambers - your claim that pre-registration will 'virtually guarantee' publication needs to be examined. Editors retain the right to determine whether an article is suitable for submission to the journal. Let me provide a concrete example from the journal Cortex of how this can be a problem. Volume 48, Issue 7 of that journal is a Special Issue on “Language and the Motor System”: http://tinyurl.com/lqm4kya

Yet, the entire issue is composed of articles written by proponents of language embodiment. Not one article from the alternative perspective. Is there something wrong with the editorial culture at Cortex?

Anonymous, 29 July 2013 at 15:52
If studies are pre-registered in a somewhat private way with a particular journal or on private pages on the Open Science Framework, the value for solving the file-drawer problem is reduced. One important goal of the registries for medical research is to provide public information about studies that are being done and have been done, and to do so in a way that minimizes the burden to experimenters while providing basic methodological benefits. The most widely used medical registry currently has over 149,000 registered studies and is often the starting point when someone wants to find research about a particular topic. It is at http://www.clinicaltrials.gov/ct2/home

Note that medical journals increasingly require as a condition for publication that studies were registered at a “public” registry. The statement of the International Committee of Medical Journal Editors can be found at http://www.icmje.org/publishing_10register.html

Information on the history of study registration in medical research can be found at http://www.clinicaltrials.gov/ct2/about-site/history

Optimal practice with most flexibility would be to have different registries, with some emphasizing making information publicly available. A particular study could be registered on different registries.