This morning I did a very mean thing. I saw an author announce to the world on Twitter that they had just published this paper, and I tweeted a critical comment. This does not make me happy, as I know just how proud and pleased one feels when a research project at last makes it into print, and to immediately pounce on it seems unkind. Furthermore, the flaws in the paper are not all that unusual: they characterise a large swathe of literature. And the amount of work that has gone into the paper is clearly humongous, with detailed analysis of white matter structural integrity that probably represents many months of effort. But that, in a sense, is the problem. We just keep on and on doing marvellously complex neuroimaging in contexts where the published studies are likely to contain unreliable results.
Why am I so sure that this is unreliable? Well, yesterday saw the publication of a review that I had led on, which was highly relevant to the topic of the paper – genetic variants affecting brain and behaviour. In our review we closely scrutinised 30 papers on this topic that had been published in top neuroscience journals. The field of genetics was badly burnt a couple of decades ago when it was discovered that study after study reported results that failed to replicate. These days, it's not possible to publish a genetic association in a genetics journal unless you show that the finding holds up in a replication sample. However, neuroscience hasn't caught up and seems largely unaware of why this is a problem.
The focus of this latest paper was on a genetic variant known as the COMT Val158Met SNP. People can have one of three versions of this genotype: Val/Val, Val/Met and Met/Met, but it's not uncommon for researchers to just distinguish people with Val/Val from Met carriers (Val/Met and Met/Met). This COMT polymorphism is one of the most-studied genetic variants in relation to human cognition, with claims of associations with all kinds of things: intelligence, memory, executive functions, emotion, response to anti-depressants, to name just a few. Few of these, however, have replicated, and there is reason to be dubious about the robustness of findings (Barnett, Scoriels & Munafo, 2008)
In this latest COMT paper – and many, many other papers in neurogenetics – the sample size is simply inadequate. There were 19 participants (12 males and 7 females) with the COMT Val/Val version of the variant, compared with 63 (27 males and 36 females) who had either Met/Met or Val/Met genotype. The authors reported that significant effects of genotype on corpus callosum structure were found in males only. As we noted in our review, effects of common genetic variants are typically very small. In this context, an effect size (standardized difference between means of two genotypes, Cohen's d) of .2 would be really large. Yet this study has power of .08 to detect such an effect in males – that is if there really is a difference of -0.2 SDs between the two genotypes, and you repeatedly ran studies with this sample size, then you'd fail to see the effect in 92% of studies. To look at it another way, the true effect size would need to be enormous (around 1 SD difference between groups) to have an 80% chance of being detectable, given the sample size.
When confronted with this kind of argument, people often say that maybe there really are big effect sizes. After all, the researchers were measuring characteristics of the brain, which are nearer to the gene than the behavioural measures that are often used. Unfortunately, there is another much more likely explanation for the result, which is that it is a false positive arising from a flexible analytic pipeline.
The problem is that both neuroscience and genetics are a natural environment for analytic flexibility. Put the two together, and you need to be very very careful to control for spurious false positive results. In the papers we evaluated for our review, there were numerous sources of flexibility: often researchers adopted multiple comparisons corrections for some of these, but typically not for all. In the COMT/callosum paper, the authors addressed the multiple comparisons issue using permutation testing. However, one cannot tell from a published paper how many subgroupings/genetic variants/phenotypes/analysis pathways etc were tried but not reported. If, as in mainstream genetics, the authors had included a direct replication of this result, that would be far more convincing. Perhaps the best way for the field to proceed would be by adopting pre-registration as standard. Pre-registration means you commit yourself to a specific hypothesis and analytic plan in advance; hypotheses can then be meaningfully tested using standard statistical methods. If you don’t pre-register and there are many potential ways of looking at the data, it is very easy to fool yourself into finding something that looks 'significant'.
I am sufficiently confident that this finding will not replicate that I hereby undertake to award a prize of £1000 to anyone who does a publicly preregistered replication of the El-Hage et al paper and reproduces their finding of a statistically significant male-specific effect of COMT Val158Met polymorphism on the same aspects of corpus callosum structure.
I emphasise that, though the new COMT/callosum paper is the impetus for this blogpost, I do not intend this as a specific criticism of the authors of that paper. The research approach they adopted is pretty much standard in the field, and the literature is full of small studies that aren't pre-registered and don't include a replication sample. I don't think most researchers are being deliberately misleading, but I do think we need a change of practices if we are to amass a research literature that can be built upon. Either pre-registration or replication should be conditions of publication.
PS. 3rd October 2017
An anonymous commentator (below) drew my attention to a highly relevant preprint in Bioarxiv by Jahanshad and colleagues from the ENIGMA-DTI consortium, entitled 'Do Candidate Genes Affect the Brain's White Matter Microstructure? Large-Scale Evaluation of 6,165 Diffusion MRI Scans'. They included COMT as one of the candidate genes, although they did not look at gender-specific effects. The Abstract makes for sobering reading: 'Regardless of the approach, the previously reported candidate SNPs did not show
significant associations with white matter microstructure in this largest genetic study of DTI to date; the
negative findings are likely not due to insufficient power.'
In addition, Kevin Mitchell (@WiringTheBrain) on Twitter alerted me to a blogpost from 2015 in which he made very similar points about neuroimaging biomarkers. Let's hope that funders and mainstream journals start to get the message.
I can't resist the plug: If you also take the Prereg Challenge with this study, you'll get an additional $1000 US (of course, we don't care how the chips fall... as long as the published study reports results of all preregistered analyses!) https://cos.io/prereg
ReplyDeleteThat's a great idea, and possible consolation prize to anyone who does the study but doesn't find the same result!
Deletehi, have you seen this? https://www.biorxiv.org/content/early/2017/02/20/107987
ReplyDeleteWow. v relevant.thanks!
Delete