Tuesday, 17 October 2017
Citing the research literature: the distorting lens of memory
Poor Billy would have been long forgotten, were it not for the fact that he died suddenly shortly after he had undergone extensive assessment for his specific learning difficulties. An autopsy found that death was due to a brain haemorrhage caused by an angioma in the cerebellum, but the neuropathologist also remarked on some unusual features elsewhere in his brain:
"In the cerebral hemispheres, anomalies were noted in the convolutional pattern of the parietal lobe bilaterally. The cortical pattern was disrupted by penetrating deep gyri that appeared disconnected. Related areas of the corpus callosum appeared thin (Figure 2). Microscopic examination revealed the cause of the hemorrrage to be a cerebellar angioma of the type known as capillary telangiectases (Figure 3). The cerebral cortex was more massive than normal, the lamination tended to be columnar, the nerve cells were spindle-shaped, and there were numerous ectopic neurons in the white matter that were not collected into distinct heterotopias (Figure 4)." p. 496*
I had tracked down this article in the course of writing a paper with colleagues on the neuronal migration account of dyslexia – a topic I have blogged about previously. The 'ectopic neurons' referred to by Drake are essentially misplaced neurons that, because of disruptions of very early development, have failed to migrate to their usual location in the brain.
I realised that my hazy memory of this paper was quite different from the reality: I had thought the location of the ectopic neurons was consistent with those reported in later post mortem studies by Galaburda and colleagues. In fact, Drake says nothing about their location, other than that it is in white matter – which contrasts with the later reports.
This made me curious to see how this work had been reported by others. This was not a comprehensive exercise: I identified from Web of Science all papers that cited Drake's article, and then checked what they said about the results, provided I could easily locate an online version of the citing article. Here's what I found:
Out of a total of 45 papers, 18 were excluded: they were behind a paywall, not readily traceable online, or (in one case) did not mention the neuroanatomical findings. A further 10 papers included the Drake study in a batch of references on neuroanatomical abnormalities in dyslexia, without singling out any specific results. Thus they were not inaccurate, just vague.
The remaining 17 could be divided up as follows:
Seven papers gave a broadly accurate account of the neuroanatomical findings. The most detailed accurate account was by Galaburda et al (1985) who noted:
"Drake published neuropathological findings in a well-documented case of developmental dyslexia. He described a thinned corpus callosum particularly involving the parietal connections, abnormal cortical folding in the parietal regions, and, on microscopical examination, excessive numbers of neurons in the subcortical white matter. The illustrations provided did not show the parietal lobe, and the portion of the corpus callosum that could be seen appeared normal. No mention was made as to whether the anomalies were asymmetrically distributed."p. 227.
Four (three of them from the same research group) cited Drake as though there were two patients, rather than one, and focussed only on the corpus callosum, without mentioning ectopias.
Six gave an inaccurate account of the findings. The commonest error was to be specific about the location of the ectopias, which (as is clear from the Galaburda quote above) was not apparent in the text or figures of the original paper. Four of these articles located the ectopias in the left parietal lobe, one more generally in the parietal lobe, and one in the cerebellum (where the patient's stroke had been).
So, if we discount those available articles that just gave a rather general reference to Drake's study, over half of the remainder got some information wrong – and the bias was in the direction of making this early study consistent with later research.
The paper is hard to get hold of**, and when you do track it down, it is rather long-winded. It is largely concerned with the psychological evaluation of the patient, including aspects, such as Oedipal conflicts, that seem fanciful to modern eyes, and the organisation of material is not easy to follow. Perhaps it is not so surprising that people make errors when reporting the findings. But if nothing else, this exercise reminded me of the need to check sources when you cite them. It is all too easy to think you know what is in a paper – or to rely on someone else's summary. In fact, these days I am often dismayed to discover I have a false memory of what is in my own old papers, let alone those by other people. But once in the literature, errors can propagate, and we need to be vigilant to prevent a gradual process of distortion over time. It is all too easy to hurriedly read a secondary source or an abstract: we (and I include myself here) need to slow down.
References
Drake, W. E. (1968). Clinical and pathological findings in a child with a developmental learning disability. Journal of Learning Disabilities, 1(9), 486-502.
Galaburda, A. M., Sherman, G. F., Rosen, G. D., Aboitiz, F., & Geschwind, N. (1985). Developmental dyslexia: four consecutive cases with cortical anomalies. Annals of Neurology, 18, 222-233.
* I assume the figures are copyrighted so am not reproducing them here
**I thank Michelle Dawson for pointing out that the article can be downloaded from this site: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.949.4021&rep=rep1&type=pdf
Sunday, 1 October 2017
Pre-registration or replication: the need for new standards in neurogenetic studies
This morning I did a very mean thing. I saw an author announce to the world on Twitter that they had just published this paper, and I tweeted a critical comment. This does not make me happy, as I know just how proud and pleased one feels when a research project at last makes it into print, and to immediately pounce on it seems unkind. Furthermore, the flaws in the paper are not all that unusual: they characterise a large swathe of literature. And the amount of work that has gone into the paper is clearly humongous, with detailed analysis of white matter structural integrity that probably represents many months of effort. But that, in a sense, is the problem. We just keep on and on doing marvellously complex neuroimaging in contexts where the published studies are likely to contain unreliable results.
Why am I so sure that this is unreliable? Well, yesterday saw the publication of a review that I had led on, which was highly relevant to the topic of the paper – genetic variants affecting brain and behaviour. In our review we closely scrutinised 30 papers on this topic that had been published in top neuroscience journals. The field of genetics was badly burnt a couple of decades ago when it was discovered that study after study reported results that failed to replicate. These days, it's not possible to publish a genetic association in a genetics journal unless you show that the finding holds up in a replication sample. However, neuroscience hasn't caught up and seems largely unaware of why this is a problem.
The focus of this latest paper was on a genetic variant known as the COMT Val158Met SNP. People can have one of three genotypes: Val/Val, Val/Met and Met/Met, but it's not uncommon for researchers to just distinguish people with Val/Val from Met carriers (Val/Met and Met/Met). This COMT polymorphism is one of the most-studied genetic variants in relation to human cognition, with claims of associations with all kinds of things: intelligence, memory, executive functions, emotion, response to anti-depressants, to name just a few. Few of these, however, have replicated, and there is reason to be dubious about the robustness of findings (Barnett, Scoriels & Munafo, 2008).
In this latest COMT paper – and many, many other papers in neurogenetics – the sample size is simply inadequate. There were 19 participants (12 males and 7 females) with the COMT Val/Val version of the variant, compared with 63 (27 males and 36 females) who had either the Met/Met or Val/Met genotype. The authors reported that significant effects of genotype on corpus callosum structure were found in males only. As we noted in our review, effects of common genetic variants are typically very small. In this context, an effect size (standardized difference between means of two genotypes, Cohen's d) of .2 would be really large. Yet this study has power of only .08 to detect such an effect in males – that is, if there really were a difference of 0.2 SD between the two genotypes, and you repeatedly ran studies with this sample size, you'd fail to see the effect in 92% of studies. To look at it another way, the true effect size would need to be enormous (around a 1 SD difference between groups) to have an 80% chance of being detected, given the sample size.
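For anyone who wants to check this kind of calculation, here is a minimal sketch in Python using statsmodels. The group sizes (12 Val/Val males versus 27 male Met carriers) are taken from the description above; the assumed true effect size of d = .2 and the two-tailed alpha of .05 are my own illustrative choices, not values from the paper.

```python
# Minimal power-calculation sketch for a two-group comparison.
# Group sizes follow the male subgroups described above; d = 0.2 and
# alpha = .05 are illustrative assumptions, not taken from the paper.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power to detect d = 0.2 with 12 Val/Val males vs 27 male Met carriers
power = analysis.power(effect_size=0.2, nobs1=12, ratio=27 / 12, alpha=0.05)
print(f"Power to detect d = 0.2: {power:.2f}")  # roughly .08-.09

# Effect size that would be needed for 80% power with these group sizes
d_for_80 = analysis.solve_power(effect_size=None, nobs1=12, ratio=27 / 12,
                                alpha=0.05, power=0.80)
print(f"Effect size needed for 80% power: {d_for_80:.2f}")  # around 1 SD
```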
When confronted with this kind of argument, people often say that maybe there really are big effect sizes. After all, the researchers were measuring characteristics of the brain, which are nearer to the gene than the behavioural measures that are often used. Unfortunately, there is another much more likely explanation for the result, which is that it is a false positive arising from a flexible analytic pipeline.
The problem is that both neuroscience and genetics are a natural environment for analytic flexibility. Put the two together, and you need to be very very careful to control for spurious false positive results. In the papers we evaluated for our review, there were numerous sources of flexibility: often researchers adopted multiple comparisons corrections for some of these, but typically not for all. In the COMT/callosum paper, the authors addressed the multiple comparisons issue using permutation testing. However, one cannot tell from a published paper how many subgroupings/genetic variants/phenotypes/analysis pathways etc were tried but not reported. If, as in mainstream genetics, the authors had included a direct replication of this result, that would be far more convincing. Perhaps the best way for the field to proceed would be by adopting pre-registration as standard. Pre-registration means you commit yourself to a specific hypothesis and analytic plan in advance; hypotheses can then be meaningfully tested using standard statistical methods. If you don’t pre-register and there are many potential ways of looking at the data, it is very easy to fool yourself into finding something that looks 'significant'.
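To see why undisclosed flexibility matters, here is a toy simulation in Python – entirely hypothetical, and not based on the paper's data or analyses – in which a 'researcher' tests several phenotypes in the whole sample, in males only, and in females only, even though there is no true genotype effect at all. With four phenotypes and three subgroupings each, the proportion of pure-noise datasets yielding at least one nominally significant result comes out at roughly 40%, far above the nominal 5%.

```python
# Toy simulation of analytic flexibility: pure-noise data, several phenotypes,
# and three subgroup analyses (all, males, females) per phenotype.
# Group sizes echo the numbers above; everything else is hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_phenotypes = 2000, 4
sex1 = np.array([1] * 12 + [0] * 7)    # genotype group 1: 12 males, 7 females
sex2 = np.array([1] * 27 + [0] * 36)   # genotype group 2: 27 males, 36 females
hits = 0

for _ in range(n_sims):
    significant = False
    for _ in range(n_phenotypes):
        y1 = rng.normal(size=len(sex1))  # no true genotype effect anywhere
        y2 = rng.normal(size=len(sex2))
        for subgroup in (None, 1, 0):    # whole sample, males only, females only
            a = y1 if subgroup is None else y1[sex1 == subgroup]
            b = y2 if subgroup is None else y2[sex2 == subgroup]
            if stats.ttest_ind(a, b).pvalue < 0.05:
                significant = True
    hits += significant

print(f"Datasets with at least one p < .05 despite no true effect: {hits / n_sims:.2f}")
```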
I am sufficiently confident that this finding will not replicate that I hereby undertake to award a prize of £1000 to anyone who does a publicly preregistered replication of the El-Hage et al paper and reproduces their finding of a statistically significant male-specific effect of COMT Val158Met polymorphism on the same aspects of corpus callosum structure.
I emphasise that, though the new COMT/callosum paper is the impetus for this blogpost, I do not intend this as a specific criticism of the authors of that paper. The research approach they adopted is pretty much standard in the field, and the literature is full of small studies that aren't pre-registered and don't include a replication sample. I don't think most researchers are being deliberately misleading, but I do think we need a change of practices if we are to amass a research literature that can be built upon. Either pre-registration or replication should be conditions of publication.
PS. 3rd October 2017
An anonymous commentator (below) drew my attention to a highly relevant preprint on bioRxiv by Jahanshad and colleagues from the ENIGMA-DTI consortium, entitled 'Do Candidate Genes Affect the Brain's White Matter Microstructure? Large-Scale Evaluation of 6,165 Diffusion MRI Scans'. They included COMT as one of the candidate genes, although they did not look at gender-specific effects. The Abstract makes for sobering reading: 'Regardless of the approach, the previously reported candidate SNPs did not show significant associations with white matter microstructure in this largest genetic study of DTI to date; the negative findings are likely not due to insufficient power.'
In addition, Kevin Mitchell (@WiringTheBrain) on Twitter alerted me to a blogpost from 2015 in which he made very similar points about neuroimaging biomarkers. Let's hope that funders and mainstream journals start to get the message.