There are rumblings in the jungle of neuroscience. A recent spate of high-profile papers has drawn attention to methodological shortcomings in neuroimaging studies (e.g., Ioannidis, 2011; Kriegeskorte et al., 2009; Nieuwenhuis et al., 2011). They are a response to published papers that regularly flout methodological standards that have been established for years. I’ve recently been reviewing the literature on brain imaging in relation to intervention for language impairments and came across this example.
Temple et al. (2003) published an fMRI study of 20 children with dyslexia who were scanned both before and after a computerised intervention (Fast ForWord) designed to improve their language. The article was published in the Proceedings of the National Academy of Sciences, and at the time of writing has had 270 citations. I did a spot check of fifty of those citing articles to see if any had noted problems with the paper: only one did so. The others repeated the authors’ conclusions, namely:
- There was no dyslexic control group. See this blogpost for why this matters. The language test scores of the treated children improved from pre-test to post-test, but where properly controlled trials have been done, equivalent change has been found in untreated controls (Strong et al., 2011). Conclusion 1 is not valid.
- The authors presented uncorrected whole brain activation data. This is not explicitly stated but can be deduced from the z-scores and p-values. Russell Poldrack, who happens to be one of the authors of this paper, has written eloquently on this subject: “…it is critical to employ accurate corrections for multiple tests, since a large number of voxels will generally be significant by chance if uncorrected statistics are used. .. The problem of multiple comparisons is well known but unfortunately many journals still allow publication of results based on uncorrected whole-brain statistics.” Conclusion 2 is based on uncorrected p-values and is not valid.
- To demonstrate that changes in activation for dyslexics made them more like typical children, one would need to demonstrate an interaction between group (dyslexic vs typical) and testing time (pre-training vs post-training). Although a small group of typically-reading children was tested on two occasions, this analysis was not done. Conclusion 3 is based on images of group activations rather than statistical comparisons that take into account within-group variance. It is not valid.
- There was no a priori specification of which language measures were primary outcomes, and numerous correlations with brain activation were computed, with no correction for multiple comparisons. The one correlation that the authors focus on (Figure reproduced below) is (a) only significant on a one-tailed test at the .05 level; (b) driven by two outliers (encircled), both of whom had a substantial reduction in left temporo-parietal activation associated with a lack of language improvement. Conclusion 4 is not valid. Incidentally, the mean activation change (Y-axis) in this scatterplot is also not significantly different from zero. I'm not sure what this means, as it’s hard to interpret the “effect size” scale, which is described as “the weighted sum of parameter estimates from the multiple regression for rhyme vs. match contrast pre- and post-training.”
Figure 2 from Temple et al. (2003). Data from dyslexic children.
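To give a feel for why uncorrected whole-brain statistics are a problem, here is a minimal simulation sketch. It is not a reanalysis of Temple et al.'s data. The voxel count (50,000) and the threshold (p < .001, one-tailed) are illustrative assumptions of mine, the voxels are treated as independent (real voxels are spatially correlated), and Bonferroni is used only as the simplest possible correction, not as the method any particular lab would choose.

```python
import random
from statistics import NormalDist


def count_null_hits(n_voxels, alpha, seed=0):
    """Simulate z-scores for voxels with NO true effect and count how many
    pass an uncorrected one-tailed threshold versus a Bonferroni one.

    n_voxels and alpha are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # mean 0, standard deviation 1

    # Critical z-values for the two thresholds.
    z_uncorrected = std_normal.inv_cdf(1 - alpha)
    z_bonferroni = std_normal.inv_cdf(1 - alpha / n_voxels)

    # Under the null hypothesis every voxel's z-score is pure noise.
    z_scores = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]

    return {
        "expected_uncorrected": n_voxels * alpha,
        "observed_uncorrected": sum(z > z_uncorrected for z in z_scores),
        "observed_bonferroni": sum(z > z_bonferroni for z in z_scores),
        "z_uncorrected": z_uncorrected,
        "z_bonferroni": z_bonferroni,
    }


if __name__ == "__main__":
    for name, value in count_null_hits(n_voxels=50_000, alpha=0.001).items():
        print(f"{name}: {value}")
```

With these assumed numbers, about 50 voxels are expected to cross the uncorrected threshold even though nothing is going on anywhere in the simulated brain, which is the point of the Poldrack quotation above.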
Gabrieli, J. D. (2009). Dyslexia: a new synergy between education and cognitive neuroscience. Science, 325(5938), 280-283.
McCabe, D., & Castel, A. (2008). Seeing is believing: The effect of brain images on judgments of scientific reasoning. Cognition, 107(1), 343-352. doi: 10.1016/j.cognition.2007.07.017
Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E.-J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience, 14(9), 1105-1107. doi: 10.1038/nn.2886
Poldrack, R. A., & Mumford, J. A. (2009). Independence in ROI analysis: where is the voodoo? Social Cognitive and Affective Neuroscience, 4(2), 208-213.
Strong, G. K., Torgerson, C. J., Torgerson, D., & Hulme, C. (2010). A systematic meta-analytic review of evidence for the effectiveness of the ‘Fast ForWord’ language intervention program. Journal of Child Psychology and Psychiatry, in press, doi: 10.1111/j.1469-7610.2010.02329.x.
Temple, E., Deutsch, G. K., Poldrack, R. A., Miller, S. L., Tallal, P., Merzenich, M. M., & Gabrieli, J. D. E. (2003). Neural deficits in children with dyslexia ameliorated by behavioral remediation: Evidence from functional MRI. Proceedings of the National Academy of Sciences of the United States of America, 100(5), 2860-2865. doi: 10.1073/pnas.0030098100