Monday 5 March 2012

Time for neuroimaging (and PNAS) to clean up its act


There are rumblings in the jungle of neuroscience. There’s been a recent spate of high-profile papers that have drawn attention to methodological shortcomings in neuroimaging studies (e.g., Ioannidis, 2011; Kriegeskorte et al., 2009; Nieuwenhuis et al., 2011). This is in response to published papers that regularly flout methodological standards that have been established for years. I’ve recently been reviewing the literature on brain imaging in relation to intervention for language impairments and came across this example.
Temple et al. (2003) published an fMRI study of 20 children with dyslexia who were scanned both before and after a computerised intervention (Fast ForWord) designed to improve their language. The article in question was published in the Proceedings of the National Academy of Sciences, and at the time of writing has had 270 citations. I did a spot check of fifty of those citing articles to see if any had noted problems with the paper: only one of them did so. The others repeated the authors’ conclusions, namely:

1. The training improved oral language and reading performance.
2. After training, children with dyslexia showed increased activity in multiple brain areas.
3. Brain activation in left temporo-parietal cortex and left inferior frontal gyrus became more similar to that of normal-reading children.
4. There was a correlation between increased activation in left temporo-parietal cortex and improvement in oral language ability.
But are these conclusions valid? I'd argue not, because:
  • There was no dyslexic control group. See this blogpost for why this matters. The language test scores of the treated children improved from pre-test to post-test, but where properly controlled trials have been done, equivalent change has been found in untreated controls (Strong et al., 2011). Conclusion 1 is not valid.
  • The authors presented uncorrected whole brain activation data. This is not explicitly stated but can be deduced from the z-scores and p-values. Russell Poldrack, who happens to be one of the authors of this paper, has written eloquently on this subject: “…it is critical to employ accurate corrections for multiple tests, since a large number of voxels will generally be significant by chance if uncorrected statistics are used. … The problem of multiple comparisons is well known but unfortunately many journals still allow publication of results based on uncorrected whole-brain statistics.” Conclusion 2 is based on uncorrected p-values and is not valid.
  • To demonstrate that changes in activation for dyslexics made them more like typical children, one would need to demonstrate an interaction between group (dyslexic vs typical) and testing time (pre-training vs post-training). Although a small group of typically-reading children was tested on two occasions, this analysis was not done. Conclusion 3 is based on images of group activations rather than statistical comparisons that take into account within-group variance. It is not valid.
  • There was no a priori specification of which language measures were primary outcomes, and numerous correlations with brain activation were computed, with no correction for multiple comparisons. The one correlation that the authors focus on (Figure reproduced below) is (a) only significant on a one-tailed test at the .05 level; (b) driven by two outliers (encircled), both of whom had a substantial reduction in left temporo-parietal activation associated with a lack of language improvement. Conclusion 4 is not valid. Incidentally, the mean activation change (Y-axis) in this scatterplot is also not significantly different from zero. I'm not sure what this means, as it’s hard to interpret the “effect size” scale, which is described as “the weighted sum of parameter estimates from the multiple regression for rhyme vs. match contrast pre- and post-training.”
Figure 2 from Temple et al. (2003). Data from dyslexic children.
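Poldrack’s point about uncorrected whole-brain statistics is easy to demonstrate with a toy simulation (a hypothetical sketch in Python with NumPy/SciPy, not any fMRI package; all numbers are invented): when thousands of null voxels are each tested at p < .05, roughly 5% come out “significant” by chance alone, and a family-wise correction such as Bonferroni is needed to suppress them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical null data: 20,000 voxels, 20 subjects, no true effect anywhere.
n_subjects, n_voxels = 20, 20_000
data = rng.standard_normal((n_subjects, n_voxels))

# One-sample t-test at every voxel against zero activation change.
t_vals, p_vals = stats.ttest_1samp(data, popmean=0.0, axis=0)

# Uncorrected p < .05: about 5% of null voxels come out "significant" by chance.
n_uncorrected = int((p_vals < 0.05).sum())

# Bonferroni correction controls the family-wise error rate across all voxels.
n_bonferroni = int((p_vals < 0.05 / n_voxels).sum())

print(n_uncorrected, n_bonferroni)  # roughly 1,000 chance hits vs. about 0
```

The same logic is why readers should ask which correction (Bonferroni, random-field theory, false discovery rate, etc.) was applied whenever a whole-brain map is reported.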

How is it that this paper has been so influential? I suggest that it is largely because of the image below, summarising results from the study. This was reproduced in a review paper by the senior author that appeared in Science in 2009. This has already had 42 citations. The image is so compelling that it’s also been used in promotional material for a commercial training program other than the one that was used in the study. As McCabe and Castel (2008) have noted, a picture of a brain seems to make people suspend normal judgement.

I don’t like to single out a specific paper for criticism in this way, but feel impelled to do so because the methodological problems were so numerous and so basic. For what it’s worth, every paper I have looked at in this area has had at least some of the same failings. However, in the case of Temple et al. (2003) the problem is compounded by the declared interests of two of the authors, Merzenich and Tallal, who co-founded the firm that markets the Fast ForWord intervention. One would have expected a journal editor to subject a paper to particularly stringent scrutiny under these circumstances.
We can also ask why those who read and cite this paper haven’t noted the problems. One reason is that neuroimaging papers are complicated and the methods can be difficult to understand if you don’t work in the area.
Is there a solution? One suggestion is that reviewers and readers would benefit from a simple cribsheet listing the main things to look for in a methods section of a paper in this area. Is there an imaging expert out there who could write such a document, targeted at those like me, who work in this broad area, but aren’t imaging experts? Maybe it already exists, but I couldn’t find anything like that on the web.
Imaging studies are expensive and time-consuming to do, especially when they involve clinical child groups. I'm not one of those who thinks they aren't ever worth doing. If an intervention is effective, imaging may help throw light on its mechanism of action. However, I do not think it is worthwhile to do poorly-designed studies of small numbers of participants to test the mode of action of an intervention that has not been shown to be effective in properly-controlled trials. It would make more sense to spend the research funds on properly controlled trials that would allow us to evaluate which interventions actually work.

Gabrieli, J. D. (2009). Dyslexia: a new synergy between education and cognitive neuroscience. Science, 325(5938), 280-283.
Ioannidis, J. P. A. (2011). Excess significance bias in the literature on brain volume abnormalities. Arch Gen Psychiatry, 68(8), 773-780. doi: 10.1001/archgenpsychiatry.2011.28
Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. F., & Baker, C. I. (2009). Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience, 12(5), 535-540. doi: 10.1038/nn.2303

McCabe, D., & Castel, A. (2008). Seeing is believing: The effect of brain images on judgments of scientific reasoning. Cognition, 107(1), 343-352. doi: 10.1016/j.cognition.2007.07.017

Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E.-J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience, 14(9), 1105-1107. doi: 10.1038/nn.2886

Poldrack, R. A., & Mumford, J. A. (2009). Independence in ROI analysis: where is the voodoo? Social Cognitive and Affective Neuroscience, 4(2), 208-213.

Strong, G. K., Torgerson, C. J., Torgerson, D., & Hulme, C. (2011). A systematic meta-analytic review of evidence for the effectiveness of the ‘Fast ForWord’ language intervention program. Journal of Child Psychology and Psychiatry. doi: 10.1111/j.1469-7610.2010.02329.x

Temple, E., Deutsch, G. K., Poldrack, R. A., Miller, S. L., Tallal, P., Merzenich, M. M., & Gabrieli, J. D. E. (2003). Neural deficits in children with dyslexia ameliorated by behavioral remediation: Evidence from functional MRI. Proceedings of the National Academy of Sciences of the United States of America, 100(5), 2860-2865. doi: 10.1073/pnas.0030098100


  1. agree entirely

  2. I think the best thing that can be said here is that we (certainly myself, and also the field in general) have learned a lot since 2003 about how to do fMRI studies. I can't quibble with any of your criticisms, and I hope that they will help deflate some of the more inflated claims that have been made based on these results.

    1. As a neuroimaging researcher myself, I am impressed with Dr. Poldrack's response; to be measured and honest about criticism is difficult.

  3. Cognitive psychologists have for a long time been expressing worries about the use of brain imaging to learn more about cognition (e.g. van Orden & Paap, Philos Sci 1997; Bub, Cog. Neuropsych., 2000; Harley, Cog. Neuropsych., 2004; Coltheart, Cog. Neuropsych., 2004; Page, Cortex, 2006).

    Re a cribsheet listing the main things to look for in a paper in this area: something approaching that has been provided in Coltheart, M. (2010). What is functional neuroimaging for? In Hanson, S.J. & Bunzl, M. (Eds), Foundational Issues of Human Brain Mapping. Cambridge: MIT Press. This is not about technicalities of brain imaging, but it is about whether and if so under what circumstances brain imaging data can legitimately yield conclusions about cognition. The chapter by Poldrack in the same book also helps to clarify these issues.

    1. Max - While important, I think that the issues raised by Dorothy are somewhat orthogonal to the interpretive issues that you and I have written about in the past. Some of them arise from the issues of variable selection that were highlighted by Vul, Kriegeskorte, Ioannidis, and others; similar issues have cropped up in other domains (e.g., genetics) where the data are very high-dimensional. Others relate to the standard practice (hopefully now waning) of reporting uncorrected statistical maps (I highlighted this in my recent paper in Neuroimage:

    2. Russ - I suspect Max is drumming up interest for the current March issue of the Australian Journal of Psychology on cognitive modelling vs cognitive neuroscience (nb: vested interest at work here on my part) :

      Dorothy - I agree the issue here is a broader one. Neuroimaging will be most informative when it is applied to established (reliable) behavioural effects with adequate theoretical motivation. However, my reading of recent reviews by Snowling & Hulme and Duff & Clarke leads me to conclude that the failure to include dyslexic control groups etc. is not limited to studies using neuroimaging as a DV. For my own part, I'm already weary of the plethora of special issues on alpha thresholding, non-independence errors, connectivity and the like in neuroimaging journals, so I don't perceive the need for yet another article on what to look for in a methods section. My heart leaps every time I read a well-designed and executed study. However, to paraphrase C. J. Dennis, I won't assume the academic world is a penitentiary, and worse still, that I am the warder.

  4. Dorothy. Good post. The paper was in PNAS and was "contributed" by one of the authors. See for what this means.

  5. Hi Dorothy,

    Great post - and I think/hope that things have been getting better since 2003. As for a cribsheet, Russ Poldrack et al's paper on guidelines for reporting an fMRI study is a good place to start:

    There is a similar one for voxel-based morphometry from the UCL group:


  6. Many thanks to Russ for being so frank about the problems of the past.
    And it’s good to have the sources for checking methods; thanks Steve.
    Re PNAS, there is a big problem with their 2-track approach to publishing.
    It has been argued that “The alternative publication tracks that PNAS provides seem to do a good job in giving NAS members more autonomy and letting them publish really groundbreaking, highly-cited, high-impact work while letting some lower quality work get in" (see ). But in this case “lower quality” means work where the conclusions don’t follow from the data. The fact that this study was published in PNAS is, I’m sure, one reason why it is taken seriously. It is depressing to see its conclusions propagated through the scientific literature as established fact. And where, as in this case, there is conflict of interest, a PNAS paper can in effect be used to promote a product and make profits for the authors.

    1. And where, as in this case, there is conflict of interest, a PNAS paper can in effect be used to promote a product and make profits for the authors.
      This is the most astonishing thing about this paper to me. While I'm no fan of fMRI, at least the field has indeed been working hard on its problems. But this clear conflict of interest seems deeply problematic.

  7. Good post but I think we need to do more to distinguish between methodological flaws specific to imaging, and more general statistical problems.

    For example, in your bullet points, the first one (inadequate control group) is nothing to do with imaging.

    The second is a special case of the problem of multiple comparisons (although granted it takes on some special characteristics in the case of neuroimaging data.)

    There are lots of imaging-specific methodological problems and it's important to remember that these are separate, and arguably worse, because statistical problems are well understood & there's a consensus on what to do about them, but we just don't know what e.g. head movement or physiological artifacts are doing to our images...

  8. Thanks Neuroskeptic. You're right to draw the distinction of course, but part of my argument is exactly this: when confronted with pictures of brains, researchers seem to forget anything they might ever have learned about statistics - and are allowed to get away with making really basic errors.
    As you say "statistical problems are well understood & there's a consensus on what to do about them" - but in this field they don't do it!
    The Temple paper is particularly bad, given the high profile of its authors and the journal, but I have this week looked at five other papers where imaging was used to look at pre- to post-treatment effects, and only one had adequate controls. Also, as Nieuwenhuis et al found, erroneous analysis of interactions seems to be the norm in this field - yet most psychologists learn about this in Psychology 101. So what is going on here?
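    The interaction error described by Nieuwenhuis et al. is easy to reproduce with simulated change scores (a hypothetical Python sketch; all numbers invented): one group's pre-to-post change can pass p < .05 while the other's does not, even though the direct test of the group difference, which is the only test that licenses the conclusion, can itself be non-significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Invented pre-to-post change scores for two groups of 20 children.
change_treated = rng.normal(loc=0.7, scale=1.0, size=20)
change_control = rng.normal(loc=0.4, scale=1.0, size=20)

# Within-group tests: the treated group's change may reach p < .05
# while the control group's does not...
p_treated = stats.ttest_1samp(change_treated, 0.0).pvalue
p_control = stats.ttest_1samp(change_control, 0.0).pvalue

# ...but concluding "the groups differ" requires the direct test of the
# difference in change (the group x time interaction), which can easily
# be non-significant even when the within-group tests diverge.
p_interaction = stats.ttest_ind(change_treated, change_control).pvalue

print(p_treated, p_control, p_interaction)
```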

    1. Dear Dorothy,

      While there is little doubt that the neuroimaging field is riddled with statistical abuses, this does not mean that the rest of psychological science is somehow immune to such errors. Indeed, one of the reasons why neuroimaging has such a poor track record may well be that much of it is carried out by psychologists who frequently have little more than a tenuous grasp of statistical theory (and I say this from personal experience having worked as an academic in several well renowned psychology departments).

      Despite (or perhaps because of) Psychology 101, the majority of academics in this field rely on rote pressing of buttons in software like SPSS without having any real understanding of (or interest in) the underlying statistical analysis, and indeed this results in widespread abuses of techniques in this field (evidenced, for example, by the frequent misuse of structural equation modelling to infer causality, or the widespread treatment of subjective rating scale data as being quantitative). What is going on here, I think, is that the fields of psychology and cognitive science have not traditionally attracted people with strong quantitative skills or interest in mathematics, even though perhaps more than in any other field such skills are essential for doing good science. (In fairness, much the same argument can be made for biological sciences and medicine, so this is not unique to psychology.)

      This might be changing, as more and more mathematicians and physicists move into the field, which will hopefully raise standards (even though such developments tend not to be welcomed by traditionalists in the field). Ultimately, good science requires deep understanding of the methods used, and that includes statistics.

    2. I agree with deevybee. From my experience in reading and publishing, it seems to me that images have a particular quality that transcends the statistics, such that a compelling image (especially of a brain) is more convincing than say, a bar graph. There's actually data to this effect too:

      MCCABE, D., CASTEL, A. (2008). Seeing is believing: The effect of brain images on judgments of scientific reasoning. Cognition, 107(1), 343-352. DOI: 10.1016/j.cognition.2007.07.017

      Note: the effect is obtained from undergraduates and may not apply to those with PhDs, although I would be surprised if it doesn't.

  9. If it explains nothing, and merely tries to note correlations, it's worthless as science anyway.

  10. deevybee: Are neuroimaging studies actually more subject to those problems though?

    Maybe they are, maybe neuroimaging studies are less likely to e.g. have a sensible control group, but we need some numbers. Comparing neuroimaging papers to studies in the same field, of similar size, that didn't use imaging. I'd not be surprised by that, but on the other hand I'd need some proper data to convince me of it :)

  11. This comment has been removed by the author.

  12. This comment has been removed by the author.

  13. Sorry -- thought better of posting my comment about conflicts of interest. Thanks for the post, Deevybee. And, I really admire Russ's candor in his comment and that he has taken on a number of these sorts of issues in the time since that paper.

  14. The problem of spatial localization is also quite significant as is discussed in this meta-analysis

    Check out figure four: the periaqueductal gray is all over the place.

  15. Autism researchers who have published papers using fMRI themselves are questioning the efficacy of fMRI, noting that even when the head is completely motionless during an fMRI study, the background noise itself still produces false positives and spurious data as discussed here:

    Favorite quote from a neuroscientist who has published numerous studies in autism using fMRI:

    “It really, really, really sucks. My favorite result of the last five years is an artifact,” says lead investigator Steve Petersen, professor of cognitive neuroscience at Washington University in St. Louis.

    1. RAJ

      How does one know when a subject is completely motionless?

    2. The following study was a subject that had joined the choir invisible.

  16. Poor statistical analysis is only a small part of the problem. A huge part of the problem is systematic error, most notably from subject motion, that by numerous mechanisms may introduce spatial-temporal correlations that are not due to the BOLD effect. Even when spatial-temporal correlations are not the focus of a study, motion adds lots of noise, and the means by which motion is corrected is shoddy to say the least. Worse yet, the means by which motion correction and temporal interpolation are presently done may be adding systematic errors that give bogus results for even simple block paradigm designs.

    There is value in fMRI but that won't matter soon if the field doesn't start getting serious about the science and forgo the impulse to publish something sexy rather than something rigorous.

    And if you think fMRI is bad take note of the world of MEG current density imaging where wishful thinking substitutes for measurement. And error estimation? What's that?

  17. Major parts of the processing of fMRI data are unjustified.

    (1) The assumption of rigid image-volume motion in the ubiquitous motion-correction algorithms is not justified simply because motion frequently occurs on the time scale associated with collecting a single slice of an image volume. This will produce shifts of the slices relative to neighboring slices and violate the rigid image-volume assumption.

    (2) The temporal interpolation which is often used to temporally align the slice data within a given volume cannot be separated from any effective motion-correction algorithm, yet that is exactly what is done in most fMRI data processing. This is sure to have systematic effects.

    (3) Even if motion correction were to work perfectly there would still be motion related errors due to a number of other mechanisms (perturbation of steady state by through slice motion, the image contrast generated by the sum-of-squares image reconstruction in ubiquitous multi-channel head Rx arrays, etc).

    Can it all be cleaned up? Possibly. But surely not if the community turns a blind eye to these problems or worse yet pretends they don't exist.




  20. Dear Dorothy,
    thanks for that great contribution. You ask "Is there a solution?". Well, we have a solution: Amazing fMRI plots for everybody (not just neuroscientists)! Every researcher should have the possibility to exploit the cognitive biases of their readers.


  21. One other thing with Figure 2 from Temple et al., and more generally, is the inappropriate use of total scores.

  22. Interesting article; I don't work in neuroscience, but I've recently become wary of PNAS papers, at least pre-July 2010 when the journal changed its submission rules.

    Prior to then, papers could either be submitted directly to the journal who would administer peer review, or be contributed via an NAS member, who would take care of the peer-review process themselves. So if you had a pal who was an NAS member, they could perhaps be a bit selective with the reviews to help it through.

    Now, I must say that I have no idea whether Temple et al. 2003 came through that route, and would not imply any such situation in that case. But in my own field, I've seen PNAS papers containing errors, which would perhaps have had a hard time being published elsewhere, appearing via the "Communicated submission" route.

    Eventually, there was sufficient scandal about this process - summarised here in Nature News - that PNAS removed it as a route to publication in July 2010. The fact that NAS members could contribute "communicated submissions" from authors associated with their own institutions or departments was always a bit dodgy!

    So for PNAS papers submitted before Jul 2010: caveat lector.


