BishopBlog: Time for neuroimaging (and PNAS) to clean up its act

Monday, 5 March 2012

Time for neuroimaging (and PNAS) to clean up its act

There are rumblings in the jungle of neuroscience. There’s been a recent spate of high-profile papers that have drawn attention to methodological shortcomings in neuroimaging studies (e.g., Ioannidis, 2011; Kriegeskorte et al., 2009; Nieuwenhuis et al, 2011) . This is in response to published papers that regularly flout methodological standards that have been established for years. I’ve recently been reviewing the literature on brain imaging in relation to intervention for language impairments and came across this example.
Temple et al (2003) published an fMRI study of 20 children with dyslexia who were scanned both before and after a computerised intervention (FastForword) designed to improve their language. The article in question was published in the Proceedings of the National Academy of Sciences, and at the time of writing has had 270 citations. I did a spot check of fifty of those citing articles to see if any had noted problems with the paper: only one of them did so. The others repeated the authors’ conclusions, namely:

1. The training improved oral language and reading performance.

2. After training, children with dyslexia showed increased activity in multiple brain areas.

3. Brain activation in left temporo-parietal cortex and left inferior frontal gyrus became more similar to that of normal-reading children.

4. There was a correlation between increased activation in left temporo-parietal cortex and improvement in oral language ability.

But are these conclusions valid? I'd argue not, because:

There was no dyslexic control group. See this blogpost for why this matters. The language test scores of the treated children improved from pre-test to post-test, but where properly controlled trials have been done, equivalent change has been found in untreated controls (Strong et al., 2011). Conclusion 1 is not valid.

The authors presented uncorrected whole brain activation data. This is not explicitly stated but can be deduced from the z-scores and p-values. Russell Poldrack, who happens to be one of the authors of this paper, has written eloquently on this subject: “…it is critical to employ accurate corrections for multiple tests, since a large number of voxels will generally be significant by chance if uncorrected statistics are used. .. The problem of multiple comparisons is well known but unfortunately many journals still allow publication of results based on uncorrected whole-brain statistics.” Conclusion 2 is based on uncorrected p-values and is not valid.

To demonstrate that changes in activation for dyslexics made them more like typical children, one would need to demonstrate an interaction between group (dyslexic vs typical) and testing time (pre-training vs post-training). Although a small group of typically-reading children was tested on two occasions, this analysis was not done. Conclusion 3 is based on images of group activations rather than statistical comparisons that take into account within-group variance. It not valid.

There was no a priori specification of which language measures were primary outcomes, and numerous correlations with brain activation were computed, with no correction for multiple comparisons. The one correlation that the authors focus on (Figure reproduced below) is (a) only significant on a one-tailed test at .05 level; (b) driven by two outliers (encircled), both of whom had a substantial reduction in left temporo-parietal activation associated with a lack of language improvement. Conclusion 4 is not valid. Incidentally, the mean activation change (Y-axis) in this scatterplot is also not significantly different from zero. I'm not sure what this means, as it’s hard to interpret the “effect size” scale, which is described as “the weighted sum of parameter estimates from the multiple regression for rhyme vs. match contrast pre- and post-training.”


Figure 2 from Temple et al. (2003). Data from dyslexic children	.

How is it that this paper has been so influential? I suggest that it is largely because of the image below, summarising results from the study. This was reproduced in a review paper by the senior author that appeared in Science in 2009. This has already had 42 citations. The image is so compelling that it’s also been used in promotional material for a commercial training program other than the one that was used in the study. As McCabe and Castel (2008) have noted, a picture of a brain seems to make people suspend normal judgement.

I don’t like to single out a specific paper for criticism in this way, but feel impelled to do so because the methodological problems were so numerous and so basic. For what it’s worth, every paper I have looked at in this area has had at least some of the same failings. However, in the case of Temple et al (2003) the problem is compounded by the declared interests of two of the authors, Merzenich and Tallal, who co-founded the firm that markets the FastForword intervention. One would have expected a journal editor to subject a paper to particularly stringent scrutiny under these circumstances.

We can also ask why those who read and cite this paper haven’t noted the problems. One reason is that neuroimaging papers are complicated and the methods can be difficult to understand if you don’t work in the area.

Is there a solution? One suggestion is that reviewers and readers would benefit from a simple cribsheet listing the main things to look for in a methods section of a paper in this area. Is there an imaging expert out there who could write such a document, targeted at those like me, who work in this broad area, but aren’t imaging experts? Maybe it already exists, but I couldn’t find anything like that on the web.

Imaging studies are expensive and time-consuming to do, especially when they involve clinical child groups. I'm not one of those who thinks they aren't ever worth doing. If an intervention is effective, imaging may help throw light on its mechanism of action. However, I do not think it is worthwhile to do poorly-designed studies of small numbers of participants to test the mode of action of an intervention that has not been shown to be effective in properly-controlled trials. It would make more sense to spend the research funds on properly controlled trials that would allow us to evaluate which interventions actually work.

References
Gabrieli, J. D. (2009). Dyslexia: a new synergy between education and cognitive neuroscience. Science, 325(5938), 280-283.

Ioannidis, J. P. A. (2011). Excess significance bias in the literature on brain volume abnormalities. Arch Gen Psychiatry, 68(8), 773-780. doi: 10.1001/archgenpsychiatry.2011.28

Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. F., & Baker, C. I. (2009). Circular analysis in systems neuroscience: the dangers of double dipping. [10.1038/nn.2303]. Nature Neuroscience, 12(5), 535-540. doi: http://www.nature.com/neuro/journal/v12/n5/suppinfo/nn.2303_S1.html

McCabe, D., & Castel, A. (2008). Seeing is believing: The effect of brain images on judgments of scientific reasoning Cognition, 107 (1), 343-352 DOI: 10.1016/j.cognition.2007.07.017

Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E.-J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. [10.1038/nn.2886]. Nature Neuroscience, 14(9), 1105-1107.

Poldrack, R. A., & Mumford, J. A. (2009). Independence in ROI analysis: where is the voodoo? Social Cognitive and Affective Neuroscience, 4(2), 208-213.

Strong, G. K., Torgerson, C. J., Torgerson, D., & Hulme, C. (2010). A systematic meta-analytic review of evidence for the effectiveness of the ‘Fast ForWord’ language intervention program. Journal of Child Psychology and Psychiatry, in press, doi: 10.1111/j.1469-7610.2010.02329.x.

Temple, E., Deutsch, G. K., Poldrack, R. A., Miller, S. L., Tallal, P., Merzenich, M. M., & Gabrieli, J. D. E. (2003). Neural deficits in children with dyslexia ameliorated by behavioral remediation: Evidence from functional MRI. Proceedings of the National Academy of Sciences of the United States of America, 100(5), 2860-2865. doi: 10.1073/pnas.0030098100

33 comments:

Anonymous5 March 2012 at 19:14
agree entirely
ReplyDelete
Replies
Russ Poldrack5 March 2012 at 19:26
I think the best thing that can be said here is that we (certainly myself, and also the field in general) have learned a lot since 2003 about how to do fMRI studies. I can't quibble with any of your criticisms, and I hope that they will help deflate some of the more inflated claims that have been made based on these results.
ReplyDelete
Replies
Max Coltheart5 March 2012 at 21:56
Cognitive psychologists have for a long time been expressing worries about the use of brain imaging to learn more about cognition (e.g. van Orden & Paap, Philos Sci 1997; Bub, Cog. Neurospych., 2000; Harley, Cog. Neuropsych. 2004; Coltheart. Cog. Neurospych., 2004; Page, Cortex, 2006).

Re a cribsheet listing the main things to look for in a paper in this area: something approaching that has been provided in Coltheart, M. (2010). What is functional neuroimaging for? In Hanson, S.J. & Bunzl, M. (Eds), Foundational Issues of Human Brain Mapping. Cambridge: MIT Press. This is not about technicalities of brain imaging, but it is about whether and if so under what circumstances brain imaging data can legitimately yield conclusions about cognition. The chapter by Poldrack in the same book also helps to clarify these issues.
ReplyDelete
Replies
Vincent Walsh5 March 2012 at 23:48
Dorothy. Good post. The paper was in PNAS and was "contributed" by one of the authors. See http://www.pnas.org/site/misc/iforc.shtml#editorial for what this means.
Best
Vin
ReplyDelete
Replies
Steve6 March 2012 at 02:13
Hi Dorothy,

Great post - and I think/hope that things have been getting better since 2003. As for a cribsheet, Russ Poldrack et al's paper on guidelines for reporting an fMRI study is a good place to start: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2287206/?tool=pubmed

There is a similar one for voxel-based morphometry from the UCL group: http://www.ncbi.nlm.nih.gov/pubmed/18314353

Best
Steve
ReplyDelete
Replies
deevybee6 March 2012 at 05:43
Many thanks to Russ for being so frank about the problems of the past.
And it’s good to have the sources for checking methods; thanks Steve.
Re PNAS, there is a big problem with their 2-track approach to publishing.
It has been argued that “The alternative publication tracks that PNAS provides seem to do a good job in giving NAS members more autonomy and letting them publish really groundbreaking, highly-cited, high-impact work while letting some lower quality work get in" (see http://classic.the-scientist.com/blog/display/56194/ ). But in this case “lower quality” means work where the conclusions don’t follow from the data. The fact that this study was published in PNAS is, I’m sure, one reason why it is taken seriously. It is depressing to see its conclusions propagated through the scientific literature as established fact. And where, as in this case, there is conflict of interest, a PNAS paper can in effect be used to promote a product and make profits for the authors.
ReplyDelete
Replies
Neuroskeptic6 March 2012 at 10:31
Good post but I think we need to do more to distinguish between methodological flaws specific to imaging, and more general statistical problems.

For example, in your bullet points, the first one (inadaquate control group) is nothing to do with imaging.

The second is a special case of the problem of multiple comparisons (although granted it takes on some special characteristics in the case of neuroimaging data.)

There are lots of imaging-specific methodological problems and it's important to remember that these are separate, and arguably worse, because statistical problems are well understood & there's a consensus on what to do about them, but we just don't know what e.g. head movement or physiological artifacts are doing to our images...
ReplyDelete
Replies
deevybee6 March 2012 at 10:42
Thanks Neuroskeptic. You're right to draw the distinction of course, but part of my argument is exactly this: when confronted with pictures of brains,researchers seem to forget anything they might ever have learned about statistics - and are allowed to get away with making really basic errors.
As you say "statistical problems are well understood & there's a consensus on what to do about them" - but in this field they don't do it!
The Temple paper is particularly bad, given the high profile of its authors and the journal, but I have this week looked at five other papers where imaging was used to look at pre- post-treatment effects, and only one had adequate controls. Also, as Nieuwenhuis et al found, erroneous analysis of interactions seems to be the norm in this field - yet most psychologists learn about this in Psychology 101. So what is going on here?
ReplyDelete
Replies
Anonymous6 March 2012 at 10:42
If it explains nothing, and merely tries to note correlations, it's worthless as science anyway.
ReplyDelete
Replies
Neuroskeptic6 March 2012 at 10:53
deevybee: Are neuroimaging studies actually more subject to those problems though?

Maybe they are, maybe neuroimaging studies are less likely to e.g. have a sensible control group, but we need some numbers. Comparing neuroimaging papers to studies in the same field, of similar size, that didn't use imaging. I'd not be surprised by that, but on the other hand I'd need some proper data to convince me of it :)
ReplyDelete
Replies
Daniel J. Simons6 March 2012 at 13:27
This comment has been removed by the author.
ReplyDelete
Replies
Daniel J. Simons6 March 2012 at 13:28
This comment has been removed by the author.
ReplyDelete
Replies
Daniel J. Simons6 March 2012 at 13:42
Sorry -- thought better of posting my comment about conflicts of interest. Thanks for the post, Deevybee. And, I really admire Russ's candor in his comment and that he has taken on a number of these sorts of issues in the time since that paper.
ReplyDelete
Replies
Anonymous9 March 2012 at 15:09
The problem of spatial localization is also quite significant as is discussed in this meta-analysis
http://www.sciencedirect.com/science/article/pii/S1053811911014005

Check out figure four, the periaqueductal gray is all over the place.
ReplyDelete
Replies
RAJ9 March 2012 at 19:49
Autism researchers who have published papers using fMRI themselves are questioning the efficacy of fMRI, noting that even when the head is completly motionless during an fMRI study the background noise itself still produces false positives and spurious data as discussed here:

http://sfari.org/news-and-opinion/news/2012/movement-during-brain-scans-may-lead-to-spurious-patterns

Favorite qoute from a neuroscientist who has published numerous studies in autism using fMRI:

“It really, really, really sucks. My favorite result of the last five years is an artifact,” says lead investigator Steve Petersen, professor of cognitive neuroscience at Washington University in St. Louis.
ReplyDelete
Replies
Anonymous9 March 2012 at 23:54
Poor statistical analysis is only a small part of the problem. A huge part of the problem is systematic error, most notably from subject motion, that by numerous mechanisms may introduce spatial-temporal correlations that are not due to the BOLD effect. Even when spatial-temporal correlations are not the focus of a study motion adds lots of noise and the means by which motion is corrected is shoddy to say the least. Worse yet the means by which motion correction and temporal interpolation is presently done may be adding systematic errors that give bogus results for even simple block paradigm designs.

There is value in fMRI but that wont matter soon if the field doesn't start getting serious about the science and foregoe the impulse to publish something sexy rather than something rigorous.

And if you think fMRI is bad take note of the world of MEG current density imaging where wishful thinking substitutes for measurement. And error estimation? What's that?
ReplyDelete
Replies
Anonymous14 March 2012 at 22:59
Major parts of the processing of fMRI data are unjustified.

(1) The assumption of rigid image-volume motion in the ubiquitous motion-correction algorithms is not justified simply because motion frequently occurs on the time scale associated with collecting a single slice of a image volume. This will produce shifts of the slices relative to neighboring slices and violate the rigid image-volume assumption.

(2) The temporal interpolation which is often used to temporally align the slice data within a given volume can not be separated from any effective motion correction algorithm yet that is exactly what is done in most fMRI data processing. This is sure to have systematic effects.

(3) Even if motion correction were to work perfectly there would still be motion related errors due to a number of other mechanisms (perturbation of steady state by through slice motion, the image contrast generated by the sum-of-squares image reconstruction in ubiquitous multi-channel head Rx arrays, etc).

Can it all be cleaned up? Possibly. But surely not if the community turns a blind eye to these problems or worse yet pretends they don't exist.
ReplyDelete
Replies
Manish Kumar22 July 2012 at 12:56
This is certainly one of the most valuable posts. Great tips from beginning till end. Lots of suggestions for me and for people. Superb work
commercial cleaning brisbane
ReplyDelete
Replies
Unknown21 August 2012 at 12:01

Great thoughts you got there, believe I may possibly try just some of it throughout my daily life.

Function Point Estimation Training
ReplyDelete
Replies
Lex Brycenet6 September 2012 at 09:05
Dear Dorothy,
thanks for that great contribution. You ask "Is there a solution?". Well, we have a solution: Amazing fMRI plots for everybody (not just neuroscientists)! Every researcher should have the possibility to exploit the cognitive biases of their readers.

http://www.nicebread.de/amazing-fmri-plots-for-everybody/

Cheers,
Lex
ReplyDelete
Replies
Andrew Bateman24 November 2012 at 13:34
one other thing with the Figure 2 from Temple... and generally, is the inappropriate use of total scores http://www.medicaljournals.se/jrm/content/?doi=10.2340/16501977-0938&html=1

ReplyDelete
Replies
Anonymous19 January 2013 at 13:11
Interesting article; I don't work in neuroscience, but I've recently become wary of PNAS papers, at least pre-July 2010 when the journal changed its submission rules.

Prior to then, papers could either be submitted directly to the journal who would administer peer review, or be contributed via an NAS member, who would take care of the peer-review process themselves. So if you had a pal who was an NAS member, they could perhaps be a bit selective with the reviews to help it through.

Now, I must say that I have no idea whether Temple et al. 2003 came through that route, and would not imply any such situation in that case. But in my own field, I've seen PNAS papers containing errors, which would perhaps have had a hard time being published elsewhere, appearing via the "Communicated submission" route.

Eventually, there was sufficient scandal about this process - summarised here in Nature News http://www.nature.com/news/2009/091009/full/news.2009.985.html - that PNAS removed it as a route to publication in July 2010. The fact that NAS members could contribute "communicated submissions" from authors associated with their own institutions or departments was always a bit dodgy!

So for PNAS papers submitted before Jul 2010: caveat lector.
ReplyDelete
Replies
StardustShining Blog6 February 2013 at 13:16
Hello! Does the rate of updating your blog depend on something or you compose blog articles when you have an inspiration or you create in case you have time? Can't wait to hear from you.
ReplyDelete
Replies
cctv28 November 2013 at 07:27
brain cleaner.nice post thanks for your post.
ReplyDelete
Replies
Unknown7 January 2014 at 06:39
Very Good information on property dealing. This site has very useful inputs related to Real Estate. Well Done & Keep it up to the team of Property Bytes….

Function Point Estimation Training in Chennai
ReplyDelete
Replies

Add comment

New comments are not allowed.

BishopBlog

Monday, 5 March 2012

Time for neuroimaging (and PNAS) to clean up its act

33 comments:

Search This Blog

Prizewinning blog

Popular Posts

Blog Archive

Contributors

Followers

BishopBlog

Monday, 5 March 2012

Time for neuroimaging (and PNAS) to clean up its act

33 comments:

Search This Blog

Subscribe To

Prizewinning blog

Popular Posts

Blog Archive

Contributors

Followers