Wednesday, 28 March 2012

C’mon sisters! Speak out!


When I give a talk, I like to allow time for questions. It’s not just a matter of politeness to the audience, though that is a factor. I find it helps me gauge how the talk has gone down: what points have people picked up on, are there things they didn’t get, and are there things I didn’t get? Quite often a question coming from left field gives me good ideas. Sometimes I’m challenged and that’s good too, as it helps me either improve my arguments or revise them. But here’s the thing. After virtually every talk I give there’s a small queue of people who want to ask me a private question. Typically they’ll say, “I didn’t like to ask you this in the question period, but…”, or “This probably isn’t a very sensible thing to ask, but…”. And the thing I’ve noticed is that they are almost always women. And very often I find myself saying, “I wish you’d asked that question in public, because I think there are lots of people in the audience who’d have been interested in what you have to say.”

I’m not an expert in gender studies or feminism, and most of my information about research on gender differences comes from Virginia Valian’s scholarly review, Why So Slow? Valian reviews studies confirming that women are less likely than men to speak out in question sessions in seminars. I have to say my experience in the field of psychology is rather different, and I'm pleased to work in a department where women’s voices are as likely to be heard as men’s. But there’s no doubt that this is not the norm for many disciplines, and I've attended conferences, and given talks, where 90% of questions come from men, even when they are a minority of the audience.

So what’s the explanation? Valian recounts personal experiences as well as research evidence that women are at risk of being ignored if they attempt to speak out, and so they learn to keep quiet. But, while I'm sure there is truth in that, I find myself irritated by what I see as a kind of passivity in my fellow women. It seems too easy to lay the blame at the feet of nasty men who treat you as if you are invisible. A deeper problem seems to be that women have been socially conditioned to be nervous of putting their heads above the parapet. It is really much easier to sit quietly in an audience and think your private thoughts than to share those thoughts with the world, because the world may judge you and find you lacking. If you ask women why they didn’t speak up in a seminar, they’ll often say that they didn’t think their question was important enough, or that it might have been wrong-headed. They want to live life safely and not draw attention to themselves. This affects participation in discussion and debate at all stages of academic life - see this description of anxiety about participating in student classes. Of course, this doesn’t only affect women, nor does it affect all women. But it affects enough women to create an imbalance in who gets heard.
We do need to change this. Verbal exchanges after lectures and seminars are an important part of academic life, and women need to participate fully. There’s no point in encouraging men to listen to women’s voices if the women never speak up. If you are one of those silent women, I urge you to make an effort to overcome your bashfulness. You’ll find it less terrifying than you imagine, and it gets easier with practice. Don’t ask questions just for the sake of it, but when a speaker sparks off an interesting thought, a challenging question, or just a need for clarification, speak out. We need to change the culture here so that the next generation of women feel at ease in engaging in verbal academic debate.


Tuesday, 20 March 2012

The REF: a monster that sucks time and money from academic institutions


I’ve long had a pretty cynical attitude towards the periodic exercises for rating research activities of UK higher education institutions. The problem is cost-effectiveness. Institutions put forward detailed submissions in which the best research outputs of their academics are documented. A panel of top academics then considers these, and central funding is awarded according to the ratings. This takes up massive amounts of time for those writing and reviewing the submissions. The end result is a rank ordering of institutions that seldom contains any surprises. Pretty much the same ordering could be obtained by, for instance, taking a panel of top academics in a given field and sitting them down in a room to vote. This is a point made many years ago by Colin Blakemore, talking about the REF’s predecessor, the RAE.
For the REF 2014, the rules have now changed, so that we don’t only have to document what research we’ve published: we also have to demonstrate its impact. Impact has a very specific definition. It has to originate from a particular academic paper and you have to be able to quantify its effect in the wider world. This poses a new challenge to those preparing REF submissions, a challenge that many institutions are taking very seriously. All over the UK, meetings are being convened to discuss impact statements. Here in Oxford, we’ve already had several long meetings of senior professors devoted just to this issue. Then this week I saw an advertisement from UCL that goes a step further. They are looking for three editorial consultants on a salary of £32,055 - £38,744 per annum to work on their REF impact statements.
This induced in me a sense of despair. Why, you may ask? After all, academics are hard-pressed and this is a way of taking some of the burden from them, while ensuring that their work is presented in the best possible light. My problem with this is that it exemplifies a shift in priorities from substance to presentation. Funds that could have been used to support the university’s core functions of teaching or research go towards PR. And the REF, an exercise that is supposed to enhance the UK’s research, ends up leaching money as well as time from the system.
Here’s a suggestion. Those who are on REF panels should do their own private rankings of higher education institutions and put them in a sealed envelope now. After the REF results are announced, they can compare the outcome with their predictions. Then we will be able to see whether the huge amounts of time and money spent on this exercise have been worthwhile.

Sunday, 11 March 2012

A letter to Nick Clegg from an ex-Liberal Democrat


Dear Nick Clegg,
Yesterday I tweeted to see if anyone following #LDConf could tell me what the party stood for. It was a serious question, but so far no response.

I've voted Lib Dem for many years. My leanings are to the left but Labour’s perpetual internal wrangling was off-putting and the Iraq War an appalling mistake. I liked a lot of the LD policies and a few years ago joined the party. I’ve always had enormous respect for Evan Harris, who was my local MP until he was ousted at the last election.

I was initially sympathetic to the idea of working in Coalition and anticipated that the Lib Dems would act as a moderating force on the excesses of the Tories. In the early months, we saw precious little of that. I started to worry when tuition fees came and went with little sign of any Lib Dem protest. Changes to disability benefits were the next thing. I’d hoped Vince Cable would be able to tackle regulation of the banks and he’s shown willing but appears ultimately toothless. While all this was going on, I found myself wondering whether there would come a point when the Lib Dems might say “No, enough”, and would pull out of the Coalition and force an election. I thought that maybe they’d be holding themselves back so that they could be really effective when something major cropped up. Something that, if not tackled, would be disastrous for Britain. Something that would be difficult to reverse once change had been made. Something like destruction of the NHS.
Well, it didn’t happen. I resigned from the party some months ago when it was clear how things were going, but I retained a vestige of hope that Lib Dems would, at the eleventh hour, find the changes too hard to swallow. I was encouraged by Evan Harris coming out strongly to say the things that needed saying. But no.
But even if you wouldn’t resist on the basis of conscience and principle, I had thought you might have the sense to resist on more pragmatic grounds. Yesterday, a poll showed that 8% of the population would vote Lib Dem. I predict that will fall further after this weekend. Think ahead to the next election. If you were the party who stood up to the Conservatives and prevented them from wrecking the NHS, you’d gain a lot of kudos with your traditional supporters. But instead, you are the party who helped the Conservatives push through marketisation of the NHS. Well, there are many voters who may want marketisation, but they’re not going to vote for you at the next election; they’re going to vote Conservative. Your traditional voters didn’t want any of that, and will abandon you, as many, like me, already have. Baroness Williams argued that the Lib Dems have watered down the bill to make it more palatable. I’m sorry, but people just aren’t going to vote for a party whose only role seems to be to help the Conservatives achieve their aims while not really believing in those aims. If you can’t see that, you’re not fit to be party leader.


Saturday, 10 March 2012

Blogging in the service of science

In my last blogpost, I made some critical comments about a paper that was published in 2003 in the Proceedings of the National Academy of Sciences (PNAS). There were a number of methodological failings that meant that the conclusions drawn by the authors were questionable. But that was not the only point at issue. In addition, I expressed concerns about the process whereby this paper had come to be published in a top journal, especially since it claimed to provide evidence of efficacy of an intervention that two of the authors had financial interests in.
It’s been gratifying to see how this post has sparked off discussion. To me it just emphasises the value of tweeting and blogging in academic life: you can have a real debate with others all over the world. Unlike the conventional method of publishing in journals, it’s immediate. But it’s better than face-to-face debate, because people can think about what they write, and everyone can have their say.
There are three rather different issues that people have picked up on.
1. The first one concerns methods in functional brain imaging; the debate is developing nicely on Daniel Bor’s blog and I’ll not focus on it here.
2. The second issue concerns the unusual routes by which people get published in PNAS. Fellows of the National Academy of Sciences are able to publish material in the journal with only “light touch” review. In this article, Rand and Pfeiffer argue that this may be justified because papers that are published via this route include some with very high citation counts. My view is that the Temple et al article illustrates that this is a terrible argument. Temple et al have had 270 citations, so would be categorised by Rand and Pfeiffer as a “truly exceptional” paper. Yet it contains basic methodological errors that compromise its conclusions. I know some people would use this as an argument against peer review, but I’d rather say this is an illustration of what happens if you ignore the need for rigorous review. Of course, peer review can go wrong, and often does. But in general, a journal’s reputation rests on it not publishing flawed work, and that’s why I think there’s still a role for journals in academic communication. I would urge the editors of PNAS, however, to rethink their publication policy so that all papers, regardless of the authors, get properly reviewed by experts in the field. Meanwhile, people might like to add their own examples of highly cited yet flawed PNAS “contributions” to the comments on this blogpost.
3. The third issue is an interesting one raised by Neurocritic, who asked “How much of the neuroimaging literature should we discard?”  Jon Simons (@js_simons) then tweeted “It’s not about discarding, but learning”.  And, on further questioning, he added “No study is useless. Equally, no study means anything in isolation. Indep replication is key.”  and then “Isn't it the overinterpretation of the findings that's the problem rather than paper itself?” Now, I’m afraid this was a bit too much for me. My view of the Temple et al study was that it was not so much useless as positively misleading. It was making claims about treatment efficacy that were used to promote a particular commercial treatment in which the authors had a financial interest. Because it lacked a control group, it was not possible to conclude anything about the intervention effect. So to my mind the problem was “the paper itself”, in that the study was not properly designed. Yet it had been massively influential and almost no-one had commented on its limitations.
At this point, Ben Goldacre (@bengoldacre) got involved. His concerns were rather different to mine, namely “retraction / non-publication of bad papers would leave the data inaccessible.” Now, this strikes me as a rather odd argument. Publishing a study is NOT the same as making the data available. Indeed, in many cases, as in this one, the one thing you don’t get in the publication is the data. For instance, there’s lots of stuff in Temple et al that was not reported. We’re told very little about the pattern of activations in the typical-reader group, and there’s a huge matrix of correlations that was computed with only a handful actually reported. So I think Ben’s argument about needing access to the data is beside the point. I love data as much as he does, and I’d agree with him that it would be great if people deposited data from their studies in some publicly available archive so nerdy people could pick over them. But the issue here is not about access to data. It’s about what you do with a paper that's already published in a top journal and is actually polluting the scientific process because its misleading conclusions are getting propagated through the literature.
My own view is that it would be good for the field if this paper was removed from the journal, but I’m a realist and I know that won’t happen. Neurocritic has an excellent discussion of retraction and alternatives to retraction in a recent post,  which has stimulated some great comments. As he notes, retraction is really reserved for cases of fraud or factual error, not for poor methodology. But, depressing though this is, I’m encouraged by the way that social media is changing the game here. The Arsenic Life story was a great example of how misleading, high-profile work can get put in perspective by bloggers, even if peer reviewers haven’t done their job properly.  If that paper had been published five years ago, I am guessing it would have been taken far more seriously, because of the inevitable delays in challenging it through official publication routes. Bloggers allowed us to see not only what the flaws were, but also rapidly indicated a consensus of concern among experts in the field. The openness of the blogosphere means that opinions of one or two jealous or spiteful reviewers will not be allowed to hold back good work, but equally, cronyism just won’t be possible.  
We already have quite a few ace neuroscientist bloggers: I hope that more will be encouraged to enter the fray and help offer an alternative, informal commentary on influential papers as they appear.


Monday, 5 March 2012

Time for neuroimaging (and PNAS) to clean up its act


There are rumblings in the jungle of neuroscience. There’s been a recent spate of high-profile papers that have drawn attention to methodological shortcomings in neuroimaging studies (e.g., Ioannidis, 2011; Kriegeskorte et al., 2009; Nieuwenhuis et al., 2011). This is in response to published papers that regularly flout methodological standards that have been established for years. I’ve recently been reviewing the literature on brain imaging in relation to intervention for language impairments and came across this example.
Temple et al. (2003) published an fMRI study of 20 children with dyslexia who were scanned both before and after a computerised intervention (Fast ForWord) designed to improve their language. The article in question was published in the Proceedings of the National Academy of Sciences, and at the time of writing has had 270 citations. I did a spot check of fifty of those citing articles to see if any had noted problems with the paper: only one of them did so. The others repeated the authors’ conclusions, namely:

1. The training improved oral language and reading performance.
2. After training, children with dyslexia showed increased activity in multiple brain areas.
3. Brain activation in left temporo-parietal cortex and left inferior frontal gyrus became more similar to that of normal-reading children.
4. There was a correlation between increased activation in left temporo-parietal cortex and improvement in oral language ability.
But are these conclusions valid? I'd argue not, because:
  • There was no dyslexic control group. See this blogpost for why this matters. The language test scores of the treated children improved from pre-test to post-test, but where properly controlled trials have been done, equivalent change has been found in untreated controls (Strong et al., 2011). Conclusion 1 is not valid.
  • The authors presented uncorrected whole-brain activation data. This is not explicitly stated but can be deduced from the z-scores and p-values. Russell Poldrack, who happens to be one of the authors of this paper, has written eloquently on this subject: “…it is critical to employ accurate corrections for multiple tests, since a large number of voxels will generally be significant by chance if uncorrected statistics are used. … The problem of multiple comparisons is well known but unfortunately many journals still allow publication of results based on uncorrected whole-brain statistics.” Conclusion 2 is based on uncorrected p-values and is not valid (see the multiple-comparisons sketch after the figure below).
  • To demonstrate that changes in activation for dyslexics made them more like typical children, one would need to demonstrate an interaction between group (dyslexic vs typical) and testing time (pre-training vs post-training). Although a small group of typically-reading children was tested on two occasions, this analysis was not done. Conclusion 3 is based on images of group activations rather than statistical comparisons that take into account within-group variance. It is not valid (see the interaction-test sketch after the figure below).
  • There was no a priori specification of which language measures were primary outcomes, and numerous correlations with brain activation were computed, with no correction for multiple comparisons. The one correlation that the authors focus on (Figure reproduced below) is (a) only significant on a one-tailed test at the .05 level; and (b) driven by two outliers (encircled), both of whom had a substantial reduction in left temporo-parietal activation associated with a lack of language improvement. Conclusion 4 is not valid (see the outlier-sensitivity sketch after the figure). Incidentally, the mean activation change (Y-axis) in this scatterplot is also not significantly different from zero. I'm not sure what this means, as it’s hard to interpret the “effect size” scale, which is described as “the weighted sum of parameter estimates from the multiple regression for rhyme vs. match contrast pre- and post-training.”
Figure 2 from Temple et al. (2003). Data from dyslexic children.
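To see why uncorrected whole-brain statistics matter so much for conclusion 2, here is a minimal Python simulation sketch. The voxel and subject counts are my own assumptions, and the data are pure noise rather than anything from the study; the point is simply that with tens of thousands of voxels an uncorrected threshold will declare dozens of voxels "active" even when there is no real effect anywhere.

import numpy as np
from scipy import stats

# Illustration only: voxel and subject counts are assumptions, not figures
# taken from Temple et al. (2003)
rng = np.random.default_rng(42)
n_voxels = 50000     # rough order of magnitude for a whole-brain analysis
n_subjects = 20      # group size comparable to the dyslexic group
alpha = 0.001        # a typical uncorrected voxelwise threshold

# Pure-noise "activation change" values: no true effect at any voxel
data = rng.normal(size=(n_subjects, n_voxels))

# One-sample t-test at every voxel against zero change
t_vals, p_vals = stats.ttest_1samp(data, popmean=0, axis=0)

print(f"Voxels 'significant' at uncorrected p < {alpha}: {np.sum(p_vals < alpha)}")
print(f"Expected false positives by chance: {alpha * n_voxels:.0f}")

# A familywise correction (Bonferroni, the crudest option) removes them
print(f"Voxels surviving Bonferroni correction: {np.sum(p_vals < 0.05 / n_voxels)}")

With these assumed numbers, roughly 50 voxels pass the uncorrected threshold by chance alone, which is exactly the trap the Poldrack quotation warns against.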
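The analysis missing from conclusion 3 is a direct test of the group-by-time interaction. The sketch below uses invented region-of-interest values, not the study's data, just to show the shape of that test; with two groups and two sessions it reduces to asking whether the pre-to-post change differs between groups.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Invented activation values (arbitrary units) purely to show the analysis;
# these are NOT the Temple et al. (2003) data
n_dyslexic, n_typical = 20, 12
dyslexic_pre = rng.normal(0.2, 1.0, n_dyslexic)
dyslexic_post = dyslexic_pre + rng.normal(0.5, 0.8, n_dyslexic)   # this group appears to change
typical_pre = rng.normal(1.0, 1.0, n_typical)
typical_post = typical_pre + rng.normal(0.0, 0.8, n_typical)      # comparison group does not

# The claim that dyslexics became more like typical readers rests on the
# group x time interaction: do the two groups change by different amounts?
dyslexic_change = dyslexic_post - dyslexic_pre
typical_change = typical_post - typical_pre
t, p = stats.ttest_ind(dyslexic_change, typical_change)
print(f"Group x time interaction (change-score comparison): t = {t:.2f}, p = {p:.3f}")

Side-by-side activation maps, however striking, are no substitute for this comparison, because they say nothing about within-group variance.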
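The correlation underlying conclusion 4 can be probed for outlier dependence with a simple leave-one-out check. The data below are invented, with two extreme points deliberately placed to carry the correlation, mirroring the pattern of the encircled points in the figure; this is a sketch of the check, not a reanalysis of the published values.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Invented data: 18 points with no built-in relationship, plus two extreme
# points placed on the diagonal so that they drive an apparent correlation
x = np.concatenate([rng.normal(0, 1, 18), [-3.0, -3.5]])
y = np.concatenate([rng.normal(0, 1, 18), [-3.0, -3.5]])

r_all, p_all = stats.pearsonr(x, y)
print(f"All 20 points: r = {r_all:.2f}, p = {p_all:.3f}")

# Leave-one-out sensitivity check: recompute r with each point removed and
# report the cases where the correlation changes appreciably
for i in range(len(x)):
    r_i, _ = stats.pearsonr(np.delete(x, i), np.delete(y, i))
    if abs(r_i - r_all) > 0.15:
        print(f"Dropping point {i} changes r from {r_all:.2f} to {r_i:.2f}")

# And with both extreme points removed
r_trim, p_trim = stats.pearsonr(x[:-2], y[:-2])
print(f"Without the two extreme points: r = {r_trim:.2f}, p = {p_trim:.3f}")

If removing one or two observations wipes out the effect, as it should here given that the remaining points have no built-in relationship, the correlation cannot bear much weight.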

How is it that this paper has been so influential? I suggest that it is largely because of the image below, summarising results from the study. This was reproduced in a review paper by the senior author that appeared in Science (Gabrieli, 2009), and that review has already had 42 citations. The image is so compelling that it’s also been used in promotional material for a commercial training program other than the one that was used in the study. As McCabe and Castel (2008) have noted, a picture of a brain seems to make people suspend normal judgement.
[Image: summary of brain activation results from Temple et al. (2003), as reproduced in Gabrieli (2009)]
I don’t like to single out a specific paper for criticism in this way, but feel impelled to do so because the methodological problems were so numerous and so basic. For what it’s worth, every paper I have looked at in this area has had at least some of the same failings. However, in the case of Temple et al. (2003) the problem is compounded by the declared interests of two of the authors, Merzenich and Tallal, who co-founded the firm that markets the Fast ForWord intervention. One would have expected a journal editor to subject such a paper to particularly stringent scrutiny under these circumstances.
We can also ask why those who read and cite this paper haven’t noted the problems. One reason is that neuroimaging papers are complicated and the methods can be difficult to understand if you don’t work in the area.
Is there a solution? One suggestion is that reviewers and readers would benefit from a simple cribsheet listing the main things to look for in a methods section of a paper in this area. Is there an imaging expert out there who could write such a document, targeted at those like me, who work in this broad area, but aren’t imaging experts? Maybe it already exists, but I couldn’t find anything like that on the web.
Imaging studies are expensive and time-consuming to do, especially when they involve clinical child groups. I'm not one of those who thinks they aren't ever worth doing. If an intervention is effective, imaging may help throw light on its mechanism of action. However, I do not think it is worthwhile to do poorly-designed studies of small numbers of participants to test the mode of action of an intervention that has not been shown to be effective in properly-controlled trials. It would make more sense to spend the research funds on properly controlled trials that would allow us to evaluate which interventions actually work.

References
Gabrieli, J. D. (2009). Dyslexia: a new synergy between education and cognitive neuroscience. Science, 325(5938), 280-283.
Ioannidis, J. P. A. (2011). Excess significance bias in the literature on brain volume abnormalities. Archives of General Psychiatry, 68(8), 773-780. doi: 10.1001/archgenpsychiatry.2011.28
Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. F., & Baker, C. I. (2009). Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience, 12(5), 535-540. doi: 10.1038/nn.2303

McCabe, D., & Castel, A. (2008). Seeing is believing: The effect of brain images on judgments of scientific reasoning. Cognition, 107(1), 343-352. doi: 10.1016/j.cognition.2007.07.017

Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E.-J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience, 14(9), 1105-1107. doi: 10.1038/nn.2886

Poldrack, R. A., & Mumford, J. A. (2009). Independence in ROI analysis: where is the voodoo? Social Cognitive and Affective Neuroscience, 4(2), 208-213.

Strong, G. K., Torgerson, C. J., Torgerson, D., & Hulme, C. (2011). A systematic meta-analytic review of evidence for the effectiveness of the ‘Fast ForWord’ language intervention program. Journal of Child Psychology and Psychiatry, 52(3), 224-235. doi: 10.1111/j.1469-7610.2010.02329.x

Temple, E., Deutsch, G. K., Poldrack, R. A., Miller, S. L., Tallal, P., Merzenich, M. M., & Gabrieli, J. D. E. (2003). Neural deficits in children with dyslexia ameliorated by behavioral remediation: Evidence from functional MRI. Proceedings of the National Academy of Sciences of the United States of America, 100(5), 2860-2865. doi: 10.1073/pnas.0030098100