Thursday, 21 March 2013

Blogging as post-publication peer review: reasonable or unfair?



In a previous blogpost, I criticised a recent paper claiming that playing action video games improved reading in dyslexics. In a series of comments below the blogpost, two of the authors, Andrea Facoetti and Simone Gori, have responded to my criticisms. I thank them for taking the trouble to spell out their views and giving readers the opportunity to see another point of view. I am, however, not persuaded by their arguments, which make two main points. First, that their study was not methodologically weak and so Current Biology was right to publish it, and second, that it is unfair, and indeed unethical, to criticise a scientific paper in a blog, rather than through the regular scientific channels.
Regarding the study methodology, as I noted in that post, the principal problem with the study by Franceschini et al was that it was underpowered, with just 10 participants per group. The authors reply with an argumentum ad populum: many other studies have used equally small samples. This is undoubtedly true, but it doesn't make it right. They dismiss the paper I cited by Christley (2010) on the grounds that it was published in a low-impact journal. But the serious drawbacks of underpowered studies have been known about for years, and written about in high- as well as low-impact journals (see references below).
The response by Facoetti and Gori illustrates the problem I had highlighted. In effect, they are saying that we should believe their result because it appeared in a high-impact journal, and now that it is published, the onus must be on other people to demonstrate that it is wrong. I can appreciate that it must be deeply irritating for them to have me expressing doubt about the replicability of their result, given that their paper passed peer review in a major journal and the results reach conventional levels of statistical significance. But in the field of clinical trials, the non-replicability of large initial effects from small trials has been demonstrated on numerous occasions, using empirical data - see in particular the work of Ioannidis, referenced below. The reasons for this ‘winner’s curse’ have been much discussed, but its reality is not in doubt. This is why I maintain that the paper would not have been published if it had been reviewed by scientists who had expertise in clinical trials methodology. They would have demanded more evidence than this.
The response by the authors highlights another issue: now that the paper has been published, the expectation is that anyone who has doubts, such as me, should be responsible for checking the veracity of the findings. As we say in Britain, I should put up or shut up. Indeed, I could try to get a research grant to do a further study. However, my local ethics committee would probably not allow a study on such a small sample, and a properly powered replication would take a year or so and distract me from my other research. Given that I have reservations about the likelihood of a positive result, this is not an attractive option. My view is that the journal editors should have recognised this as a pilot study and asked the authors to do a more extensive replication, rather than dashing into print on the basis of such slender evidence. In publishing this study, Current Biology has created a situation where other scientists must now spend time and resources to establish whether the results hold up.
To see just how damaging this can be, consider the case of the FastForword intervention, developed on the basis of a small trial initially reported in Science in 1996. After the Science paper, the authors went directly into commercialisation of the intervention, and thereafter reported only uncontrolled trials. It took until 2010 for there to be enough reasonably-sized independent randomised controlled trials to evaluate the intervention properly in a meta-analysis, at which point it was concluded that it had no beneficial effect. By this time, tens of thousands of children had been through the intervention, and hundreds of thousands of research dollars had been spent on studies evaluating FastForword.
I appreciate that those reporting exciting findings from small trials are motivated by the best of intentions – to tell the world about something that seems to help children. But the reality is that, if the initial trial is not adequately powered, it can be detrimental both to science and to the children it is designed to help, by giving such an imprecise and uncertain estimate of the effectiveness of treatment.
Finally, a comment on whether it is fair to criticise a research article in a blog, rather than going through the usual procedure of submitting an article to a journal and having it peer-reviewed prior to publication. The authors' reactions to my blogpost are reminiscent of Felicia Wolfe-Simon's response to blog-based criticisms of a paper she published in Science: "The items you are presenting do not represent the proper way to engage in a scientific discourse". Unlike Wolfe-Simon, who simply refused to engage with bloggers, Facoetti and Gori show willingness to discuss matters further and present their side of the story, but it is nevertheless clear that they do not regard a blog as an appropriate place to debate scientific studies.
I could not disagree more. As was readily demonstrated in the Wolfe-Simon case, what has come to be known as ‘post-publication peer review’ via the blogosphere can allow for new research to be rapidly discussed and debated in a way that would be quite impossible via traditional journal publishing. In addition, it brings the debate to the attention of a much wider readership. Facoetti and Gori feel I have picked on them unfairly: in fact, I found out about their paper because I was asked for my opinion by practitioners who worked with dyslexic children. They felt the results from the Current Biology study sounded too good to be true, but they could not access the paper from behind its paywall, and in any case they felt unable to evaluate it properly. I don’t enjoy criticising colleagues, but I feel that it is entirely proper for me to put my opinion out in the public domain, so that this broader readership can hear a different perspective from those put out in the press releases. And the value of blogging is that it does allow for immediate reaction, both positive and negative. I don’t censor comments, provided they are polite and on-topic, so my readers have the opportunity to read the reaction of Facoetti and Gori. 
I should emphasise that I do not have any personal axe to grind with the study's authors, whom I do not know personally. I'd be happy to revise my opinion if convincing arguments are put forward, but I think it is important that this discussion takes place in the public domain, because the issues it raises go well beyond this specific study.

References
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, advance online publication. doi: 10.1038/nrn3475
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi: 10.1371/journal.pmed.0020124
Ioannidis, J. P. A. (2008). Why most discovered true associations are inflated. Epidemiology, 19(5), 640-648.
Ioannidis, J. P. A., Pereira, T. V., & Horwitz, R. I. (2013). Emergence of large treatment effects from small trials: Reply. JAMA, 309(8), 768-769. PMID: 23443435

Tuesday, 19 March 2013

Ten things that can sink a grant proposal: Advice for a young psychologist


So you've slaved away for weeks, giving up any semblance of social or family life in order to put your best ideas on paper. The grant proposal disappears into the void for months, during which your mental state oscillates between optimistic fantasies of the scientific glory that will result when your research is funded, and despair and anxiety at the prospect of rejection. And then it comes, the email of doom: "We regret that your application was not successful." Sometimes it is just a bald statement, and sometimes it is embellished with reviewer comments and ratings that induce either rage or depression, depending on your personality type.

There are three things worth noting at this point. First, rejection is the norm: success rates vary depending on the funding scheme, but it’s common to see funding rates around 20% or less. Second, resilience in the face of rejection is a hallmark of the successful scientist, at least as important as intelligence and motivation. Third, there is a huge amount of luck in the grants process: just as with the journal peer review process, reviewers and grant panel members frequently have disparate opinions, and rejection does not mean the work is no good. However, although chance is a big factor, it's not the only thing.

This week I participated in a workshop on “How to get a grant” run by my colleague Masud Husain. We are both seasoned grant reviewers and have served on grants panels. Masud prepared some slides where he noted things that can lead to grant rejection, and I dug out an old powerpoint from a similar talk I’d given in 2005. There was remarkable convergence between the points that we highlighted, based on our experiences of seeing promising work rejected by grants panels. So it seemed worth sharing our insights with the wider world. These comments are tailored to postdocs in psychology/neuroscience in the UK, though some will have broader applicability.

1. Lack of clarity

The usual model for grant evaluation is that the proposal goes to referees with expertise in the area, and is then considered by a panel of people who cover the whole range of areas that is encompassed by the funding scheme. The panel will, of course, rely heavily on expert views, but your case can only be helped if the other panel members can understand what you want to do and why it is important. Even if they can't follow all the technical details, they should be able to follow the lay abstract and introduction.

It's crucial, therefore, that you give the draft proposal to someone who is not an expert in your research topic - preferably not a close friend, but someone more likely to be critical. Ask them to be brutally honest about the bits that they don't understand. When they give you this feedback, don't argue with them or attempt verbal explanations; just rewrite until they do understand it.

2. Badly written proposal

In an ideal world, funders should focus on the content of your proposal rather than the presentation, right? Things like spelling, formatting, and so on are trivial details that only inferior brains worry about, right?

Nope. Wrong on both counts. The people reading your grant are busy. They may have a stack of proposals to evaluate. I have, for instance, been involved in evaluating applications for a postdoctoral fellowship scheme where my task was to select the top five from a heap of forty-odd proposals. The majority of proposals are very good, so this is a task that is both difficult and important. The reviewer can end up feeling like one of Pavlov's dogs, forced to make ever-finer discriminations, and this can induce a grumpy and unforgiving mood in which one takes a dim view of proposals with typos, spelling errors and missing references. I've seen grant proposals where the applicant failed to turn off 'track changes', or where 'insert reference here' remains in the text. In this highly competitive context, there's a high chance that these will go on the 'reject' heap.

Even if there are no errors in the text, a densely packed page of verbiage is harder for the reviewer to absorb than a well laid-out document with spacing and headings. You will usually feel that the word limit is too short, and it is tempting to pack in as many words as possible, but this is a mistake. Better to ditch material than to confront your reviewer with an intimidating wall of words. Judicious use of figures can make a huge difference to the readability of your text, and readability is key. I personally dislike it when numbers are used to indicate references, especially if the reference list then omits the titles of the referenced papers: people commonly do this to save space, but I like to be able to work out readily which references are being cited.

Anyone can improve the presentation of a grant. Use of a spell-checker is obvious, but if possible, you should also look at examples of successful applications to see what works in terms of layout etc. You can also Google "good document layout" to find websites full of advice.

3. Boring or pointless proposal

This is a difficult one, because what one person finds riveting, another finds tedious. But if you find your proposal boring, then there's close to zero chance anyone will want to fund it. You should never submit a grant proposal unless you are genuinely excited by the work that you are proposing. You need to ask yourself "Is this what I most want to spend my life doing over the next 2-3 years?" If the answer is no, then rethink the proposal. If yes, then it's crucial to convey your enthusiasm.

4. Lack of hypotheses

This is a common reason for rejection of grant proposals. The phrase 'fishing expedition' is often used to dismiss research that involves looking at a large number of variables in an unfocussed way. As an aside, I remember an exasperated colleague saying that a fishing expedition was an entirely sensible approach if the aim was to catch fish! But funding bodies want to see clear, theoretically-driven predictions with an indication of how the research will test these. A hypothesis should have sufficient generality to be interesting, and usually will be tested by a variety of methods. For instance, suppose I think that dyslexia may be caused by a particular kind of sensory deficit, and I plan to test children on a range of visual and auditory tasks. I could say that my hypothesis is that there will be differences between dyslexics and controls on the test battery, but this is too vague. It would be better to describe a particular hypothesis of, say, a visual deficit, and make predictions about the specific tasks that should show deficits. Better still, one would set out a general hypothesis about links between the putative deficit and dyslexia, and specify a set of experiments that test the predictions using a range of methods.

Also, ask yourself: is your hypothesis falsifiable, and will it yield interesting findings even if it is rejected? If the answer is no, rethink.

5. Overambitious proposal

This is another common reason for rejection of proposals, particularly by junior applicants. In psychology, people commonly overestimate how many participants can be recruited (especially in clinical and longitudinal studies) and how much testing experimenters can do. Of course, you do sometimes see cases where the proposal does not contain enough. But that is much less common than the opposite.

If you are working with human participants, you need to demonstrate that you have thought about two things:
a) Participant recruitment
  • Where will you recruit from?
  • Have you liaised with referral sources?
  • How many suitable people exist?
  • What proportion will agree to take part?
  • Overall, how many participants will you be able to include in a given period (e.g. 3 months/ 1 year)?
  • Have you taken into account the time it will take to get ethics approval?
  • Have you costed the proposal to take into account reimbursements to participants and travel?

b) Is your estimate of research personnel realistic?
  • How long does it take to test one participant?
  • Have you taken into account the fact that researchers need to spend time on:
    - Scheduling appointments
    - Travelling
    - Scoring up/entering/analysing data
    - Doing other academic things (e.g. reading relevant literature, attending seminars)
If you are working with fancy equipment, then you need to consider things like whether you or your research staff will need training to use it, as well as availability.

For more on this, see my previous blogpost about an excellent article by Hodgson and Rollnick (1989): "More fun, less stress: How to survive in research", which details the mismatch between people's expectations of how long research takes and the reality.

6. Overoptimistic proposal

An overoptimistic proposal assumes that results will turn out in line with prediction and has no fall-back position if they don't. A proposal should tell us something useful even if the exciting predictions don't work out. You should avoid multi-stage experiments where the whole enterprise would be rendered worthless if the first experiment failed.

7. Proposal depends on untried or complex methods

You're unlikely to be funded if you propose a set of studies using a method in which you have limited experience, unless you can show that you have promising pilot data. If you do want to move in a new direction, try to link up with someone who has some expertise in it, and consider having them as a collaborator. Although funders don't want to take risk with applicants who have no experience in a new method, they do like proposals to include a training component, and for researchers to gain experience in different labs, even if just for a few months.

8. Overcosted (or undercosted) proposal

This one is easy: Ask for everything that you do need, but don't ask for things you don't need. This is not the time to smuggle in funding for that long-desired piece of equipment unless it is key to the proposal.

The committee will also be unimpressed if you ask for things the host institution should provide. But don't omit crucial equipment because of concerns about expense: just be realistic about what you need and explicitly justify everything.

9. Proposal is too risky

This is a much harder one to call. Most funding bodies say they don't want to fund predictable studies, but they are averse to research where there is a high risk of nothing of interest emerging. A US study of NIH funding patterns came to the depressing conclusion that researchers who did high-impact but unconventional research often missed out on funding (Nicholson & Ioannidis, 2012). Funders often state that they like multidisciplinary research, but such proposals run the risk of being turned down unless they are methodologically impeccable in every area they cover.

If you want to include a high-risk element to the proposal, take advice from a senior person whose views you trust - their reaction should give you an indication of whether to go ahead, and if so which aspects will need most justification. And if you want to include a component from a field you are not an expert in, it is vital to take advice from someone senior who does know that area.

It is usually sensible to be up-front about the risky element, and to explain why the risk is worth taking. If you are planning a high-risk project, always have a safety net - i.e. include some more conventional studies in the proposal to ensure that the whole project won't be sunk if the risky bit doesn't pan out.

10. Statistics underspecified or flawed

You need to describe the statistical analysis that you plan, even if it seems obvious to you - if only to demonstrate to the panel that you know what you are doing and have the competence to do it. If you are planning to use complex statistics, get advice from a statistician, and make it clear in the proposal that you have done so. If you don't have adequate statistical skill, consider having a statistician as consultant or collaborator on the grant. And do not neglect power analysis: underpowered studies are a common reason for grants to be rejected in biomedical areas.
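
For illustration, here is a minimal sketch of the kind of prospective power calculation a panel expects to see, using the Python statsmodels package. The target effect size (Cohen's d = 0.5) and the conventional alpha = .05 and power = .8 are assumed figures for illustration, not recommendations for any particular project.

```python
# A minimal sketch (assumed numbers) of a prospective power calculation:
# how many participants per group are needed to detect a moderate group
# difference (Cohen's d = 0.5) with 80% power, two-sided alpha = .05?
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Participants needed per group: {n_per_group:.1f}")  # roughly 64
```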

Most grants panels are multidisciplinary, and there can be huge cultural differences in statistical practices between disciplines. I've seen cases where a geneticist has criticised a psychology project for lack of statistical power (something geneticists are very hot on), or where a medic criticises an experimental intervention study for not using a randomised controlled design. Don't just propose the analysis that you usually do: find out what is best practice to ensure you won't be shot down for a non-optimal research design or analytic approach.

***********************
Finally, remember that the proposed research is one of three elements that will be assessed: the others are the candidate and the institution. There's no point in applying for a postdoctoral fellowship if you have a weak CV: you do need to have publications, preferably first-authored papers. There's a widespread view that you don't stand a chance of funding unless you have papers in high impact journals, but that's not necessarily true, especially in psychology. I'm more impressed by one or two solid first-authored papers than by a long string of publications where you are just one author among many, and (in line with Wellcome Trust policy) I don't give a hoot about journal impact factors. Most funding agencies will give you a steer on whether your CV is competitive if you ask for advice on this.

As far as the institution goes, it helps to come from a top research institution, but the key thing is to have strong institutional support, with access to the resources you need and to supportive colleagues. You will need a cover letter from your institution, and the person writing it should convey enthusiasm for your proposal and be explicit in making a commitment to providing space and other resources.

Good luck!

Reference
Nicholson, J. M., & Ioannidis, J. P. A. (2012). Research grants: Conform and be funded. Nature, 492(7427), 34-36. PMID: 23222591

Sunday, 10 March 2013

High-impact journals: where newsworthiness trumps methodology

Here’s a paradox: Most scientists would give their eye teeth to get a paper in a high impact journal, such as Nature, Science, or Proceedings of the National Academy of Sciences. Yet these journals have had a bad press lately, with claims that the papers they publish are more likely to be retracted than papers in journals with more moderate impact factors. It’s been suggested that this is because the high impact journals treat newsworthiness as an important criterion for accepting a paper. Newsworthiness is high when a finding is both of general interest and surprising, but surprising findings have a nasty habit of being wrong.

A new slant on this topic was provided recently by a paper by Tressoldi et al (2013), who compared the statistical standards of papers in high impact journals with those of three respectable but lower-impact journals. It’s often assumed that high impact journals have a very high rejection rate because they adopt particularly rigorous standards, but this appears not to be the case. Tressoldi et al focused specifically on whether papers reported effect sizes, confidence intervals, power analysis or model-fitting. Medical journals fared much better than the others, but Science and Nature did poorly on these criteria. Certainly my own experience squares with the conclusions of Tressoldi et al (2013), as I described in the course of discussion about an earlier blogpost.

Last week a paper appeared in Current Biology (impact factor = 9.65) with the confident title: “Action video games make dyslexic children read better.” It's a classic example of a paper that is on the one hand highly newsworthy, but on the other, methodologically weak. I’m not usually a betting person, but I’d be prepared to put money on the main effect failing to replicate if the study were repeated with improved methodology. In saying this, I’m not suggesting that the authors are in any way dishonest. I have no doubt that they got the results they reported and that they genuinely believe they have discovered an important intervention for dyslexia. Furthermore, I’d be absolutely delighted to be proved wrong: There could be no better news for children with dyslexia than to find that they can overcome their difficulties by playing enjoyable computer games rather than slogging away with books. But there are good reasons to believe this is unlikely to be the case.

An interesting way to evaluate any study is to read just the Introduction and Methods, without looking at Results and Discussion. This allows you to judge whether the authors have identified an interesting question and adopted an appropriate methodology to evaluate it, without being swayed by the sexiness of the results. For the Current Biology paper, it’s not so easy to do this, because the Methods section has to be downloaded separately as Supplementary Material. (This in itself speaks volumes about the attitude of Current Biology editors to the papers they publish: Methods are seen as much less important than Results). On the basis of just Introduction and Methods, we can ask whether the paper would be publishable in a reputable journal regardless of the outcome of the study.

On the basis of that criterion, I would argue that the Current Biology paper is problematic, purely on the basis of sample size. There were 10 Italian children aged 7 to 13 years in each of two groups: one group played ‘action’ computer games and the other was a control group playing non-action games (all games from Wii's Rayman Raving Rabbids - see here for examples). Children were trained for 9 sessions of 80 minutes per day over two weeks. Unfortunately, the study was seriously underpowered. In plain language, with a sample this small, even if there is a big effect of intervention, it would be hard to detect it. Most interventions for dyslexia have small-to-moderate effects, i.e. they improve performance in the treated group by .2 to .5 standard deviations. With 10 children per group, the power is less than .2, i.e. there’s a less than one in five chance of detecting a true effect of this magnitude. In clinical trials, it is generally recommended that the sample size be set to achieve power of around .8. This is only possible with a total sample of 20 children if the true effect of intervention is enormous – i.e. around 1.2 SD, meaning there would be little overlap between the two groups’ reading scores after intervention. Before doing this study there would have been no reason to anticipate such a massive effect of this intervention, and so use of only 10 participants per group was inadequate. Indeed, in the context of clinical trials, such a study would be rejected by many ethics committees (IRBs) because it would be deemed unethical to recruit participants for a study which had such a small chance of detecting a true effect.
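
For readers who want to check these figures, here is a minimal sketch in Python (statsmodels), assuming a two-sided independent-samples t-test at alpha = .05 with 10 children per group; the paper's own analyses were different, so this is only an approximation of the power situation.

```python
# A rough check of the power figures quoted above, assuming a two-sided
# independent-samples t-test at alpha = .05 with n = 10 per group.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power to detect a moderate effect (d = 0.5) with 10 children per group
print(analysis.power(effect_size=0.5, nobs1=10, alpha=0.05))  # about 0.18

# Effect size needed to reach 80% power with 10 children per group
print(analysis.solve_power(nobs1=10, alpha=0.05, power=0.8))  # a huge effect, d around 1.2-1.3
```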

But, I hear you saying, this study did find a significant effect of intervention, despite being underpowered. So isn’t that all the more convincing? Sadly, the answer is no. As Christley (2010) has demonstrated, positive findings in underpowered studies are particularly likely to be false positives when they are surprising – i.e., when we have no good reason to suppose that there will be a true effect of intervention. This seems particularly pertinent in the case of the Current Biology study – if playing active computer games really does massively enhance children’s reading, we might have expected to see a dramatic improvement in reading levels in the general population in the years since such games became widely available.
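
The logic can be made concrete with a toy calculation (my own illustrative numbers, not Christley's): the probability that a 'significant' result reflects a true effect depends not only on the p-value, but also on the study's power and on the prior plausibility of the hypothesis.

```python
# A toy illustration of why a significant result from an underpowered study
# of a surprising hypothesis is weak evidence: the post-study probability
# that a positive finding is true depends on power, alpha, and the prior
# plausibility of the effect. (Illustrative numbers only.)
def prob_positive_is_true(power, alpha, prior):
    """Probability that a statistically significant result reflects a real effect."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Well-powered study of a plausible hypothesis
print(prob_positive_is_true(power=0.8, alpha=0.05, prior=0.5))  # about 0.94
# Underpowered study of a surprising hypothesis
print(prob_positive_is_true(power=0.2, alpha=0.05, prior=0.1))  # about 0.31
```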

The small sample size is not the only problem with the Current Biology study. There are other ways in which it departs from the usual methodological requirements of a clinical trial: it is not clear how the assignment of children to treatments was made or whether assessment was blind to treatment status, no data were provided on drop-outs, on some measures there were substantial differences in the variances of the two groups, no adjustment appears to have been made for the non-normality of some outcome measures, and a follow-up analysis was confined to six children in the intervention group. Finally, neither group showed significant improvement in reading accuracy, where scores remained 2 to 3 SD below the population mean (Tables S1 and S3): the group differences were seen only for measures of reading speed.

Will any damage be done? Probably not much – some false hopes may be raised, but the stakes are not nearly as high as they are for medical trials, where serious harm or even death can follow from wrong results. Quite apart from the implications for families of children with reading problems, however, there is another issue here, about the publication policies of high-impact journals. These journals wield immense power. It is not overstating the case to say that a person's career may depend on having a publication in a journal like Current Biology (see this account – published, as it happens, in Current Biology!). But, as the dyslexia example illustrates, a home in a high-impact journal is no guarantee of methodological quality. Perhaps this should not surprise us: I looked at the published criteria for papers on the websites of Nature, Science, PNAS and Current Biology. None of them mentioned the need for strong methodology or replicability; all of them emphasised the "importance" of the findings.

Methods are not a boring detail to be consigned to a supplement: they are crucial in evaluating research. My fear is that the primary goal of some journals is media coverage, and consequently science is being reduced to journalism, and is suffering as a consequence.

References

Brembs, B., & Munafò, M. R. (2013). Deep impact: Unintended consequences of journal rank. arXiv:1301.3748.

Christley, R. M. (2010). Power and error: increased risk of false positive results in underpowered studies. The Open Epidemiology Journal, 3, 16-19.

Halpern, S. D., Karlawish, J. T., & Berlin, J. A. (2002). The continuing unethical conduct of underpowered clinical trials. Journal of the American Medical Association, 288(3), 358-362. doi: 10.1001/jama.288.3.358

Lawrence, P. A. (2007). The mismeasurement of science. Current Biology, 17(15), R583-R585. doi: 10.1016/j.cub.2007.06.014

Tressoldi, P., Giofrè, D., Sella, F., & Cumming, G. (2013). High impact = high statistical standards? Not necessarily so. PLoS ONE, 8(2), e56180. doi: 10.1371/journal.pone.0056180