Monday 31 October 2011

A message to the world

from a teenager with language difficulties

Wednesday 26 October 2011

Accentuate the negative

Suppose you run a study to compare two groups of children: say a dyslexic group and a control group. Your favourite theory predicts a difference in auditory perception, but you find no difference between the groups. What to do? You may feel a further study is needed: perhaps there were floor or ceiling effects that masked true differences. Maybe you need more participants to detect a small effect. But what if you can’t find flaws in the study and decide to publish the result? You’re likely to hit problems. Quite simply, null results are much harder to publish than positive findings. In effect, you are telling the world “Here’s an interesting theory that could explain dyslexia, but it’s wrong.” It’s not exactly an inspirational message, unless the theory is so prominent and well-accepted that the null finding is surprising. And if that is the case, then it’s unlikely that your single study is going to be convincing enough to topple the status quo. It has been recognised for years that this “file drawer problem” leads to distortion of the research literature, creating an impression that positive results are far more robust than they really are (Rosenthal, 1979).
The medical profession has become aware of the issue and it’s now becoming common practice for clinical trials to be registered before a study commences, and for journals to undertake to publish the results of methodologically strong studies regardless of outcome. In the past couple of years, two early-intervention studies with null results have been published, on autism (Green et al, 2010) and late talkers (Wake et al, 2011). Neither study creates a feel-good sensation: it’s disappointing that so much effort and good intentions failed to make a difference. But it’s important to know that, to avoid raising false hopes and wasting scarce resources on things that aren’t effective. Yet it’s unlikely that either study would have found space in a high-impact journal in the days before trial registration.
Registration can also exert an important influence in cases where conflict of interest or other factors make researchers reluctant to publish null results. For instance, in 2007, Cyhlarova et al published a study relating membrane fatty acid levels to dyslexia in adults. This research group has a particular interest in fatty acids and neurodevelopmental disabilities, and the senior author has written a book on this topic. The researchers argued that the balance of omega 3 and omega 6 fatty acids differed between dyslexics and non-dyslexics, and concluded: “To gain a more precise understanding of the effects of omega-3 HUFA treatment, the results of this study need to be confirmed by blood biochemical analysis before and after supplementation”. They further stated that a randomised controlled trial was underway. Yet four years later, no results have been published and requests for information about the findings are met with silence. If the trial had been registered, the authors would have been required to report the results, or explain why they could not do so.
Advance registration of research is not a feasible option for most areas of psychology, so what steps can we take to reduce publication bias? Many years ago a wise journal editor told me that publication decisions should be based on evaluation of just the Introduction and Methods sections of a paper: if an interesting hypothesis had been identified, and the methods were appropriate to test it, then the paper should be published, regardless of the results.
People often respond to this idea by saying that it would just mean the literature would be full of boring stuff. But remember, I'm not suggesting that any old rubbish should get published: there has to be a good case for doing the study made in the Introduction, and the Methods have to be strong. Also, some kinds of boring results are important: minimally, publication of a null result may save some hapless graduate student from spending three years trying to demonstrate an effect that’s not there. Estimates of effect sizes in meta-analyses are compromised if only positive findings get reported. More seriously, if we are talking about research with clinical implications, then over-estimation of effects can lead to inappropriate interventions being adopted.
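The point about meta-analyses can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not an analysis of any real literature: the true effect size (0.2), the group size (10 per group), and the rule that a study is "published" only if it is significant at the .05 level in the predicted direction are all invented for illustration, and the function names are hypothetical. The constant 2.101 is the tabled two-tailed .05 critical value of t for 18 degrees of freedom.

```python
import math
import random

TRUE_EFFECT = 0.2   # assumed true standardised difference between groups
N_PER_GROUP = 10    # assumed number of participants per group in every study
T_CRIT = 2.101      # two-tailed .05 critical value of t for df = 18


def run_study(rng):
    """Simulate one two-group study; return (observed effect size d, t statistic)."""
    treated = [rng.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_GROUP)]
    control = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
    mean_t = sum(treated) / N_PER_GROUP
    mean_c = sum(control) / N_PER_GROUP
    ss_t = sum((x - mean_t) ** 2 for x in treated)
    ss_c = sum((x - mean_c) ** 2 for x in control)
    pooled_sd = math.sqrt((ss_t + ss_c) / (2 * N_PER_GROUP - 2))
    d = (mean_t - mean_c) / pooled_sd      # observed effect size (Cohen's d)
    t = d / math.sqrt(2 / N_PER_GROUP)     # equal-variance t for equal group sizes
    return d, t


def file_drawer(n_studies=5000, seed=2011):
    """Return (mean d over all studies, mean d over 'published' studies, number published)."""
    rng = random.Random(seed)
    studies = [run_study(rng) for _ in range(n_studies)]
    all_d = [d for d, _ in studies]
    # Hypothetical publication rule: only significant results in the
    # predicted direction escape the file drawer.
    published_d = [d for d, t in studies if t > T_CRIT]
    return (sum(all_d) / len(all_d),
            sum(published_d) / len(published_d),
            len(published_d))


if __name__ == "__main__":
    mean_all, mean_published, n_published = file_drawer()
    print(f"mean d, all studies:       {mean_all:.2f}")
    print(f"mean d, published studies: {mean_published:.2f} (n = {n_published})")
```

Under these assumptions the average over all studies stays near the true effect, whereas the average over the "published" subset is several times larger, because with groups this small a study can only reach significance if its observed effect is big.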
Things are slowly changing and it’s getting easier to publish null results. The advent of electronic journals has made a big difference because there is no longer such pressure on page space. The electronic journal PLOS One adopts a publication policy that is pretty close to that proposed by the wise editor: they state they will publish all papers that are technically sound. So my advice to those of you who have null data from well-designed experiments languishing in that file drawer: get your findings out there in the public domain.


Cyhlarova, E., Bell, J., Dick, J., MacKinlay, E., Stein, J., & Richardson, A. (2007). Membrane fatty acids, reading and spelling in dyslexic and non-dyslexic adults. European Neuropsychopharmacology, 17 (2), 116-121. DOI: 10.1016/j.euroneuro.2006.07.003

Green, J., Charman, T., McConachie, H., Aldred, C., Slonims, V., Howlin, P., Le Couteur, A., Leadbitter, K., Hudry, K., Byford, S., Barrett, B., Temple, K., Macdonald, W., & Pickles, A. (2010). Parent-mediated communication-focused treatment in children with autism (PACT): a randomised controlled trial. The Lancet, 375 (9732), 2152-2160. DOI: 10.1016/S0140-6736(10)60587-9

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86 (3), 638-641. DOI: 10.1037/0033-2909.86.3.638

Wake, M., Tobin, S., Girolametto, L., Ukoumunne, O. C., Gold, L., Levickis, P., Sheehan, J., Goldfeld, S., & Reilly, S. (2011). Outcomes of population based language promotion for slow to talk toddlers at ages 2 and 3 years: Let's Learn Language cluster randomised controlled trial. BMJ (Clinical research ed.), 343. PMID: 21852344

Saturday 15 October 2011

Lies, damned lies, and spin


The Department for Education (DfE) issued a press report this week entitled “England's 15-year-olds' reading is more than a year behind the best”. The conclusions were taken from analysis of data from the PISA 2009 study, an OECD survey of 15-year-olds in the principal industrialised countries.

The DfE report paints a dire picture: “GCSE pupils' reading is more than a year behind the standard of their peers in Shanghai, Korea and Finland….Fifteen-year-olds in England are also at least six months behind those in Hong Kong, Singapore, Canada, New Zealand, Japan and Australia, according to the Department for Education's (DfE) analysis of the OECD's 2009 Programme for International Student Assessment (PISA) study.” The report goes on to talk of England slipping behind other nations in reading.
Schools Minister Nick Gibb is quoted as saying: “The gulf between our 15-year-olds' reading abilities and those from other countries is stark – a gap that starts to open in the very first few years of a child's education.”
I started to smell a rat when I looked at a chart in the report, entitled “Attainment gap between England and the countries performing significantly better than England” (my emphasis). This seemed an odd kind of chart to provide if one wanted to evaluate how England is doing compared to other countries. So I turned to the report provided by the people who did the survey.
Here are some salient points taken verbatim from their summary on reading:
  • Twelve countries had mean scores for reading which were significantly higher than that of England. In 14 countries the difference in mean scores from that in England was not statistically significant. Thirty-eight countries had mean scores that were significantly lower than England.
  • The mean score for reading in England was slightly above the OECD average but this difference was not statistically significant.
  • England’s performance in 2009 does not differ greatly from that in the last PISA survey in 2006.
There is, of course, no problem with aiming high and wanting our children to be among the top achievers in the world. But that’s no excuse for the DfE's mendacious manipulation of information.

Bradshaw, J., Ager, R., Burge, B. and Wheater, R. (2010). PISA 2009: Achievement of 15-Year-Olds in England. Slough: NFER.

Wednesday 5 October 2011

The joys of inventing data

Have I gone over to the dark side? Cracked under pressure from the REF to resort to fabrication of results to secure that elusive Nature paper? Or had my brain addled by so many requests for information from ethics committees that I’ve just decided that it’s easier to be unethical? Well, readers will be reassured to hear that none of these things is true. What I have to say concerns the benefits of made-up data for helping understand how to analyse real data.
In my field of experimental psychology, students get a thorough grounding in statistics and learn how to apply various methods for testing whether groups differ from one another, whether variables are associated and so on. But what they typically don’t get is any instruction in how to simulate datasets. This may be a historical hangover. When I first started out in the field, people didn’t have their own computers, and if you wanted to do an analysis you either laboriously assembled a set of instructions in Fortran which were punched onto cards and run on a mainframe computer (overnight if you were lucky), or you did the sums on a pocket calculator. Data simulation was just unfeasible for most people. Over the years, the landscape has changed beyond recognition and there are now windows-based applications that allow one to do complex multivariate statistics at the press of a button. There is a danger, however, which is that people do analyses without understanding them. And one of the biggest problems of all is a tendency to apply statistical analyses post hoc. You can tell people over and over that this is a Bad Thing (see Good and Hardin, 2003) but they just don’t get it. A little simulation exercise can be worth a thousand words.
So here’s an illustration. Suppose we’ve got two groups each of 10 people, let’s say left-handers and right-handers. And we’ve given them a battery of 20 cognitive tests. When we scrutinise the results, we find that they don’t differ on most of the measures, but there’s a test of mathematical skill on which the left-handers outperform the right-handers. We do a t-test and are delighted to find that on this measure, the difference between groups is significant at the .05 level, so we write up a paper entitled "Left-handed advantage for mathematical skills" and submit it to a learned journal, not mentioning the other 19 tests. After all, they weren’t very interesting. Sounds OK? Well, it isn’t. We have fallen into the trap of using statistical methods that are valid for testing a hypothesis that is specified a priori in a situation where the hypothesis only emerged after scrutinising the data.
Let’s generate some data. Most people have access to Microsoft Excel, which is perfect for simple simulations. In row 1 we put our column labels, which are group, var1, var2, … var20.
In column A, we then have ten zeroes followed by ten ones, indicating group identity. We then use random numbers to complete the table. The simplest way to do this is to just type =RAND() in each cell.
This generates a random number between 0 and 1.
A more sophisticated option is to generate a random z-score. This creates random numbers that meet the assumption of many statistical tests that data are normally distributed. You can do this by typing, for example, =NORMSINV(RAND()).
At the foot of each column you can compute the mean and standard deviation for each group, and Excel automatically computes a p-value based on the t-test for comparing the groups with a command such as:
See this site if you need an explanation of this formula.
So the formulae in the first three columns look like this (rows 4-20 are hidden): 
Copy this formula across all columns. I added conditional formatting to row 27 so that ‘significant’ p-values are highlighted in yellow (and it just so happens with this example that the generated data gave a p-value less than .05 for column C).
Every time you type anything at all on the sheet, all the random numbers are updated: I’ve just added a row called ‘thisrun’ and typing any number in cell B29 will re-run the simulation. This provides a simple way of generating a series of simulations and seeing when p-values fall below .05. On some runs, all the t-tests are nonsignificant, but you’ll quickly see that on many runs one or more p-values are below .05. In fact, across numerous runs, the average number of significant values per run is going to be one, because we have twenty columns and each has a 1 in 20 (.05) chance of giving a false positive: 20 × .05 = 1. That’s what p < .05 means! If this doesn’t convince you of the importance of specifying your hypothesis in advance, rather than selecting data for analysis post hoc, nothing will.
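For readers who would rather script the exercise than use a spreadsheet, here is a minimal Python sketch of the same idea. It is not a transcription of the Excel sheet: the function names are hypothetical, and instead of computing p-values it compares each t statistic with 2.101, the tabled two-tailed .05 critical value for 18 degrees of freedom (two groups of 10). All data are random z-scores, so any "significant" difference is a false positive.

```python
import math
import random

N_PER_GROUP = 10   # ten left-handers, ten right-handers
N_TESTS = 20       # twenty cognitive tests
T_CRIT = 2.101     # two-tailed .05 critical value of t for df = 18


def t_statistic(a, b):
    """Equal-variance two-sample t statistic for two lists of scores."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def count_significant(rng):
    """One 'run' of the sheet: how many of the 20 null t-tests reach p < .05?"""
    hits = 0
    for _ in range(N_TESTS):
        left = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        right = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        if abs(t_statistic(left, right)) > T_CRIT:
            hits += 1
    return hits


def simulate(n_runs=2000, seed=1):
    """Return (mean number of significant tests per run,
    proportion of runs with at least one significant test)."""
    rng = random.Random(seed)
    counts = [count_significant(rng) for _ in range(n_runs)]
    return sum(counts) / n_runs, sum(c > 0 for c in counts) / n_runs


if __name__ == "__main__":
    mean_hits, prop_any = simulate()
    print(f"mean significant tests per run: {mean_hits:.2f}")
    print(f"runs with at least one 'finding': {prop_any:.2f}")
```

The first number should come out close to 1, and the second close to 1 − .95^20, or about .64: roughly two runs in three hand you at least one publishable-looking "effect" from pure noise.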
This is a very simple example, but you can extend the approach to much more complicated analytic methods. It gets challenging in Excel if you want to generate correlated variables, though if you type a correlation coefficient in cell A1, and have a random number in column B, and copy this formula down from cell C2, then columns B and C will be correlated by the value in cell A1:
NB, you won’t get the exact correlation on each run: the precision will increase with the number of rows you simulate.
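The same idea can be sketched in Python. This is an illustration under assumptions, not the spreadsheet formula: it uses the standard construction y = r·x + √(1 − r²)·e, where x and e are independent random z-scores, and the function names are hypothetical. It also shows how the sample correlation settles towards the target value as the number of rows grows.

```python
import math
import random


def correlated_pairs(r, n, rng):
    """Generate n (x, y) pairs of z-scores whose population correlation is r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)            # independent of x
        xs.append(x)
        ys.append(r * x + math.sqrt(1 - r * r) * noise)
    return xs, ys


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (20, 200, 20000):
        xs, ys = correlated_pairs(0.5, n, rng)
        print(f"n = {n:5d}  sample r = {pearson_r(xs, ys):.3f}")
```

With a handful of rows the sample correlation can be some way off the target; with thousands of rows it should sit very close to it.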
Other applications, such as Matlab or R, allow you to generate correlated data more easily. There are examples of simulating multivariate normal datasets in R in my blog on twin methods.
Simulation can be used for exploring a whole host of issues around statistical methods. For instance, you can simulate data to see how sample size affects results, or how results change if you fail to meet assumptions of a method. But overall, my message is that data simulation is a simple and informative approach to gaining understanding of statistical analysis. It should be used much more widely in training students.
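As one example of exploring sample size, the sketch below draws two groups from the same population many times over and records how much the difference between group means bounces around by chance alone. The group sizes, number of runs and function names are all assumptions chosen for illustration.

```python
import math
import random


def mean_difference(n, rng):
    """Difference between the means of two random groups of size n (no true effect)."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    return sum(a) / n - sum(b) / n


def spread_of_differences(n, n_runs=2000, seed=5):
    """Standard deviation of the chance group difference across many simulated studies."""
    rng = random.Random(seed)
    diffs = [mean_difference(n, rng) for _ in range(n_runs)]
    mean_diff = sum(diffs) / n_runs
    return math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n_runs - 1))


if __name__ == "__main__":
    for n in (10, 40, 160):
        print(f"n per group = {n:3d}  "
              f"SD of chance difference = {spread_of_differences(n):.3f}  "
              f"(theory: {math.sqrt(2 / n):.3f})")
```

Quadrupling the group size should roughly halve the spread, in line with the theoretical value √(2/n) for z-scores. With ten per group, chance differences of nearly half a standard deviation are routine.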

Good, P. I., & Hardin, J. W. (2003). Common errors in statistics (and how to avoid them). Hoboken, NJ: Wiley.