Sunday, 17 May 2015

Will traditional science journals disappear?


The Royal Society has been celebrating the 350th anniversary of Philosophical Transactions, the world's first scientific journal, by holding a series of meetings on the future of scholarly scientific publishing. I followed the whole event on social media, and was able to attend in person for one day. One of the sessions followed a Dragons' Den format, with speakers having 100 seconds to convince three dragons – Onora O'Neill, Ben Goldacre and Anita de Waard – of the fund-worthiness of a new idea for science communication. Most were light-hearted, and there was a general mood of merriment, but the session got me thinking about what kind of future I would like to see. What I came up with was radically different from our current publishing model.

Most of the components of my dream system are not new, but I've combined them into a format that I think could work. The overall idea had its origins in a blogpost I wrote in 2011, and has points in common with David Colquhoun's submission to the dragons, in that it would adopt a web-based platform run by scientists themselves. This is what already happens with the arXiv for the physical sciences and bioRxiv for biological sciences. However, my 'consensual communication' model has some important differences. Here are the steps I envisage an author going through:
1. An initial protocol is uploaded before a study is done, consisting only of an introduction, a detailed methods section and an analysis plan, with the authors anonymised. An editor then assigns reviewers to evaluate it. This aspect of the model draws on features of registered reports, as implemented in the neuroscience journal Cortex. There are two key scientific advantages to this approach. First, reviewers are able to improve the research design, rather than criticise studies after they have been done. Second, there is a record of what the research plan was, which can then be compared to what was actually done. This does not confine the researcher to the plan, but it does make transparent the difference between planned and exploratory analyses.
2. The authors get a chance to revise the protocol in response to the reviews, and the editor judges whether the study is of an adequate standard, and if necessary solicits another round of review. When there is agreement that the study is as good as it can get, the protocol is posted as a preprint on the web, together with the non-anonymised peer reviews. At this point the identity of authors is revealed.
3. There are then two optional extra stages that could be incorporated:
a) The researcher can solicit collaborators for the study. This addresses two issues raised at the Royal Society meeting – first, many studies are underpowered; duplicating a study across several centres could help in cases where there are logistic problems in getting adequate sample sizes to give a clear answer to a research question. Second, collaborative working generally enhances reproducibility of findings.
b)  It would make sense for funding, if required, to be solicited at this point – in contrast to the current system where funders evaluate proposals that are often only sketchily described. Although funders currently review grant proposals, there is seldom any opportunity to incorporate their feedback – indeed, very often a single critical comment can kill a proposal.
4. The study is then completed, written up in full, and reviewed by the editor. Provided the authors have followed the protocol, no further review is required. The final version is deposited with the original preprint, together with the data, materials and analysis scripts.
5. Post-publication discussion of the study is then encouraged by enabling comments.
What might a panel of dragons make of this? I anticipate several questions.
Who would pay for it? Well, if arXiv is anything to go by, costs of this kind of operation are modest compared with conventional publishing. They would consist of maintaining the web-based platform, and covering the costs of editors. The open access journal PeerJ has developed an efficient e-publishing operation and charges $99 per author per submission. I anticipate a similar charge to authors would be sufficient to cover costs.
Wouldn't this give an incentive to researchers to submit poorly thought-through studies? There are two answers to that. First, half of the publication charge to authors would be required at the point of initial submission. Although this would not be large (e.g. £50) it should be high enough to deter frivolous or careless submissions. Second, because the complete trail of a submission, from preprint to final report, would be public, there would be an incentive to preserve a reputation for competence by not submitting sloppy work.
Who would agree to be a reviewer under such a model? Why would anyone want to put their skills into improving someone else's work for no reward? I propose there could be several incentives for reviewers. First, it would be more rewarding to provide comments that improve the science, rather than just criticising what has already been done. Second, as a more concrete reward, reviewers could have submission fees waived for their own papers. Third, reviews would be public and non-anonymised, and so the reviewer's contribution to a study would be apparent. Finally, and most radically, where the editor judges that a reviewer has made a substantial intellectual contribution to a study, then they could have the option of having this recognised in authorship.
Why would anyone who wasn't a troll want to comment post-publication? We can get some insights into how to optimise comments from the model of the NIH-funded platform PubMed Commons. They do not allow anonymous comments, and require that commenters have themselves authored a paper that is listed on PubMed. Commenters could also be offered incentives such as a reduction of submission costs to the platform. To this one could add ideas from commercial platforms such as eBay, where sellers are rated by customers, so you can evaluate their reputation. It should be possible to devise some kind of star rating – both for the paper being commented on, and for the person making the comment. This could provide motivation for good commenters and make it easier to identify the high quality papers and comments.
I'm sure that any dragon from the publishing world would swallow me up in flames for these suggestions, as I am in effect suggesting a model that would take commercial publishers out of the loop. However, it seems worth serious consideration, given the enormous sums that could be saved by universities and funders by going it alone.  But the benefits would not just be financial; I think we could greatly improve science by changing the point in the research process when reviewer input occurs, and by fostering a more open and collaborative style of publishing.


This article was first published on the Guardian Science Headquarters blog on 12 May 2015

Monday, 4 May 2015

Great Expectations: Our early assessments of schoolchildren are misleading and damaging



The Early Years Foundation Stage Profile was developed by the government's Standards and Testing Agency "to support practitioners in making accurate judgements about each child's attainment". More specifically:
The EYFS Profile summarises and describes children’s attainment at the end of the EYFS. It is based on ongoing observation and assessment in the three prime and four specific areas of learning, and the three characteristics of effective learning,
• Prime areas: communication and language; physical development; personal, social and emotional development
• Specific areas: literacy; mathematics; understanding the world; expressive arts and design
• Characteristics of effective learning: playing and exploring; active learning; creating and thinking critically
for each ELG, practitioners must judge whether a child is meeting the level of development expected at the end of the Reception year (expected), exceeding this level (exceeding), or not yet reaching this level (emerging).
The manual gives concrete examples of the kinds of behaviour that meet the expected level for a given Early Learning Goal. For instance:
Understanding: Children follow instructions involving several ideas or actions. They answer ‘how’ and ‘why’ questions about their experiences and in response to stories or events.
Speaking: Children express themselves effectively, showing awareness of listeners’ needs. They use past, present and future forms accurately when talking about events that have happened or are to happen in the future. They develop their own narratives and explanations by connecting ideas or events.
Strikingly absent from these descriptions is any allowance for the child's age. The timing of the assessment is specified to occur when children are aged from 4 years 10 months to 5 years 9 months.
Children's language skills (and indeed other skills) develop rapidly in the preschool and early school years. I first became aware of this many years ago when I was developing a children's comprehension assessment (TROG). The goal was to establish the typical range of performance at different ages and subsequently use TROG to identify cases of poor comprehension in clinical settings. The assessment involved showing children sets of four pictures and asking them to point to the one that matched a spoken phrase or sentence. I knew very little about developmental psychology at the time, so I just decided to try the materials with children of different ages to see how they reacted. It soon became apparent that there were substantial age-related changes, and I realised that I would need to use four age-bands for 4-year-olds and two age-bands for 5-year-olds. Some illustrative data are shown in Figure 1.


Figure 1: Percentage of children getting 4/4 items correct on blocks testing specific constructions.
From the original Test for Reception of Grammar (1983).

Findings like this are not specific to this test. I've developed several language assessments over the years and I've used those developed by others: they all show rapid change from 4 to 6 years.
Concerned by this, I wrote for information to the government's Children and Early Years Data Unit, who referred me to this report.  This gives percentages of children reaching a Good Level of Development, defined as achieving "at least the expected level in the early learning goals in the prime areas of learning (personal, social and emotional development; physical development; and communication and language) and in the specific areas of mathematics and literacy." A Good Level of Development was obtained by 69% of autumn-born children, 59% of spring-born children and 47% of summer-born children, confirming that the standards used to evaluate children are sensitive to age.
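To make the size of the age effect concrete, here is a minimal sketch of the arithmetic, using only the three percentages quoted above. The function and variable names are my own illustrative choices and do not come from the report.

```python
# Percentages of children reaching a Good Level of Development (GLD),
# by season of birth, as quoted in the paragraph above.
gld_percent = {"autumn": 69, "spring": 59, "summer": 47}


def not_reaching(percent_reaching):
    """Percentage of children rated as falling short of the standard."""
    return 100 - percent_reaching


def gap_in_points(older, younger, table=gld_percent):
    """Difference between two birth-season groups, in percentage points."""
    return table[older] - table[younger]


for season, pct in gld_percent.items():
    print(f"{season}-born: {pct}% reach GLD, {not_reaching(pct)}% do not")

print("autumn vs summer gap:", gap_in_points("autumn", "summer"), "points")
```

On these figures, 53% of summer-born children are rated as falling short of the standard, compared with 31% of autumn-born children, a gap of 22 percentage points.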
This is seriously problematic for at least two reasons. First, it means we are using flawed assessments that will over-identify problems in younger children. It is already established that in the USA attentional deficits are over-diagnosed in summer-born children (Elder, 2010) – a problem that has long-term consequences when children are subsequently prescribed medication for what may actually be normal behaviour in an immature child. Making children feel that they are falling short of an expected standard before they are 5 years old cannot be good for their development. In this regard it is noteworthy that there is evidence that being summer-born continues to be associated with educational disadvantage in English children through the later school years (Crawford et al, 2013).
A second problem is that use of inappropriate criteria for 'expected' levels of development will give a false impression of the numbers of children with developmental difficulties. Consider this article describing an 'early learning crisis' with '20 percent of children unable to communicate properly at age 5'. I have a particular interest in children who have language difficulties, but nobody is helped by over-identifying problems in children who are just the youngest in their class. I've seen enough 4 and 5-year-olds to know that the 'early learning goals' for understanding and speaking are not realistic 'expectations' for 4-year-olds and for those who have only just turned 5 years. Indeed, the fact that one third of the oldest children are not regarded as having a good level of development suggests to me that the expectations are inappropriately high even for the oldest 5-year-olds.
My colleague Courtenay Norbury, Professor in the Psychology Dept at Royal Holloway, will shortly be publishing data from a large survey of language development in reception class children in Surrey*. She tells me that month of birth is once again emerging as an important factor.
I'm not someone who is opposed to assessment in principle, but if you are going to do it, it's important to do it in an informed manner. Surely it is time for the policy-makers in this area to recognise that their current practices of early assessment are misleading, and have the potential to cause damage when children are evaluated against standards that are overly stringent and do not take age into account.


*Update 5th June 2015: This is now published as an open access 'early view' paper in Journal of Child Psychology and Psychiatry: http://onlinelibrary.wiley.com/doi/10.1111/jcpp.12431/abstract