Saturday, 13 October 2018

Working memories: a brief review of Alan Baddeley's memoir

This post was prompted by Tom Hartley, who asked if I would be willing to feature an interview with Alan Baddeley on my blog.  This was excellent timing, as I'd just received a copy of Working Memories from Alan, and had planned to take it on holiday with me. It proved to be a fascinating read. Tom's interview, which you can find here, gives a taster of the content.

The book was of particular interest to me, as Alan played a big role in my career by appointing me to a post I held at the MRC Applied Psychology Unit (APU) from 1991 to 1998, and so I'm familiar with many of the characters and the ideas that he talks about in the book. His work covered a huge range of topics and collaborations, and the book, written at the age of 84, works both as a history of cognitive psychology and as a scientific autobiography.

Younger readers may be encouraged to hear that Alan's early attempts at a career were not very successful, and his career took off only after a harrowing period as a hospital porter and a schoolteacher, followed by a post at the Burden Neurological Institute, studying the effects of alcohol, where his funds were abruptly cut off because of a dispute between his boss and another senior figure. He was relieved to be offered a place at the MRC Applied Psychology Unit (APU) in Cambridge, eventually doing a doctorate there under the supervision of Conrad (whose life I wrote about here), experimenting on memory skills in sailors and postmen.

I had known that Alan's work covered a wide range of areas, but was still surprised to find just how broad his interests were. In particular, I was aware he had done work on memory in divers, but had thought that was just a minor aspect of his interests. That was quite wrong: this was Alan's main research interest over a period of years, where he did a series of studies to determine how far factors like cold, anxiety and air quality during deep dives affected reasoning and memory: questions of considerable interest to the Royal Navy among others.

After periods working at the Universities of Sussex and Stirling, Alan was appointed in 1974 as Director of the MRC APU, where he had a long and distinguished career until his formal retirement in 1995. Under his direction, the Unit flourished, pursuing a much wider range of research, with strong external links. Alan enjoyed working with others, and had collaborations around the world.  After leaving Cambridge,  he took up a research chair at the University of Bristol, before settling at the University of York, where he is currently based.

I was particularly interested in Alan's thoughts on applied versus theoretical research. The original  APU was a kind of institution that I think no longer exists: the staff were expected to apply their research skills to address questions that outside agencies, especially government, were concerned with. The earliest work was focused on topics of importance during wartime: e.g., how could vigilance be maintained by radar operators, who had the tedious task of monitoring a screen for rare but important events. Subsequently, unit staff were concerned with issues affecting efficiency of government operations during peacetime: how could postcodes be designed to be memorable? Was it safe to use a mobile phone while driving? Did lead affect children's cognitive development?  These days, applied problems are often seen as relatively pedestrian, but it is clear that if you take highly intelligent researchers with good experimental skills and pose them this kind of challenge, the work that ensues will not only answer the question, but may also lead to broader theoretical insights.

Although Alan's research included some work with neurological patients, he would definitely call himself a cognitive psychologist, and not a neuroscientist. He notes that his initial enthusiasm for functional brain imaging died down after finding that effects of interest were seldom clearcut and often failed to replicate. His own experimental approaches to evaluate aspects of memory and cognition seemed to throw more light than neuroimaging on deficits experienced by patients.

The book is strongly recommended for anyone interested in the history of psychology. As with all of Alan's writing, it is immensely readable because of his practice of writing books by dictation as he goes on long country walks: this makes for a direct and engaging style. His reflections on the 'cognitive revolution' and its impact on psychology are highly relevant for today's psychologists. As Alan says in the interview "... It's important to know where our ideas come from. It's all too tempting to think that whatever happened in the last two or three years is the cutting edge and that's all you need to know. In fact, it's probably the crest of a breaking wave and what you need to know is where that wave came from."

Saturday, 15 September 2018

An index of neighbourhood advantage from English postcode data


Screenshot from http://dclgapps.communities.gov.uk/imd/idmap.html
Densely packed postcodes appear grey: you need to expand the map to see colours
The Ministry of Housing, Communities and Local Government has a website which provides an ‘index of multiple deprivation’ for every postcode in England.  This is a composite index based on typical income, employment, education, health, crime, housing and living environment for each of 32,844 postcodes in 2015. You can also extract indices for the component factors that contribute to the index, which are explained further here. And there is a fascinating interactive website where you can explore the indices on a map of England.

Researchers have used the index of multiple deprivation as an overall measure of environmental factors that might affect child development, but it has one major drawback. The number that the website gives you is a rank from 1 to 32,844. This means it is not normally distributed, and not easy to interpret. You are also given decile bands, but these are just less precise versions of the ranks – and like ranks, have a rectangular, rather than a normal distribution (with each band containing 10% of the postcodes). If you want to read more about why rectangularly distributed data are problematic, see this earlier blogpost.

I wanted to use this index, but felt it would make sense to convert the ranks into z-scores. This is easily done: a rank can be expressed as a proportion, and a proportion maps onto a z-score via the inverse normal distribution. Here’s what you do:

Use the website to convert the postcode to an index of deprivation: in fact, it’s easiest to paste in a list of postcodes, and you then get a set of indices for each one, which you can download as a .csv or .xlsx file. The index of multiple deprivation is given in the fifth column.

To illustrate, I put in the postcode where I grew up, IG38NP, which corresponds to a multiple deprivation index of 12596.

In Excel, you can just divide the multiple deprivation index by 32844, to get a value of .3835, which you can then convert to a z-score using the NORMSINV function. Or, to do this in one step, if you have your index of multiple deprivation in cell A2, you type
 =normsinv(A2/32844)

This gives a value of -0.296, which is the corresponding z-score. I suggest calling it the ‘neighbourhood advantage score’ – so it’s clear that a high score is good and a low score is bad.

If you are working in R, you can just use the command:
neighbz = qnorm(deprivation_index/depmax)
where neighbz is the neighbourhood advantage score, depmax has been set to 32844, and deprivation_index is the index of multiple deprivation.

Obviously, I’ve presented simplified commands here, but in either Excel or R it is easy to convert a whole set of postcodes in one go.
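For example, here is a minimal R sketch for converting a whole downloaded file in one go. The file name (postcode_indices.csv) and column name (imd_rank) are hypothetical placeholders – substitute whatever your download actually contains:

# Convert a column of deprivation ranks to neighbourhood advantage z-scores
# File and column names below are placeholders - edit to match your download
depmax  <- 32844                             # number of ranked units in the 2015 index
indices <- read.csv("postcode_indices.csv")  # file downloaded from the IMD website

# rank -> proportion -> z-score, via the inverse normal distribution
indices$neighbz <- qnorm(indices$imd_rank / depmax)

write.csv(indices, "postcode_indices_with_z.csv", row.names = FALSE)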

It is, of course, important to keep in mind that this is a measure of the neighbourhood a person lives in, and not of the characteristics of the individual. Postcode indicators may be misleading in mixed neighbourhoods, e.g. where gentrification has occurred, so that rich and poor live side by side. And the different factors contributing to the index may be dissociated. Nevertheless, I think this index can be useful for providing an indication of whether a sample of individuals is representative of the population of England. In psychology studies, volunteers tend to come from more advantaged backgrounds, and this score provides one way to quantify that bias.

Sunday, 26 August 2018

Should editors edit reviewers?


How Einstein dealt with peer review: from http://theconversation.com/hate-the-peer-review-process-einstein-did-too-27405

This all started with a tweet from Jesse Shapiro under the #shareyourrejections hashtag:

JS: Reviewer 2: “The best thing these authors [me and @ejalm] could do to benefit this field of study would be to leave the field and never work on this topic again.” Paraphrasing only slightly.

This was quickly followed by another example:
Bill Hanage: #ShareYourRejections “this paper is not suitable for publication in PNAS, or indeed anywhere.”

Now, both of these are similarly damning, but there is an important difference. The first one criticises the authors, the second one criticises the paper. Several people replied to Jesse’s tweet with sympathy, for instance:

Jenny Rohn: My condolences. But Reviewer 2 is shooting him/herself in the foot - most sensible editors will take a referee's opinion less seriously if it's laced with ad hominem attacks.

I took a different tack, though:
DB: A good editor would not relay that comment to the author, and would write to the reviewer to tell them it is inappropriate. I remember doing that when I was an editor - not often, thankfully. And reviewer apologised.

This started an interesting discussion on Twitter:

Ben Jones: I handled papers where a reviewer was similarly vitriolic and ad hominem. I indicated to the reviewer and authors that I thought it was very inappropriate and unprofessional. I’ve always been very reluctant to censor reviewer comments, but maybe should reconsider that view

DB: You're the editor. I think it's entirely appropriate to protect authors from ad hominem and spiteful attacks. As well as preventing unnecessary pain to authors, it helps avoid damage to the reputation of your journal

Chris Chambers: Editing reviews is dangerous ground imo. In this situation, if the remainder of the review contained useful content, I'd either leave the review intact but inform the authors to disregard the ad hom (& separately I'd tell reviewer it's not on) or dump the whole review.

DB: I would inform reviewer, but I don’t think it is part of editor’s job to relay abuse to people, esp. if they are already dealing with pain of rejection.

CC: IMO this sets a dangerous precedent for editing out content that the editor might dislike. I'd prefer to keep reviews unbiased by editorial input or drop them entirely if they're junk. Also, an offensive remark or tone could in some cases be embedded w/i a valid scientific point.

Kate Jeffery: I agree that editing reviewer comments without permission is dodgy but also agree that inappropriate comments should not be passed back to authors. A simple solution is for editor to revise the offending sentence(s) and ask reviewer to approve change. I doubt many would decline.

A middle road was offered by Lisa deBruine:
LdB: My solution is to contact the reviewer if I think something is wrong with their review (in either factual content or professional tone) and ask them to remove or rephrase it before I send it to the authors. I’ve never had one decline (but it doesn’t happen very often).

I was really surprised by how many people felt strongly that the reviewer’s report was in some sense sacrosanct, and could not and should not be altered. I’ve pondered this further, but am not swayed by the arguments.

I feel strongly that editors should be able to distinguish personal abuse from robust critical comment, and that, far from being inappropriate, it is their duty to remove the former from reviewer reports. And as for Chris’s comment: ‘an offensive remark or tone could in some cases be embedded w/i a valid scientific point’ – the answer is simple. You rewrite to remove the offensive remark; e.g. ‘The authors seem clueless about the appropriate way to run a multilevel model’ could be rewritten as ‘The authors should take advice from a statistician about their multilevel model, which is not properly specified’. And to be absolutely clear, I am not talking about editing out comments that are critical of the science, or which the editor happens to disagree with. If a reviewer got something just plain wrong, I’m okay with giving a clear steer in the editor’s letter, e.g.: ‘Reviewer A suggests you include age as a covariate. I notice you have already done that in the analysis on p x, so please ignore that comment.’ I am specifically addressing comments that are made about the authors rather than the content of what they have written. A good editor should find that an easy distinction to make. From the perspective of an author, being called out for getting something wrong is never comfortable: being told you are a useless person because you got something wrong just adds unnecessary pain.

Why do I care about this? It’s not just because I think we should all be kind to each other (though, in general, I think that’s a good idea). There’s a deeper issue at stake here. As editors, we should work to reinforce the idea that personal disputes should have no place in science. Yes, we are all human beings, and often respond with strong emotions to the work of others. I can get infuriated when I review a paper where the authors appear to have been sloppy or stupid. But we all make mistakes, and are good at deluding ourselves. One of the problems when you start out is that you don’t know what you don’t know: I learned a lot from having my errors pointed out by reviewers, but I was far more likely to learn from this process if the reviewer did not adopt a contemptuous attitude. So, as reviewers, we should calm down and self-edit, and not put ad hominem comments in our reviews. Editors can play a role in training reviewers in this respect.

For those who feel uncomfortable with my approach – i.e. edit the review and tell the reviewer why you have done so – I would recommend Lisa deBruine’s solution of raising the issue with the reviewer and asking them to amend their review. Indeed, in today’s world where everything is handled by automated systems, that may be the only way of ensuring that an insulting review does not go to the author (assuming the automated system lets you do that!).

Finally, everyone agreed that this does not seem to be a common problem, so perhaps it is not worth devoting much space to; but I'm curious to know how other editors respond to this issue.

Monday, 20 August 2018

Matlab vs open source: Costs and benefits to scientists and society

An interesting twitter thread came along yesterday, started by this query from Jan Wessel (@wessel_lab):

Quick thread of (honest) questions for the numerous people on here that subscribe to the position that sharing code in MATLAB ($) is bad open-science practice compared to open source languages (e.g., Python). What should I do as a PI that runs a lab whose entire coding structure is based (publicly shared) MATLAB code? Some say I should learn an open-source language and change my lab’s procedures over to it. But how would that work in practice? 

When I resort to blogging, it’s often because someone has raised a question that has captured my interest because it does not have a simple answer. I have made a Twitter moment to store the rest of Jan’s thread and some of the responses to it, as they raise important points which have broad application.

In part, this is an argument about costs and benefits to the individual scientist and the community. Sometimes these can be aligned, but in this case there is some conflict, because those who can’t afford Matlab would not be able to run Jan’s code. If he were to move to Python, then anyone would be able to do so.

His argument is that he has invested a lot of time in learning Matlab, has a good understanding of how Matlab code works, and feels competent to advise his trainees in it. Furthermore, he works in the field of EEG, where there are whole packages developed to do the complex analysis involved, and Matlab is the default in this field. So moving to another programming language would not only be a big time sink, but would also make him out of step with the rest of the field.

There was a fair bit of division of opinion in the replies. On the one hand, there were those who thought this was a non-issue. It was far more important to share code than to worry about whether it was written in a proprietary language. And indeed, if you are well-enough supported to be doing EEG research, then it’s likely your lab can afford the licensing costs.

I agree with the first premise: just having the code available can be helpful in understanding how an analysis was done, even if you can’t run it. And certainly, most of those in EEG research are using Matlab. However, I’m also aware that for those in resource-limited countries, EEG is a relatively cheap technology for doing cognitive neuroscience, so I guess there will be those who would be able to get EEG equipment, but for whom the Matlab licensing costs are prohibitive.

But the replies emphasised another point: the landscape is continually changing. People have been encouraging me to learn Python, and I’m resisting only because I’m starting to feel too old to learn yet another programming language. But over the years, I’ve had to learn Basic, Matlab and R, as well as some arcane stuff for generating auditory stimuli whose name I can’t even remember. But I’ve looked at Jan’s photo on the web, and he looks pretty young, so he doesn’t have my excuse. So on that basis, I’d agree with those advising to consider making a switch. Not just to be a good open scientist, but in his own interests, which involves keeping up to date. As some on the thread noted, many undergrads are now getting training in Python or R, and sooner or later open source will become the default.

In the replies there were some helpful suggestions from people who were encouraging Jan to move to open source but in the least painful way possible. And there was reassurance that there are huge savings in learning a new language: it’s really not like going back to square one. That’s my experience: in fact, my knowledge of Basic was surprisingly useful when learning Matlab.

So the bottom line seems to be: don’t beat yourself up about it. Posting Matlab code is far better than not posting any code. But be aware that things are changing, and sooner or later you’ll need to adapt. The time costs of learning a new language may prove trivial in the long term, set against the costs of being out of date. But I can state with total confidence that learning Python will not be the end of it: give it a few years and something else will come along.

When I was first embarking on an academic career, I remember looking at the people who were teaching me, who, at the age of around 40, looked very old indeed. And I thought it must be nice for them, because they had worked hard, learned stuff, and now knew it all and could just do research and teach. When I got to 40, I had the awful realisation that the field was changing so fast that unless I kept learning new stuff, I would get left behind. And it hasn't stopped over the past 25 years!

Saturday, 11 August 2018

More haste less speed in calls for grant proposals


Helpful advice from the World Bank

This blogpost was prompted by a funding call announced this week by the Economic and Social Research Council (ESRC), which included the following key dates:
  • Opening date for proposals – 6 August 2018 
  • Closing date for proposals – 18 September 2018 
  • PI response invited – 23 October 2018 
  • PI response due – 29 October 2018 
  • Panel – 3 December 2018 
  • Grants start – 14 February 2019 
As pointed out by Adam Golberg (@cash4questions), Research Development Manager at Nottingham University, on Twitter, this is very short notice to prepare an application for substantial funding:
I make this about 30 working days notice. For a call issued in August. For projects of 36 months, up to £900k - substantial, for social sciences. With only one bid allowed to be led from each institution, so likely requiring an internal sift. 

I thought it worth raising this with ESRC, and they replied promptly, saying:
To access funds for this call we’ve had to adhere to a very tight spending timeframe. We’ve had to balance the call opening time with a robust peer review process and a Feb 2019 project start. We know this is a challenge, but it was a now or never funding opportunity for us.
 
They suggested I email them for more information, and I’ve done that, so will update this post if I hear more. I’m particularly curious about the reason for the tight spending timeframe and the inflexible February 2019 start.

This exchange led to discussion on Twitter which I have gathered together here.

It’s clear from the responses that this kind of time-frame is not unusual, and I have been sent some other examples. For instance, this ESRC Leadership Fellowship (£100,000 for 12 months) had a call for proposals issued on 16th November 2017, with a deadline for submissions of 3rd January. When you factor in that most universities shut down from late December until early January, and so the proposal would need to be with administrators before the Christmas break, this gives applicants around 30 days to construct a competitive proposal. But it’s not only ESRC that does this, and I am less interested in pointing the finger at a particular funder – who may well be working under pressures outside their control – than in raising the issue of why this system needs a rethink. I see five problems with these short lead times:

1. Poorer quality of proposals 
The most obvious problem is that a hastily written proposal is likely to be weaker than one that is given more detailed consideration. The only good thing you might say about the time pressure is that it is likely to reduce the number of proposals, which reduces the load on the funder’s administration. It’s not clear, however, whether this is an intended consequence.

2. Stress on academic staff 
There is ample evidence that academic staff in the UK have high stress levels, often linked to a sense of increasing demands and high workload. A good academic shows high attention to detail and is at pains to get things right: research is not something that can be done well under tight time pressure. So holding out the offer of a large grant with only a short period to prepare a proposal is bound to increase stress: do you drop everything else to focus on grant-writing, or pass up the opportunity to enter the competition?

Where the interval between the funding call and the deadline occurs over a holiday period, some might find this beneficial, as other demands such as teaching are lower. But many people plan to take a vacation, and should be able to have a complete escape from work for at least a week or two. Others will have scheduled the time for preparing lectures, doing research, or writing papers. Having to defer those activities in order to meet a tight deadline just induces more sense of overload and guilt at having a growing backlog of work.

3. Equity issues 
These points about vacations are particularly pertinent for those with children at home during the holidays, as pointed out in a series of tweets by Melissa Terras, Professor of Digital Cultural Heritage at Edinburgh University, who said:
I complained once to the AHRC about a call announced in November with a closing date of early January - giving people the chance to work over the Xmas shutdown on it. I wasn't applying to the call myself, but pointed out that it meant people with - say - school age kids - wouldn't have a "clear" Xmas shutdown to work on it, so it was prejudice against that cohort. They listened, apologised, and extended the deadline for a month, which I was thankful for. But we shouldn't have to explain this to them. Have RCUK done their implicit bias training?

4. Stress on administrative staff 
One person who contacted me via email pointed out that many funders, including ESRC, ask institutions to filter out uncompetitive proposals through internal review. That could mean senior research administrators organising exploratory workshops, soliciting input from potential PIs, having people present their ideas, and considering collaborations with other institutions. None of that will be possible in a 30-day time frame. And for the administrators who do the routine work of checking funding bids for accuracy and for compliance with university and funder requirements, I suspect it’s not unusual to have to deal with a stressed researcher who expects all of this to be done with rapid turnaround; when the funding scheme virtually guarantees that everything is done in a rush, this only gets worse.

5. Perception of unfairness 
Adding to this toxic mix, we have the possibility of diminished trust in the funding process. My own interest in this issue stems from a time a few years ago when there was a funding call for a rather specific project in my area. The call came just before Christmas, with a deadline in mid-January. I had a postdoc who was interested in applying, but after discussing it, we decided not to put in a bid. Part of the reason was that we had both planned a bit of time off over Christmas, but in addition I was suspicious about the combination of short time-scale and specific topic. This made me wonder whether a decision had already been made about who to award the funds to, and the exercise was just to fulfil requirements and give an illusion of fairness and transparency.

Responses on Twitter again indicate that others have had similar concerns. For instance, Jon May, Professor in Psychology at the University of Plymouth, wrote:
I suspect these short deadline calls follow ‘sandboxes’ where a favoured person has invited their (i.e his) friends to pitch ideas for the call. Favoured person cannot bid but friends can and have written the call.
 
And an anonymous correspondent on email noted:
I think unfairness (or the perception of unfairness) is really dangerous – a lot of people I talk to either suspect a stitch-up in terms of who gets the money, or an uneven playing field in terms of who knew this was coming.

So what’s the solution? One option would be to insist that, at least for those dispensing public money, there should be a minimum time between a call for proposals and the submission date: about 3 months would seem reasonable to me.

Comments will be open on this post for a limited time (2 months, since we are in holiday season!) so please add your thoughts.

P.S. Just as I was about to upload this blogpost, I was alerted on Twitter to this call from the World Bank, which is a beautiful illustration of point 5 - if you weren't already well aware this was coming, there would be no hope of applying. Apparently, this is not a 'grant' but a 'contract', but the same problems noted above would apply. The website is dated 2nd August, the closing date is 15th August. There is reference to a webinar for applicants dated 9th July, so presumably some information has been previously circulated, but still with a remarkably short time lag, given that there need to be at least two collaborating institutions (including middle- and low-income countries), with letters of support from all collaborators and all end users. Oh, and you are advised ‘Please do not wait until the last minute to submit your proposal’.


Update: 17th August 2018
An ESRC spokesperson sent this reply to my query:

Thank you for getting in touch with us with your concerns about the short call opening time for the recently announced Management Practices and Employee Engagement call, and the fact that it has opened in August.

We welcome feedback from our community on the administration of funding programmes, and we will think carefully about how to respond to these concerns as we design and plan future programmes.

To provide some background to this call. It builds on an open-invite scoping workshop we held in February 2018, at which we sought input from the academic, policy and third-sector communities on the shape of a (then) potential research investment on management practices and employee engagement. We subsequently flagged the likelihood of a funding call around the topic area this summer, both at the scoping workshop itself, as well as in our ongoing engagements with the academic community.

We do our best to make sure that calls are open for as long as possible. We have to balance call opening times with a robust and appropriately timetabled peer review process, feasible project start dates, the right safeguards and compliances, and, in certain cases such as this one, a requirement to spend funds within the financial year. 

We take the concerns that you raise in your email and in your blog post of 11 August 2018 extremely seriously. The high standard of the UK's research is a result of the work of our academic community, and we are committed to delivering a system that respects and responds to their needs. As part of this, we are actively looking into ways to build in longer call lead times and/or pre-announcements of funding opportunities for potential future managed calls in this and other areas.

I would also like to stress that applicants can still submit proposals on the topic of management practices and employee engagement through our standard research grant process, which is open all year round. The peer review system and the Grant Assessment Panel does not take into account the fact that a managed call is open on a topic when awarding funding: decisions are taken based on the excellence of the proposal.

Update: 23rd August 2018
A spokesperson for the World Bank has written to note that the grant scheme alluded to in my postscript did in fact have a 2 month period between the call and submission date. I have apologised to them for suggesting it was shorter than this, and also apologise to readers for providing misleading information. The duration still seems short to me for a call of this nature, but my case is clearly not helped by providing wrong information, and I should have taken greater care to check details. Text of the response from the World Bank is below:
 
We noticed with some concern that in your Aug. 11 blog post, you had singled out a World Bank call for proposals as a “beautiful illustration” of a type of funding call that appears designed to favor an inside candidate. This characterization is entirely inaccurate and appears based on a misperception of the time lag between the announcement of the proposal and the deadline.
Your reference to the 2018 Call for Proposals for Collaborative Data Innovations for Sustainable Development by the World Bank and the Global Partnership for Sustainable Development Data as undermining faith in the funding process seems based on the mistaken assumption that the call was issued on or about August 2. It was not.
The call was announced June 19 on the websites of the World Bank and the GPSDD. This was two months before the closing date, a period we have deemed fair to applicants but also appropriate given our own time constraints. An online seminar was offered to assist prospective applicants, as you note, on July 9.
The seminar drew 127 attendees for whom we provided answers to 147 questions. We are still reviewing submissions for the most recent call for proposals for this project, but our call for the 2017 version elicited 228 proposals, of which 195 met criteria for external review.
As the response to the seminar and the record of submissions indicate, this funding call has been widely seen and provided numerous applicants the opportunity to respond.  To suggest that this has not been an open and fair process does not do it justice.

Here are the links with the announcement dates of June 19th

Friday, 20 July 2018

Standing on the shoulders of giants, or slithering around on jellyfish: Why reviews need to be systematic

Yesterday I had the pleasure of hearing George Davey Smith (aka @mendel_random) talk. In the course of a wide-ranging lecture he recounted his experiences with conducting a systematic review. This caught my interest, as I’d recently considered the question of literature reviews when writing about fallibility in science. George’s talk confirmed my concerns that cherry-picking of evidence can be a massive problem for many fields of science.

Together with Mark Petticrew, George had reviewed the evidence on the impact of stress and social hierarchies on coronary artery disease in non-human primates. They found 14 studies on the topic, and revealed a striking mismatch between how the literature was cited and what it actually showed. Studies in this area are of interest to those attempting to explain the well-known socioeconomic gradient in health. It’s hard to unpack this in humans, because there are so many correlated characteristics that could potentially explain the association. The primate work has been cited to support psychosocial accounts of the link; i.e., the idea that socioeconomic influences on health operate primarily through psychological and social mechanisms. Demonstration of such an impact in primates is  particularly convincing, because stress and social status can be experimentally manipulated in a way that is not feasible in humans.

The conclusion from the review was stark: ‘Overall, non-human primate studies present only limited evidence for an association between social status and coronary artery disease. Despite this, there is selective citation of individual non-human primate studies in reviews and commentaries relating to human disease aetiology’(p. e27937).

The relatively bland account in the written paper belies the stress that George and his colleague went through in doing this work. Before I tried doing one myself, I thought that a systematic review was a fairly easy and humdrum exercise. It could be if the literature were not so unruly. In practice, however, you not only have to find and synthesise the relevant evidence, but also to read and re-read papers to work out what exactly was done. Often, it’s not just a case of computing an effect size: finding the numbers that match the reported result can be challenging. One paper in the review that was particularly highly cited in the epidemiology literature turned out to have data that were problematic: the raw data shown in scattergraphs are hard to reconcile with the adjusted means reported in a summary (see Figure below). Correspondence sent to the author apparently did not elicit a reply, let alone an explanation.

Figure 2 from Shively and Thompson (1994) Arteriosclerosis and Thrombosis Vol 14, No 5. Yellow bar added to show mean plaque areas as reported in Figure 3 (adjusted for preexperimental thigh circumference and TPC-HDL cholesterol ratio)
Even if there were no concerns about the discrepant means, the small sample size and influential outliers in this study should temper any conclusions. But those using this evidence to draw conclusions about human health focused on the ‘five-fold increase’ in coronary disease in dominant animals who became subordinate.

So what impact has the systematic review achieved? Well, the first point to note is that the authors had a great deal of difficulty getting it accepted for publication: it would be sent to reviewers who worked on stress in monkeys, and they would recommend rejection. This went on for some years: the abstract was first published in 2003, but the full paper did not appear until 2012.

The second, disappointing conclusion comes from looking at citations of the original studies reviewed by Petticrew and Davey Smith in the human health literature since their review appeared. The systematic review garnered 4 citations in the period 2013-2015 and just one during 2016-2018. The mean number of citations for the 14 articles covered in their meta-analysis was 2.36 for 2013-2015, and 3.00 for 2016-2018. The article that was the source of the Figure above had six citations in the human health literature in 2013-2015 and four in 2016-2018. These numbers aren’t sufficient for more than impressionistic interpretation, and I only did a superficial trawl through abstracts of citing papers, so I am not in a position to determine if all of these articles accepted the study authors’ conclusions. However, the pattern of citations fits with past experience in other fields showing that when cherry-picked facts fit a nice story, they will continue to be cited, without regard to subsequent corrections, criticism or even retraction.

The reason why this worries me is that the stark conclusion would appear to be that we can’t trust citations of the research literature unless they are based on well-conducted systematic reviews. Iain Chalmers has been saying this for years, and in his field of clinical trials these are more common than in other disciplines. But there are still many fields where it is seen as entirely appropriate to write an introduction to a paper that only cites supportive evidence and ignores a swathe of literature that shows null or opposite results. Most postgraduates have an initial thesis chapter that reviews the literature, but it's rare, at least in psychology, to see a systematic review - perhaps because this is so time-consuming and can be soul-destroying. But if we continue to cherry-pick evidence that suits us, then we are not so much standing on the shoulders of giants as slithering around on jellyfish, and science will not progress.

Thursday, 12 July 2018

One big study or two small studies? Insights from simulations

At a recent conference, someone posed a question that had been intriguing me for a while: suppose you have limited resources, with the potential to test N participants. Would it be better to do two studies, each with N/2 participants, or one big study with all N?

I've been on the periphery of conversations about this topic, but never really delved into it, so I gave a rather lame answer. I remembered hearing that statisticians would recommend the one big study option, but my intuition was that I'd trust a result that replicated more than one which was a one-off, even if the latter was from a bigger sample. Well, I've done the simulations and it's clear that my intuition is badly flawed.

Here's what I did. I adapted a script that is described in my recent slides, which give hands-on instructions for beginners on how to simulate data. The script, Simulation_2_vs_1_study_b.R, which can be found here, generates data for a simple two-group comparison using a t-test. In this version, on each run of the simulation, you get output for one study where all subjects are divided into two groups of size N, and for two smaller studies each with half the number of subjects. I ran it with various settings to vary both the sample size and the effect size (Cohen's d). I included the case where there is no real difference between groups (d = 0), so I could estimate the false positive rate as well as the power to detect a true effect.

I used a one-tailed t-test, as I had pre-specified that group B had the higher mean when d > 0. I used a traditional approach with p-value cutoffs for statistical significance (and yes, I can hear many readers tut-tutting, but this is useful for this demonstration….) to see how often I got a result that met each of three different criteria:
  • a) Single study, p < .05 
  • b) Split sample, p < .05 replicated in both studies 
  • c) Single study, p < .005
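For anyone who wants to try something similar, here is a stripped-down sketch of the logic in R. It is not the actual script linked above, and the sample size and effect size in the final lines are just illustrative:

# Sketch of the simulation logic (not the original Simulation_2_vs_1_study_b.R)
# 'Big' study: two groups of n each. Split replication: two studies, each with
# two groups of n/2. Criteria: (a) big study p < .05, (b) both small studies
# p < .05, (c) big study p < .005. One-tailed t-tests, group B higher when d > 0.
set.seed(1)

sim_power <- function(n, d, nsim = 10000) {
  hits <- matrix(FALSE, nrow = nsim, ncol = 3,
                 dimnames = list(NULL, c("big_p05", "split_replicated", "big_p005")))
  for (i in seq_len(nsim)) {
    A <- rnorm(n, mean = 0, sd = 1)      # group A of the big study
    B <- rnorm(n, mean = d, sd = 1)      # group B of the big study
    p_big <- t.test(B, A, alternative = "greater")$p.value
    # two half-sized studies, each with n/2 per group
    p1 <- t.test(rnorm(n/2, d), rnorm(n/2, 0), alternative = "greater")$p.value
    p2 <- t.test(rnorm(n/2, d), rnorm(n/2, 0), alternative = "greater")$p.value
    hits[i, ] <- c(p_big < .05, p1 < .05 && p2 < .05, p_big < .005)
  }
  colMeans(hits)   # proportion of runs meeting each criterion
}

sim_power(n = 48, d = 0.5)   # e.g. moderate effect, 48 per group in the big study
sim_power(n = 48, d = 0)     # null case: false positive rates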

Figure 1 summarises the results.
Figure 1


The figure is pretty busy but worth taking a while to unpack. Power is just the proportion of runs of the simulation where the significance criterion was met. It's conventional to adopt a power cutoff of .8 when deciding on how big a sample to use in a study. Sample size is colour coded, and refers to the number of subjects per group for the single study; for the split replication, each group has half this number of subjects. The continuous line shows the proportion of results where p < .05 for the single study, the dotted line shows results from the split replication, and the dashed line shows results from the single study with the more stringent significance criterion, p < .005.

It's clear that for all sample sizes and all effect sizes, the single large sample is much better powered than the split replication.

But I then realised what had been bugging me and why my intuition was different. Look at the bottom left of the figure, where the x-axis is zero: the continuous lines (i.e., big sample, p < .05) all cross the y-axis at .05. This is inevitable: by definition, if you set p < .05, there's a one in 20 chance that you'll get a significant result when there's really no group difference in the population, regardless of the sample size. In contrast, the dotted lines cross the y-axis close to zero, reflecting the fact that when the null hypothesis is true, the chance of two samples both giving p < .05 in a replication study is one in 400 (.05^2 = .0025). So I had been thinking more like a Bayesian: given a significant result, how likely was it to have come from a population with a true effect rather than a null effect? This is a very different thing from what a simple p-value tells you*.

Initially, I thought I was onto something. If we just stick with p < .05, then it could be argued that from a Bayesian perspective, the split replication approach is preferable. Although you are less likely to see a significant effect with this approach, when you do, you can be far more confident it is a real effect. In formal terms, the likelihood ratio for a true vs null hypothesis, given p < .05, will be much higher for the replication.
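To put rough numbers on that, here is a toy calculation; the power values are hypothetical placeholders rather than outputs of the simulation:

# Likelihood ratio = P(significant | true effect) / P(significant | null)
# Power values below are hypothetical placeholders, not simulation results
power_single <- 0.80    # single large study reaching p < .05 (assumed)
power_rep    <- 0.50    # both half-sized studies reaching p < .05 (assumed)

c(LR_single      = power_single / 0.05,    # false positive rate = .05
  LR_replication = power_rep    / 0.05^2)  # false positive rate = .05^2 = .0025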

My joy at having my insight confirmed was, however, short-lived. I realised that this benefit of the replication approach could be exceeded with the single big sample simply by reducing the p-value so that the odds of a false positive are minimal. That's why Figure 1 also shows the scenario for one big sample with p < .005: a threshold that has recently been proposed as a general recommendation for claims of new discoveries (Benjamin et al, 2018)**.

None of this will surprise expert statisticians: Figure 1 just reflects basic facts about statistical power that were popularised by Jacob Cohen in 1977. But I'm glad to have my intuitions now more aligned with reality, and I'd encourage others to try simulation as a great way to get more insights into statistical methods.

Here are the conclusions I've drawn from the simulation:
  • First, even when the two groups come from populations with different means, it's unlikely that you'll get a clear result from a single small study unless the effect size is at least moderate; and the odds of finding a replicated significant effect are substantially lower than this.  None of the dotted lines achieves 80% power for a replication if effect size is less than .3 - and many effects in psychology are no bigger than that. 
  • Second, from a statistical perspective, testing an a priori hypothesis in a larger sample with a lower p-value is more efficient than subdividing the sample and replicating the study using a less stringent p-value.
I'm not a stats expert, and I'm aware that there's been considerable debate out there about p-values - especially regarding the recommendations of Benjamin et al (2018). I have previously sat on the fence as I've not felt confident about the pros and cons. But on the basis of this simulation, I'm warming to the idea of p < .005. I'd welcome comments and corrections.

*In his paper 'The reproducibility of research and the misinterpretation of p-values' (Royal Society Open Science, 4: 171085, doi:10.1098/rsos.171085), David Colquhoun (2017) discusses these issues and notes that we also need to consider the prior likelihood of the null hypothesis being true: something that is unknowable and can only be estimated on the basis of past experience and intuition.
**The proposal for adopting p < .005 as a more stringent statistical threshold for new discoveries can be found here: Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., . . . Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6-10. doi:10.1038/s41562-017-0189-z


Postscript, 15th July 2018


This blogpost has generated a lot of discussion, mostly on Twitter. One point that particularly interested me was a comment that I hadn’t done a fair comparison between the one-study and two-study situation, because the plot showed a one-off two-group study with an alpha of .005, versus a replication study (half sample size in each group) with an alpha of .05. For a fair comparison, it was argued, I should equate the probabilities between the two situations, i.e. the alpha for the one-off study should be .05 squared = .0025.

So I took a look at the fair comparison: Figure 2 shows the situation when comparing one study with alpha set to .0025 vs a split replication with alpha of .05. The intuition of many people on Twitter was that these should be identical, but they aren’t. Why not? We have the same information in the two samples. (In fact, I modified the script so that this was literally true and the same sample was tested singly and again split into two – previously I’d just resampled to get the smaller samples. This makes no difference – the single sample with more extreme alpha still gives higher power).

Figure 2: Power for one-off study with alpha .0025 (dashed lines) vs. split replication with p < .05
To look at it another way, in one version of the simulation there were 1600 simulated experiments with a true effect (including all the simulated sample sizes and effect sizes). Of these, 581 were identified as ‘significant’ both by the one-off study with p < .0025 and by the split replication with p < .05 in both halves. Only 5 were identified by the split replication alone, but 134 were identified by the one-off study alone.
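Here is a stripped-down sketch of this ‘fair comparison’ (again, not the actual script; the sample size and effect size are just illustrative). It literally tests the same simulated sample once as a whole, at p < .0025, and again split into two half-sized studies, each at p < .05:

# Fair comparison: one-off study at p < .0025 versus the same sample split into
# two half-sized studies, each required to reach p < .05 (.05^2 = .0025, so the
# false positive rates match)
set.seed(2)
n <- 48; d <- 0.5; nsim <- 10000
big_hit <- split_hit <- logical(nsim)
for (i in seq_len(nsim)) {
  A <- rnorm(n, 0, 1)
  B <- rnorm(n, d, 1)
  big_hit[i] <- t.test(B, A, alternative = "greater")$p.value < .0025
  h <- 1:(n/2)                    # first half of each group forms study 1
  p1 <- t.test(B[h],  A[h],  alternative = "greater")$p.value
  p2 <- t.test(B[-h], A[-h], alternative = "greater")$p.value
  split_hit[i] <- p1 < .05 && p2 < .05
}
c(power_big_0025  = mean(big_hit),
  power_split_rep = mean(split_hit),
  big_only        = mean(big_hit & !split_hit),
  split_only      = mean(split_hit & !big_hit))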

I think I worked out why this is the case, though I’d appreciate having a proper statistical opinion. It seems to have to do with accuracy of estimating the standard deviation. If you have a split sample and you estimate the mean from each half (A and B), then the average of mean A and mean B will be the same as for the big sample of AB combined. But when it comes to estimating the standard deviation – which is a key statistic when computing group differences – the estimate is more accurate and precise with the large sample. This is because the standard deviation is computed by measuring the difference of each value from its own sample mean. Means for A and B will fluctuate due to sampling error, and this will make the estimated SDs less reliable. You can estimate the pooled standard deviation for two samples by taking the square root of the average of the variances. However, that value is less precise than the SD from the single large sample. I haven’t done a large number of runs, but a quick check suggests that whereas both the one-off study and the split replication give pooled estimates of the SD at around the true value of 1.0, the standard deviation of the standard deviation (we are getting very meta here!) is around .01 for the one-off study but .14 for the split replication. Again, I’m reporting results from across all the simulated trials, including the full range of sample sizes and effect sizes.

Figure 3: Distribution of estimates of pooled SD. The range is narrower for the one-off study (pink) than for the split replication studies (blue). Purple shows the area of overlap of the distributions

This has been an intriguing puzzle to investigate, but in the original post, I hadn’t really been intending to do this kind of comparison - my interest was rather in making the more elementary point which is that there's a very low probability of achieving a replication when sample size and effect size are both relatively small.

Returning to that issue, another commentator said that they’d have far more confidence in five small studies all showing the same effect than in one giant study. This is exactly the view I would have taken before I looked into this with simulations; but I now realise this idea has a serious flaw, which is that you’re very unlikely to get those five replications, even if you are reasonably well powered, because – the tl;dr message implicit in this post – when we’re talking about replications, we have to multiply the probabilities, and they rapidly get very low. So, if you look at the figure, suppose you have a moderate effect size, around .5; then you need a sample of 48 per group to get 80% power. But if you repeat the study five times, then the chance of getting a positive result in all five cases is .8^5, which is .33. So most of the time you’d get a mixture of null and positive results. Even if you doubled the sample size to increase power to around .95, the chance of all five studies coming out positive is still only .95^5, or about 77%.
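As a rough check on that arithmetic, R's power.t.test gives approximately these figures, assuming the one-tailed test used in the simulations:

# Per-study power for d = 0.5, 48 per group, one-tailed alpha = .05, and the
# chance that five independent studies of this size all reach significance
pwr <- power.t.test(n = 48, delta = 0.5, sd = 1, sig.level = 0.05,
                    type = "two.sample", alternative = "one.sided")$power
c(per_study_power = pwr, all_five_significant = pwr^5)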

Finally, another suggestion from Twitter is that a meta-analysis of several studies should give the same result as a single big sample. I’m afraid I have no expertise in meta-analysis, so I don’t know how well it handles the issue of more variable SD estimates in small samples, but I’d be interested to hear more from any readers who are up to speed with this.

Tuesday, 26 June 2018

Preprint publication as karaoke


 Doing research, analysing the results, and writing it up is a prolonged and difficult process. Submitting the paper to a journal is an anxious moment. Of course, you hope the editor and reviewers will love it and thank you for giving them the opportunity to read your compelling research. And of course, that never happens. More often you get comments from reviewers pointing out the various inadequacies of your grasp of the literature, your experimental design and your reasoning, leading to further angst as you consider how to reply. But worse than this is silence. You hear nothing. You enquire. You are told that the journal is still seeking reviewers. If you go through that loop a few times, you start to feel like the Jane Austen heroine who, having dressed up in her finery for the ball, spends the evening being ignored by all the men, while other, superficial and gaudy women are snapped up as dance partners.

There have been some downcast tweets in my timeline about papers getting stuck in this kind of journal limbo. When I suggested that it might help to post papers as preprints, several people asked how this worked, so I thought a short account might be useful.

To continue the analogy, a preprint server offers you a more modern world where you can try karaoke. You don't wait to be asked: you grab the microphone and do your thing. I now routinely post all my papers as preprints before submitting them to a journal. It gets the work out there, so even if journals are unduly slow, it can be read and you can get feedback on it.

So how does it work? Pre-prints are electronic articles that are not peer-reviewed. I hope those who know more about the history will be able to comment on this, as I'm hazy on the details, but the idea started with physicists, to whom the thought of waiting around for an editorial process to complete seemed ridiculous. Physicists have been routinely posting their work on arXiv (pronounced 'archive') for years to ensure rapid evaluation and exchange of ideas. They do still publish in journals, which creates a formal version of record, but the arXiv is what most of them read. The success of arXiv led to the development of BioRxiv, and then more recently PsyArXiv and SocArXiv. Some journals also host preprints - I have had good experiences with PeerJ, where you can deposit an article as a preprint, with the option of then updating it to a full submission to the journal if you wish*.

All of these platforms operate some basic quality control. For instance, the BioRxiv website states: 'all articles undergo a basic screening process for offensive and/or non-scientific content and for material that might pose a health or biosecurity risk and are checked for plagiarism'. However, once they have passed screening, articles are deposited immediately without further review.

Contrary to popular opinion, publishing a preprint does not usually conflict with journal policies. You can check the policy of the journal on the Sherpa/ROMEO database: most allow preprints prior to submission.

Sometimes concerns are expressed that if you post a preprint your work might be stolen by someone who'll then publish a journal article before you. In fact, it's quite the opposite. A preprint has a digital object identifier (DOI) and establishes your precedence, so guards against scooping. If you are in a fast-moving field where an evil reviewer will deliberately hold up your paper so they can get in ahead, pre-printing is the answer.

So when should you submit a preprint? I would normally recommend doing this a week or two before submitting to a journal, to allow for the possibility of incorporating feedback into the submitted manuscript, but, given that you will inevitably be asked for revisions by journal reviewers, if you post a preprint immediately before submission you will still have an opportunity to take on board other comments.

So what are the advantages of posting preprints?

1. The most obvious one is that people can access your work in a timely fashion. Preprints are freely available to all: a particularly welcome feature if you work in an area that has implications for clinical practice or policy, where practitioners may not have access to academic journals.

2. There have been cases where authors of a preprint have been invited to submit the work to a journal by an editor. This has never happened to me, but it's nice to know it's a possibility!

3. You can cite a preprint on a job application: it won't count as much as a peer-reviewed publication, but it does make it clear that the work is completed, and your evaluators can read it. This is preferable to just citing work as 'submitted'. Some funders are now also allowing preprints to be cited. https://wellcome.ac.uk/news/we-now-accept-preprints-grant-applications

4. Psychologically, for the author, it can be good to have a sense that the work is 'out there'. You have at least some control over the dissemination of your research, whereas waiting for editors and reviewers is depressing because you just feel powerless.

5. You can draw attention to a preprint on social media and explicitly request feedback. This is particularly helpful if you don't have colleagues to hand who are willing to read your paper. If you put out a request on Twitter, it doesn't mean people will necessarily reply, but you could get useful suggestions for improvement and/or make contact with others interested in your field.

On this final point, it is worth noting that there are several reasons why papers linger in journal limbo: it does not necessarily mean that the journal administration or editor is incompetent (though that can happen!). The best of editors can have a hard job finding reviewers: it's not uncommon to have to invite ten reviewers to find two who agree to review. If your paper is in a niche area, then it gets even harder. For these reasons it is crucial to make your title and abstract as clear and interesting as possible: these are the only parts of the paper that potential reviewers will see, and if you are getting a lot of refusals to review, it could be that your abstract is a turn-off. So asking for feedback on a preprint may help you rewrite it in a way that encourages more interest from reviewers.

*Readers: please feel free to add other suggestions while comments are open. (I close comments once the invasion of spammers starts - typically 3-4 weeks after posting).

Saturday, 23 June 2018

Bishopblog catalogue (updated 23 June 2018)

Source: http://www.weblogcartoons.com/2008/11/23/ideas/

Those of you who follow this blog may have noticed a lack of thematic coherence. I write about whatever is exercising my mind at the time, which can range from technical aspects of statistics to the design of bathroom taps. I decided it might be helpful to introduce a bit of order into this chaotic melange, so here is a catalogue of posts by topic.

Language impairment, dyslexia and related disorders
The common childhood disorders that have been left out in the cold (1 Dec 2010) What's in a name? (18 Dec 2010) Neuroprognosis in dyslexia (22 Dec 2010) Where commercial and clinical interests collide: Auditory processing disorder (6 Mar 2011) Auditory processing disorder (30 Mar 2011) Special educational needs: will they be met by the Green paper proposals? (9 Apr 2011) Is poor parenting really to blame for children's school problems? (3 Jun 2011) Early intervention: what's not to like? (1 Sep 2011) Lies, damned lies and spin (15 Oct 2011) A message to the world (31 Oct 2011) Vitamins, genes and language (13 Nov 2011) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Phonics screening: sense and sensibility (3 Apr 2012) What Chomsky doesn't get about child language (3 Sept 2012) Data from the phonics screen (1 Oct 2012) Auditory processing disorder: schisms and skirmishes (27 Oct 2012) High-impact journals (Action video games and dyslexia: critique) (10 Mar 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) Raising awareness of language learning impairments (26 Sep 2013) Good and bad news on the phonics screen (5 Oct 2013) What is educational neuroscience? (25 Jan 2014) Parent talk and child language (17 Feb 2014) My thoughts on the dyslexia debate (20 Mar 2014) Labels for unexplained language difficulties in children (23 Aug 2014) International reading comparisons: Is England really doing so poorly? (14 Sep 2014) Our early assessments of schoolchildren are misleading and damaging (4 May 2015) Opportunity cost: a new red flag for evaluating interventions (30 Aug 2015) The STEP Physical Literacy programme: have we been here before? (2 Jul 2017) Prisons, developmental language disorder, and base rates (3 Nov 2017) Reproducibility and phonics: necessary but not sufficient (27 Nov 2017) Developmental language disorder: the need for a clinically relevant definition (9 Jun 2018)

Autism
Autism diagnosis in cultural context (16 May 2011) Are our ‘gold standard’ autism diagnostic instruments fit for purpose? (30 May 2011) How common is autism? (7 Jun 2011) Autism and hypersystematising parents (21 Jun 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) The ‘autism epidemic’ and diagnostic substitution (4 Jun 2012) How wishful thinking is damaging Peta's cause (9 June 2014)

Developmental disorders/paediatrics
The hidden cost of neglected tropical diseases (25 Nov 2010) The National Children's Study: a view from across the pond (25 Jun 2011) The kids are all right in daycare (14 Sep 2011) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Changing the landscape of psychiatric research (11 May 2014)

Genetics
Where does the myth of a gene for things like intelligence come from? (9 Sep 2010) Genes for optimism, dyslexia and obesity and other mythical beasts (10 Sep 2010) The X and Y of sex differences (11 May 2011) Review of How Genes Influence Behaviour (5 Jun 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Genes, brains and lateralisation (22 Dec 2012) Genetic variation and neuroimaging (11 Jan 2013) Have we become slower and dumber? (15 May 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) Incomprehensibility of much neurogenetics research ( 1 Oct 2016) A common misunderstanding of natural selection (8 Jan 2017) Sample selection in genetic studies: impact of restricted range (23 Apr 2017) Pre-registration or replication: the need for new standards in neurogenetic studies (1 Oct 2017)

Neuroscience
Neuroprognosis in dyslexia (22 Dec 2010) Brain scans show that… (11 Jun 2011)  Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Neuronal migration in language learning impairments (2 May 2012) Sharing of MRI datasets (6 May 2012) Genetic variation and neuroimaging (1 Jan 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) What is educational neuroscience? ( 25 Jan 2014) Changing the landscape of psychiatric research (11 May 2014) Incomprehensibility of much neurogenetics research ( 1 Oct 2016)

Reproducibility
Accentuate the negative (26 Oct 2011) Novelty, interest and replicability (19 Jan 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013) Who's afraid of open data? (15 Nov 2015) Blogging as post-publication peer review (21 Mar 2013) Research fraud: More scrutiny by administrators is not the answer (17 Jun 2013) Pressures against cumulative research (9 Jan 2014) Why does so much research go unpublished? (12 Jan 2014) Replication and reputation: Whose career matters? (29 Aug 2014) Open code: not just data and publications (6 Dec 2015) Why researchers need to understand poker (26 Jan 2016) Reproducibility crisis in psychology (5 Mar 2016) Further benefit of registered reports (22 Mar 2016) Would paying by results improve reproducibility? (7 May 2016) Serendipitous findings in psychology (29 May 2016) Thoughts on the Statcheck project (3 Sep 2016) When is a replication not a replication? (16 Dec 2016) Reproducible practices are the future for early career researchers (1 May 2017) Which neuroimaging measures are useful for individual differences research? (28 May 2017) Prospecting for kryptonite: the value of null results (17 Jun 2017) Pre-registration or replication: the need for new standards in neurogenetic studies (1 Oct 2017) Citing the research literature: the distorting lens of memory (17 Oct 2017) Reproducibility and phonics: necessary but not sufficient (27 Nov 2017) Improving reproducibility: the future is with the young (9 Feb 2018) Sowing seeds of doubt: how Gilbert et al's critique of the reproducibility project has played out (27 May 2018)

Statistics
Book review: biography of Richard Doll (5 Jun 2010) Book review: the Invisible Gorilla (30 Jun 2010) The difference between p < .05 and a screening test (23 Jul 2010) Three ways to improve cognitive test scores without intervention (14 Aug 2010) A short nerdy post about the use of percentiles (13 Apr 2011) The joys of inventing data (5 Oct 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Causal models of developmental disorders: the perils of correlational data (24 Jun 2012) Data from the phonics screen (1 Oct 2012) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Flaky chocolate and the New England Journal of Medicine (13 Nov 2012) Interpreting unexpected significant results (7 June 2013) Data analysis: Ten tips I wish I'd known earlier (18 Apr 2014) Data sharing: exciting but scary (26 May 2014) Percentages, quasi-statistics and bad arguments (21 July 2014) Why I still use Excel (1 Sep 2016) Sample selection in genetic studies: impact of restricted range (23 Apr 2017) Prospecting for kryptonite: the value of null results (17 Jun 2017) Prisons, developmental language disorder, and base rates (3 Nov 2017) How Analysis of Variance Works (20 Nov 2017) ANOVA, t-tests and regression: different ways of showing the same thing (24 Nov 2017) Using simulations to understand the importance of sample size (21 Dec 2017) Using simulations to understand p-values (26 Dec 2017)

Journalism/science communication
Orwellian prize for scientific misrepresentation (1 Jun 2010) Journalists and the 'scientific breakthrough' (13 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Orwellian prize for journalistic misrepresentation: an update (29 Jan 2011) Academic publishing: why isn't psychology like physics? (26 Feb 2011) Scientific communication: the Comment option (25 May 2011)  Publishers, psychological tests and greed (30 Dec 2011) Time for academics to withdraw free labour (7 Jan 2012) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Communicating science in the age of the internet (13 Jul 2012) How to bury your academic writing (26 Aug 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013)  A short rant about numbered journal references (5 Apr 2013) Schizophrenia and child abuse in the media (26 May 2013) Why we need pre-registration (6 Jul 2013) On the need for responsible reporting of research (10 Oct 2013) A New Year's letter to academic publishers (4 Jan 2014) Journals without editors: What is going on? (1 Feb 2015) Editors behaving badly? (24 Feb 2015) Will Elsevier say sorry? (21 Mar 2015) How long does a scientific paper need to be? (20 Apr 2015) Will traditional science journals disappear? (17 May 2015) My collapse of confidence in Frontiers journals (7 Jun 2015) Publishing replication failures (11 Jul 2015) Psychology research: hopeless case or pioneering field? (28 Aug 2015) Desperate marketing from J. Neuroscience ( 18 Feb 2016) Editorial integrity: publishers on the front line ( 11 Jun 2016) When scientific communication is a one-way street (13 Dec 2016) Breaking the ice with buxom grapefruits: Pratiques de publication and predatory publishing (25 Jul 2017)

Social Media
A gentle introduction to Twitter for the apprehensive academic (14 Jun 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) Will I still be tweeting in 2013? (2 Jan 2012) Blogging in the service of science (10 Mar 2012) Blogging as post-publication peer review (21 Mar 2013) The impact of blogging on reputation ( 27 Dec 2013) WeSpeechies: A meeting point on Twitter (12 Apr 2014) Email overload ( 12 Apr 2016) How to survive on Twitter - a simple rule to reduce stress (13 May 2018)

Academic life
An exciting day in the life of a scientist (24 Jun 2010) How our current reward structures have distorted and damaged science (6 Aug 2010) The challenge for science: speech by Colin Blakemore (14 Oct 2010) When ethics regulations have unethical consequences (14 Dec 2010) A day working from home (23 Dec 2010) Should we ration research grant applications? (8 Jan 2011) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Should we ever fight lies with lies? (19 Jun 2011) How to survive in psychological research (13 Jul 2011) So you want to be a research assistant? (25 Aug 2011) NHS research ethics procedures: a modern-day Circumlocution Office (18 Dec 2011) The REF: a monster that sucks time and money from academic institutions (20 Mar 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) Journal impact factors and REF2014 (19 Jan 2013) An alternative to REF2014 (26 Jan 2013) Postgraduate education: time for a rethink (9 Feb 2013) Ten things that can sink a grant proposal (19 Mar 2013) Blogging as post-publication peer review (21 Mar 2013) The academic backlog (9 May 2013) Discussion meeting vs conference: in praise of slower science (21 Jun 2013) Why we need pre-registration (6 Jul 2013) Evaluate, evaluate, evaluate (12 Sep 2013) High time to revise the PhD thesis format (9 Oct 2013) The Matthew effect and REF2014 (15 Oct 2013) The University as big business: the case of King's College London (18 June 2014) Should vice-chancellors earn more than the prime minister? (12 July 2014) Some thoughts on use of metrics in university research assessment (12 Oct 2014) Tuition fees must be high on the agenda before the next election (22 Oct 2014) Blaming universities for our nation's woes (24 Oct 2014) Staff satisfaction is as important as student satisfaction (13 Nov 2014) Metricophobia among academics (28 Nov 2014) Why evaluating scientists by grant income is stupid (8 Dec 2014) Dividing up the pie in relation to REF2014 (18 Dec 2014) Shaky foundations of the TEF (7 Dec 2015) A lamentable performance by Jo Johnson (12 Dec 2015) More misrepresentation in the Green Paper (17 Dec 2015) The Green Paper’s level playing field risks becoming a morass (24 Dec 2015) NSS and teaching excellence: wrong measure, wrongly analysed (4 Jan 2016) Lack of clarity of purpose in REF and TEF (2 Mar 2016) Who wants the TEF? (24 May 2016) Cost benefit analysis of the TEF (17 Jul 2016) Alternative providers and alternative medicine (6 Aug 2016) We know what's best for you: politicians vs. experts (17 Feb 2017) Advice for early career researchers re job applications: Work 'in preparation' (5 Mar 2017) Should research funding be allocated at random? (7 Apr 2018) Power, responsibility and role models in academia (3 May 2018) My response to the EPA's 'Strengthening Transparency in Regulatory Science' (9 May 2018)

Celebrity scientists/quackery
Three ways to improve cognitive test scores without intervention (14 Aug 2010) What does it take to become a Fellow of the RSM? (24 Jul 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) How to become a celebrity scientific expert (12 Sep 2011) The kids are all right in daycare (14 Sep 2011)  The weird world of US ethics regulation (25 Nov 2011) Pioneering treatment or quackery? How to decide (4 Dec 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Why most scientists don't take Susan Greenfield seriously (26 Sept 2014)

Women
Academic mobbing in cyberspace (30 May 2010) What works for women: some useful links (12 Jan 2011) The burqua ban: what's a liberal response (21 Apr 2011) C'mon sisters! Speak out! (28 Mar 2012) Psychology: where are all the men? (5 Nov 2012) Should Rennard be reinstated? (1 June 2014) How the media spun the Tim Hunt story (24 Jun 2015)

Politics and Religion
Lies, damned lies and spin (15 Oct 2011) A letter to Nick Clegg from an ex liberal democrat (11 Mar 2012) BBC's 'extensive coverage' of the NHS bill (9 Apr 2012) Schoolgirls' health put at risk by Catholic view on vaccination (30 Jun 2012) A letter to Boris Johnson (30 Nov 2013) How the government spins a crisis (floods) (1 Jan 2014) The alt-right guide to fielding conference questions (18 Feb 2017) We know what's best for you: politicians vs. experts (17 Feb 2017) Barely a good word for Donald Trump in Houses of Parliament (23 Feb 2017) Do you really want another referendum? Be careful what you wish for (12 Jan 2018) My response to the EPA's 'Strengthening Transparency in Regulatory Science' (9 May 2018)

Humour and miscellaneous
Orwellian prize for scientific misrepresentation (1 Jun 2010) An exciting day in the life of a scientist (24 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Parasites, pangolins and peer review (26 Nov 2010) A day working from home (23 Dec 2010) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Scientific communication: the Comment option (25 May 2011) How to survive in psychological research (13 Jul 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) The bewildering bathroom challenge (19 Jul 2012) Are Starbucks hiding their profits on the planet Vulcan? (15 Nov 2012) Forget the Tower of Hanoi (11 Apr 2013) How do you communicate with a communications company? (30 Mar 2014) Noah: A film review from 32,000 ft (28 July 2014) The rationalist spa (11 Sep 2015) Talking about tax: weasel words (19 Apr 2016) Controversial statues: remove or revise? (22 Dec 2016) The alt-right guide to fielding conference questions (18 Feb 2017) My most popular posts of 2016 (2 Jan 2017)