Sunday, 27 May 2018

Sowing seeds of doubt: how Gilbert et al’s critique of the reproducibility project has played out



In Merchants of Doubt, Naomi Oreskes and Erik Conway describe how raising doubt can be used as an effective weapon against inconvenient science. On topics such as the effects of tobacco on health, climate change and the causes of acid rain, it has been possible to delay or curb action simply by emphasising the lack of scientific consensus. This is always an option, because science is characterised by uncertainty; indeed, we move forward by challenging one another’s findings: only a dead science would have no disagreements. But those raising concerns wield a two-edged sword: spurious and discredited criticisms can disrupt scientific progress, especially if the arguments are complex and technical, because people will be left with a sense that they cannot trust the findings, even if they don’t fully understand the matters under debate.

The parallels with Merchants of Doubt occurred to me as I re-read the critique by Gilbert et al of the classic paper by the Open Science Collaboration (OSC) on ‘Estimating the reproducibility of psychological science’. I was prompted to do so because we were discussing the OSC paper in a journal club*, and inevitably the question arose as to whether we needed to worry about reproducibility, in the light of the remarkable claim by Gilbert et al: ‘We show that OSC's article contains three major statistical errors and, when corrected, provides no evidence of a replication crisis. Indeed, the evidence is also consistent with the opposite conclusion – that the reproducibility of psychological science is quite high and, in fact, statistically indistinguishable from 100%.’

The Gilbert et al critique has, in turn, been the subject of considerable criticism, as well as a response by a subset of the OSC group. I summarise the main points of contention in Table 1. At times Gilbert et al seem to be making a defeatist argument that we don’t need to worry because replication in psychology is bound to be poor: something I have disputed.

But my main focus in this post is simply to consider the impact of the critique on the reproducibility debate by looking at citations of the original article and the critique. A quick check on Web of Science found 797 citations of the OSC paper, 67 citations of Gilbert et al, and 33 citations of the response by Anderson et al.

The next thing I did, admittedly in a very informal fashion, was to download the details of the articles citing Gilbert et al and code them according to their content: supporting Gilbert et al’s view, rejecting the criticism, or neutral. I found I needed a fourth category for papers where the citation seemed wrong, or so vague as to be unclassifiable. I discarded any papers where the relevant information could not be readily accessed – I can access most journals via Oxford University, but a few were behind paywalls, some were not in English, and some did not appear to cite Gilbert et al at all. This left 44 citing papers that engaged with the commentary on the OSC study. Nine of these were supportive of Gilbert et al and two noted problems with their analysis, but 33 were categorised as ‘neutral’, because the citation read something like this:

“Because of the current replicability crisis in psychological science (e.g., Open Science Collaboration, 2015; but see Gilbert, King, Pettigrew, & Wilson, 2016)….”
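
For concreteness, here is a minimal sketch of how the final tally could be computed, assuming the hand-assigned codings were recorded in a small CSV file. The file name, column name and category labels are illustrative, not those actually used – the real coding was done by hand:

```python
import csv
from collections import Counter

# Hypothetical input: one row per citing paper, with a hand-assigned
# 'stance' column coded supportive / critical / neutral / unclassifiable.
with open("gilbert_citations.csv", newline="") as f:
    stances = Counter(row["stance"] for row in csv.DictReader(f))

# Print counts and percentages for each coding category.
total = sum(stances.values())
for stance, n in stances.most_common():
    print(f"{stance:>15}: {n:3d} ({100 * n / total:.0f}%)")
```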

The strong impression was that the authors of these papers lacked either the appetite or the ability to engage with the detailed arguments of the critique, but had a sense that there was a debate and felt that they should flag it. That’s when I started to think about Merchants of Doubt: whether intentionally or not, Gilbert et al had created an atmosphere of uncertainty, suggesting there is no consensus on whether psychology has a reproducibility problem. People are left thinking that it’s all very complicated and depends on arguments that are only of interest to statisticians, and this makes it easier for those who are reluctant to take action to deal with the issue.

Fortunately, it looks as if Gilbert et al’s critique has been less successful than might have been expected, given the eminence of the authors. This may in part be because the arguments in favour of change are founded not just on demonstrations such as the OSC project, but also on logical analyses of statistical practices and publication biases that have been known about for years (see slides 15-20 of my presentation here). Furthermore, as evidenced in the footnotes to Table 1, social media allows a rapid evaluation of claims and counter-claims that was hitherto not possible when debate was restricted to, and controlled by, journals. The publication this week of three more big replication studies just adds further empirical evidence that we have a problem that needs addressing. Those who say ‘nothing to see here, move along’ cannot retain any credibility.

Table 1: Gilbert et al’s main criticisms of the OSC study, with rejoinders

Criticism 1: ‘many of OSC’s replication studies drew their samples from different populations than the original studies did’
Rejoinder:
·     ‘Many’ implies the majority; no attempt to quantify – just gives examples
·     Did not show that this feature affected the replication rate

Criticism 2: ‘many of OSC’s replication studies used procedures that differed from the original study’s procedures in substantial ways.’
Rejoinder:
·     ‘Many’ implies the majority; no attempt to quantify – just gives examples
·     OSC showed that this did not affect the replication rate
·     The most striking example used by Gilbert et al is given a detailed explanation by Nosek (1)

Criticism 3: ‘How many of their replication studies should we expect to have failed by chance alone? Making this estimate requires having data from multiple replications of the same original study.’ Gilbert et al used data from pairwise comparisons of studies in the Many Labs project to argue that a low rate of agreement is to be expected.
Rejoinder:
·     Ignores the impact of publication bias on the original studies (2, 3)
·     Gilbert et al misinterpret confidence intervals (3, 4)
·     Gilbert et al fail to take sample size/power into account, though this is a crucial determinant of confidence interval width (3, 4) – see the sketch after this table
·     ‘Gilbert et al.’s focus on the CI measure of reproducibility neither addresses nor can account for the facts that the OSC2015 replication effect sizes were about half the size of the original studies on average, and 83% of replications elicited smaller effect sizes than the original studies.’ (2)

Criticism 4: Results depended on whether the original authors endorsed the protocol for the replication: ‘This strongly suggests that the infidelities did not just introduce random error but instead biased the replication studies toward failure.’
Rejoinder:
·     Use of the term ‘the infidelities’ assumes that the only reason for lack of endorsement was departure from the original protocol (2)
·     Lack of endorsement also included non-response from the original authors (3)
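
To see why sample size is crucial to the confidence-interval criterion (the power point in the rejoinder to Criticism 3), here is a minimal sketch using the standard Fisher z transformation to compute a 95% CI for a correlation. The particular values (r = .30, n = 30 versus n = 300) are illustrative and are not taken from either paper:

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """95% CI for a Pearson correlation via the Fisher z transformation."""
    z = math.atanh(r)          # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)  # standard error of z depends only on n
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# The same observed correlation yields very different CIs at different n:
for n in (30, 300):
    lo, hi = correlation_ci(0.3, n)
    print(f"r = .30, n = {n:3d}: 95% CI [{lo:+.2f}, {hi:+.2f}]")
# n = 30 gives roughly [-0.07, +0.60]; n = 300 gives roughly [+0.19, +0.40].
```

With the small samples typical of the original studies, almost any replication result ‘falls within the original CI’, so overlap with such a wide interval carries little evidential weight.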


References
Anderson, C. J., Bahnik, S., Barnett-Cowan, M., et al. (2016). Response to Comment on "Estimating the reproducibility of psychological science". Science, 351(6277).
Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on "Estimating the reproducibility of psychological science". Science, 351(6277).
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. doi:10.1126/science.aac4716


*Thanks to the enthusiastic efforts of some of our grad students, and the support of Reproducible Research Oxford, we’ve had a series of ReproducibiliTea journal clubs in our department this term. I can recommend this as a great – and relatively cheap and easy – way of raising awareness of issues around reproducibility in a department: something that is sorely needed, if a recent Twitter survey by Dan Lakens is anything to go by.

Sunday, 13 May 2018

How to survive on Twitter – a simple rule to reduce stress


In recent weeks, I’ve seen tweets from a handful of people I follow saying they are thinking of giving up Twitter because it has become so negative. Of course they are entitled to do so, and they may find that it frees up time and mental space that could be better used for other things. The problem, though, is that I detect a sense of regret. And this is appropriate because Twitter, used judiciously, has great potential for good.

For me as an academic, the benefits include:
·      Finding out about latest papers and other developments relevant to my work
·      Discovering new people with interesting points of view – often these aren’t eminent or well-known and I’d never have come across them if I hadn’t been on social media
·      Being able to ask for advice from experts – sometimes getting a remarkably quick and relevant response
·      Being able to interact with non-academics who are interested in the same stuff as me
·      Getting a much better sense of the diversity of views in the broader community about topics I take for granted – this often influences how I go about public engagement
·      Having fun – there are lots of witty people who brighten my day with their tweets

The bad side, of course, is that some people say things on Twitter that they would not dream of saying to your face. They can be rude, abusive, and cruel, and sometimes mind-bogglingly impervious to reason. We now know that some of them are not even real people – they are just bots set up by those who want to sow discord among those with different political views. So how do we deal with that?

Well, I have a pretty simple rule that works for me, which is that if I find someone rude, obnoxious, irritating or tedious, I mute them. Muting differs from blocking in that the person doesn’t know they are muted. So they may continue hurling abuse or provocations at you, unaware that they are now screaming into the void.

A few years ago, when I first got into a situation where I was attacked by a group of unpleasant alt-right people (who I now realise were probably mostly bots), it didn’t feel right to ignore them, for three reasons:
·      First, they were publicly maligning me, and I felt I should defend myself.
·      Second, we’ve been told to beware the Twitter bubble: if we only interact on social media with those who are like-minded, we can get a totally false impression of what the world is like.
·      Third, walking away from an argument is not a thing a good academic does: we are trained experts in reasoned debate, and our whole instinct is to engage with those who disagree with us, examine what they say and make a counterargument.

But I soon learned that some people on social media don’t play by the rules of academic engagement. They are not sincere in their desire to discuss topics: they have a viewpoint that nothing will change, and they will use any method they can find to discredit an opponent, including ad hominem attacks, lying and wilful misrepresentation of what you say. It’s not cowardly to avoid these people: it’s just a sensible reaction. So I now mute anyone who gives off a whiff of such behaviour, whether directed at me or at anyone else.

The thing is, social media is so different from normal face-to-face interaction that it needs different rules. Just imagine if you were sitting with friends at the pub, having a chat, and someone barged in and started shouting at you aggressively. Or someone sat down next to you, uninvited, and proceeded to drone on about a very boring topic, impervious to the impact they were having. People may have different ways of extricating themselves from these situations, but of one thing you can be sure: the next time you went to the pub, you would not seek these individuals out and try to engage them in discussion.

So my rule boils down to this: ask yourself, if I were talking to this person in the pub, would I want to prolong the interaction? Or, if there were a button I could press to make them disappear, would I use it? Well, on social media there is such a button, and I recommend taking advantage of it.*


*I should make it clear that there are situations in which a person is subject to such a volume of abuse that this approach isn’t going to be effective. Avoiding Twitter for a while may be the only sensible option in such cases. My advice is intended for those who aren’t at the centre of a vitriolic campaign, but who are turned off Twitter by the stress of observing or participating in hostile exchanges.




Wednesday, 9 May 2018

My response to the EPA's 'Strengthening Transparency in Regulatory Science'

Incredible things have happened at the US Environmental Protection Agency since Donald Trump was elected. The agency is responsible for creating standards and laws that promote the health of individuals and the environment. During previous administrations it has overseen laws concerned with controlling pollution and regulating carbon emissions. Now, under Administrator Scott Pruitt, the voice of industry and climate scepticism is in the ascendant. 

A new rule that purports to 'Strengthen Transparency in Regulatory Science' has now been proposed – ironically, at a time when the EPA is being accused of a culture of secrecy regarding its own inner workings. Anyone can comment on the rule here: I have done so, but my comment appears to be in moderation, so I am posting it here.


Dear Mr Pruitt

re: Regulatory Science- Docket ID No. EPA-HQ-OA-2018-0259

The proposed rule, ‘Strengthening transparency in regulatory science’ brings together two strands of contemporary scientific activity. On the one hand, there is a trend to make policy more evidence-based and transparent. On the other hand, there has, over the past decade, been growing awareness of problems with how science is being done, leading to research that is not always reproducible (the same results achieved by re-analysis of the data) or replicable (similar results when an experiment is repeated). The proposed rule by the Environmental Protection Agency (EPA) brings these two strands together by proposing that policy should only be based on research that has openly available public data. While this may on the surface sound like a progressive way of integrating these two strands, it rests on an over-simplified view of how science works and has considerable potential for doing harm.

I am writing in a personal capacity, as someone at the forefront of moves to improve reproducibility and replication of science in the UK. I chaired a symposium at the Academy of Medical Sciences on this topic in 2015; this was jointly organised with major UK funders: the Wellcome Trust, the Medical Research Council and the Biotechnology and Biological Sciences Research Council (https://acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research). I am involved in training early-career researchers in methods to improve reproducibility, and I am a co-author of Munafò, M. R., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 0021. doi:10.1038/s41562-016-0021. I would welcome any move by the US Government that would strengthen research by encouraging the adoption of methods to improve science, including making analysis scripts and data open where this does not conflict with legal or ethical constraints. Unfortunately, this proposal will not do that. Instead, it will weaken science by drawing a line in the sand that effectively disregards scientific discoveries made before the first part of the 21st century, when the importance of open data started to be widely recognised.

The proposal ignores a key point about how scientific research works: the time scale. Most studies that would be relevant to the EPA take years to do, and even longer to filter through to affect policy. Consequences for people and the planet that are relevant to environmental protection are often not immediately obvious: if they were, we would not need research. Recognition of the dangers of asbestos, for instance, took years because the impacts on health were not immediate. The work demonstrating the connection was done many years ago, and I doubt that the data are openly available anywhere, yet the EPA’s proposed rule would imply that it could be disregarded. Similarly, I doubt there are open data demonstrating the impact of lead in paint or exhaust fumes, or of pesticides such as DDT: does this mean that manufacturers would be free to reintroduce these?

A second point is that scientific advances never depend on a single study: having open scripts and data is one way of improving our ability to check findings, but it is a relatively recent development, and it is certainly not the only way to validate science. The growth of knowledge has always depended on converging evidence from different sources, replications by different scientists and theoretical understanding of mechanisms. Scientific facts become established when the evidence is overwhelming. The EPA proposal would throw out the mass of accumulated scientific evidence from the past, when open practices were not customary – and indeed were often not practical before the computing power needed to share large datasets was available.

Contemporary scientific research is far from perfect, but the solution is not to ignore it; it is to take steps to improve it and to educate policy-makers in how to identify strong science. Government needs advisors who have scientific expertise and no conflicts of interest, and who can integrate existing evidence with policy implications. The ‘Strengthening Transparency’ proposal is short-sighted and dangerous, and appears to have been developed by people with little understanding of science. It puts citizens at risk of significant damage – both to health and to prosperity – and it will make the US look scientifically illiterate to the rest of the world.


Yours sincerely


D. V. M. Bishop FMedSci, FBA, FRS


Thursday, 3 May 2018

Power, responsibility and role models in academia


Last week, Robert J. Sternberg resigned as Editor of Perspectives on Psychological Science after a series of criticisms of his behaviour on social media. I first became aware of this issue when Bobbie Spellman wrote a blogpost explaining why she was not renewing her membership of the Association for Psychological Science, noting concerns about Sternberg’s editorial bias and high rate of self-citation, among other issues.

Then a grad student at the University of Leicester, Brendan O’Connor, noted that Sternberg not only had a tendency to cite his own work; he also recycled large portions of written text in his publications. Nick Brown publicised some striking examples on his blog, and Retraction Watch subsequently published an interview with O’Connor explaining the origins of the story.

In discussing his resignation, Sternberg admitted to ‘lapses in judgement and mistakes’ but also reprimanded those who had outed him for putting their concerns online rather than contacting him directly. A loyal colleague, James C. Kaufman, then came to his defence, tweeting:

[embedded tweet, in which Kaufman characterised the criticism of Sternberg as a ‘witch-hunt’]
The term ‘witch-hunt’ is routinely trotted out whenever a senior person is criticised. (Indeed, it has become one of Donald Trump’s favourite terms to describe attempts to call him out for various misbehaviours). It implies that those who are protesting at wrongdoing are self-important people who are trying to gain attention by whipping up a sense of moral panic about relatively trivial matters.

I find this both irritating and symptomatic of a deep problem in academic life. I do not regard Sternberg’s transgressions as particularly serious: he used his ready access to a publishing platform for self-promotion and self-plagiarism, was discovered, and resigned his editorial position with a rather grumbly semi-apology. If that were all there was to it, I would agree that everyone should move on.

The problem is with the attitude of senior people such as Kaufman. A key point is missed by those who want to minimise Sternberg’s misbehaviour: he is one of the most successful psychologists in the world, and so, to the next generation, he is a living embodiment of what you need to do to become a leader in the field. Early-career scientists will look at him and conclude that to get to the top you need to bend the rules.

In terms of abuse of editorial power, Sternberg’s behaviour is relatively tame. Consider the case of Johnny Matson, Jeff Sigafoos, Giuliano Lancioni and Mark O’Reilly, who formed a coterie of editors and editorial board members who enhanced their publications and citations by ditching usual practices such as peer review when handling one another’s papers. I documented the evidence for this back in 2015, and there appear to have been no consequences for any of these individuals. You might think it isn’t so important if a load of dodgy papers make it into a few journals, but in this case, there was potential for damage beyond academia: the subject matter concerned developmental disorders, and methods of assessment and intervention were given unjustified credibility by being published in journals that were thought to be peer-reviewed. In addition, the corrosive influence on the next generation of psychologists was all too evident: When I first wrote about this, I was contacted by several early-career people who had worked with the dodgy editors: they confirmed that they were encouraged to adopt similar practices if they wanted to get ahead.

When we turn to abuse of personal power, there have been instances in academia that are much, much worse than editorial misdemeanours – clearly documented cases of senior academics acting as sexual predators on junior staff – see, for instance, here and here. With the #MeToo campaign (another ‘witch-hunt’), things are starting to change, but the recurring theme is that if you are sufficiently powerful you can get away with almost anything.

Institutions that hire top academics seem desperate to cling on to them because they bring in grants and fame. Of course, accusations need to be fully investigated in a fair and impartial fashion, but in matters such as editorial transgressions the evidence is there for all to see, and a prompt response is required.

The problem with the academic hierarchy is that at the top there is a great deal of power and precious little responsibility. Those who make it to positions of authority should uphold high professional standards and act as academic role models. At a time when many early-career researchers are complaining that their PIs are encouraging them to adopt bad scientific practices, it’s all the more important that we don’t send the message that you need to act selfishly and cut corners in order to succeed.

I don’t want to see Sternberg vilified, but I do think the onus is now on the academic establishment to follow Bobbie Spellman’s lead and state publicly that his behaviour fell below what we would expect from an academic role model – rather than sweeping it under the carpet or, even worse, portraying him as a victim.