Tuesday 9 August 2022

Can systematic reviews help clean up science?


The systematic review was not turning out as Lorna had expected

Why do people take the risk of publishing fraudulent papers, when it is easy to detect the fraud? One answer is that they don’t expect to be caught. A consequence of the growth in systematic reviews is that this assumption may no longer be safe. 

In June I participated in a symposium organised by the LMU Open Science Center in Munich entitled “How paper mills publish fake science industrial-style – is there really a problem and how does it work?” The presentations are available here. I focused on the weird phenomenon of papers containing “tortured phrases”, briefly reviewed here. For a fuller account see here. These are fakes that are easy to detect, because, in the course of trying to circumvent plagiarism detection software, they change words, with often unintentionally hilarious consequences. For instance, “breast cancer” becomes “bosom peril” and “random value” becomes “irregular esteem”. Most of these papers make no sense at all – they may include recycled figures from other papers. They are typically highly technical and so to someone without expertise in the area they may seem valid, but anyone familiar with the area will realise that someone who writes “flag to commotion” instead of “signal to noise” is a hoaxer. 

Speakers at the symposium drew attention to other kinds of paper mill whose output is less conspicuously weird. Jennifer Byrne documented industrial-scale research fraud in papers on single gene analyses that were created by templates, and which purported to provide data on under-studied genes in human cancer models. Even an expert in the field may be hoodwinked by these. I addressed the question of “does it matter?” For the nonsense papers generated using tortured phrases, it could be argued that it doesn’t, because nobody will try to build on that research. But there are still victims: authors of these fraudulent papers may outcompete other, honest scientists for jobs and promotion, journals and publishers will suffer reputational damage, and public trust in science is harmed. But what intrigued me was that the authors of these papers may also be regarded as victims, because they will have on public record a paper that is evidently fraudulent. It seems that either they are unaware of just how crazy the paper appears, or that they assume nobody will read it anyway. 

The latter assumption may have been true a couple of decades ago, but with the growth of systematic reviews, researchers are scrutinizing many papers that previously would have been ignored. I was chatting with John Loadsman, who in his role as editor of Anaesthesia and Intensive Care has uncovered numerous cases of fraud. He observed that many paper mill outputs never get read because, just on the basis of the title or abstract, they appear trivial or uninteresting. However, when you do a systematic review, you are supposed to read everything relevant to the research question, and evaluate it, so these odd papers may come to light. 

I’ve previously blogged about the importance of systematic reviews for avoiding cherrypicking of the literature. Of course, evaluation of papers is often done poorly or not at all, in which case the fraudulent papers just pollute the literature when added to a meta-analysis. But I’m intrigued at the idea that systematic reviews might also serve the purpose of putting the spotlight on dodgy science in general, and fraudsters in particular, by forcing us to read things thoroughly. I therefore asked Twitter for examples – I asked specifically about meta-analysis but the responses covered systematic reviews more broadly, and were wide-ranging both in the types of issue that were uncovered and the subject areas. 

Twitter did not disappoint: I received numerous examples – more than I can include here. Much of what was described did not sound like the work of paper mills, but did include fraudulent data manipulation, plagiarism, duplication of data in different papers, and analytic errors. Here are some examples: 

Paper mills and template papers

Jennifer Byrne noted how she became aware of paper mills when looking for studies of a particular gene she was interested in, which was generally under-researched. Two things raised her suspicions: a sudden spike in studies of the gene, plus series of papers that had the same structure, as if constructed from a template. Subsequently, with Cyril LabbĂ©, who developed an automated Seek & Blastn tool to assess nucleotide sequences, she found numerous errors in the reagents and specification of genetic sequences of these repetitive papers, and it became clear that they were fraudulent. 

An example of a systematic review that discovered a startling level of inadequate and possibly fraudulent research was focused on the effect of tranexamic acid on post-partum haemorrhage: out of 26 reports, eight had sections of identical or very similar text, despite apparently coming from different trials. This is similar to what has been described for papers from paper mills, which are constructed from a template. And, as might be expected for a paper mill output, there were also numerous statistical and methodological errors, and some cases without ethical approval. (Thanks to @jd_wilko for pointing me to this example). 


Back in 2006, Iain Chalmers, who is generally ahead of his time, noted that systematic reviews could root out cases of plagiarism, citing the example of Asim Kurjak, whose paper on epidural analgesia in labour was heavily plagiarised

Data duplication 

Meta-analysis can throw up cases where the same study is reported in two or more papers, with no indication that this is the same data. Although this might seem like a minor problem compared with fraud, it can be serious, because if the duplication is missed in a meta-analysis, that study will be given more weight than it should have. Ioana Cristea noted that such ‘zombie papers’ have cropped up in a meta-analysis she is currently analysing. 

Tampering with peer review 

When a paper considered for a meta-analysis seems dubious, it raises the question of whether proper peer review procedures were followed. It helps if the journal adopts open peer review. Robin N. Kok reported a paper where the same person was listed as an author and a peer reviewer. This was eventually retracted.  

Data seem too good to be true 

This piece in Science tells the story of Qian Zhang, who published a series of studies on impact of cartoon violence in children which on the one hand had remarkably large samples of children all at the same age, and on the other hand had similar samples across apparently different studies.  Because of their enormous size, Zhang’s papers distorted any meta-analysis they were included in. 

Aaron Charlton cited another case, where serious anomalies were picked up in a study on marketing in the course of a meta-analysis. The paper was ultimately retracted 3 years after the concerns were raised, after defensive responses from some of the authors, challenging the meta-analysts. 

This case flagged by Neil O’Connell is especially useful, as it documents a range of methods used to evaluate suspect research. The dodgy work was first flagged up in a meta-analysis of cognitive behaviour therapy for chronic pain.  Three papers with the same lead author, M. Monticone, obtained results that were discrepant with the rest of the literature, with much bigger effect sizes. The meta-analysts then looked at other trials by the same team and found that there was a 6-fold difference between the lower confidence interval of the Monticone studies and the upper confidence interval of all others combined. The paper also reports email exchanges with Dr Monticone that may be of interest to readers. 

Poor methodology 

Fiona Ramage told me that in the course of doing a preclinical systematic review and meta-analysis of nutritional neuroscience, she encountered numerous errors of basic methodology and statistics, e.g. dozens of papers where error bars were presented without indicating if they show SE or SD; studies claiming differences between groups without a direct statistical comparison. This is more likely to be due to ignorance or honest error than to malpractice, but it needs to be flagged up so that the literature is not polluted by erroneous data.

What are the consequences?

Of course, the potential of systematic reviews to detect bad science is only realised if the dodgy papers are indeed weeded out of the literature, and people who commit scientific fraud are fired. Journals and publishers have started to respond to paper mills, but, as Ivan Oransky has commented, this is a game of Whac-a-Mole, and "the process of retracting a paper remains comically clumsy, slow and opaque”. 

I was surprised that even when confronted with an obvious case of a paper that had both numerous tortured phrases and plagiarism, the response from the publisher was slow – e.g. this comically worded example is still not retracted, even though the publisher’s research integrity office acknowledged my email expressing concern over 2 months ago.  But 2 months is nothing. Guillaume Cabanac recently tweeted about a "barn door" case of plagiarism that has just been retracted 20 years after it was first flagged up.  When I discuss the slow responses to concerns with publishers, they invariably say that they are being kept very busy with a huge volume of material from paper mills. To which I answer, you are making immense profits, so perhaps some could be channeled into employing more people to tackle this problem. As I am fond of pointing out, I regard a publisher who leaves seriously problematic studies in the literature as analogous to a restauranteur that serves poisoned food to customers. 

Publishers may be responsible for correcting the scientific record, but it is institutional employers who need to deal with those who commit malpractice. Many institutions don’t seem to take fraud seriously. This point was made back in 2006 by Iain Chalmers, who described the lenient treatment of Asim Kurjak, and argued for public naming and shaming of those who are found guilty of scientific misconduct. Unfortunately, there’s not much evidence that his advice has been heeded. Consider this recent example of a director of a primate reseach lab who admitted fraud, but is still in post. (Here the fraud was highlighted by a whistleblower rather than a systematic review, but this illustrates the difficulty of tackling fraud when there are only minor consequences for fraudsters). 

Could a move towards "slow science" help? In the humanities, literary scholars pride themselves on “close reading” of texts. In science, we are often so focused on speed and concision, that we tend to lose the ability to focus deeply on a text, especially if it is boring. The practice of doing a systematic review should in principle develop better skills in evaluation of individual papers, and in so doing help cleanse the literature from papers that should never have got published in the first place. John Loadsman has suggested we should not only read papers carefully, but should recalibrate ourselves to have a very high “index of suspicion” rather than embracing the default assumption that everyone is honest. 


Many thanks to everyone who sent in examples. Sorry I could not include everything. Please feel free to add other examples or reactions in the Comments – these tend to get overwhelmed with adverts for penis enlargement or (ironically) essay mills, and so are moderated, but I do check them and relevant comments will eventually appear.

PPS. Florian Naudet sent a couple of relevant links that readers might enjoy: 

Fascinating article by Fanelli et al who looked at how inclusion of retracted papers affected meta-analyses: https://www.tandfonline.com/doi/full/10.1080/08989621.2021.1947810  

And this piece by Lawrence et al shows the dangers of meta-analyses when there is insufficient scrutiny of the papers that are included: https://www.nature.com/articles/s41591-021-01535-y  

Also, Joseph Lee tweeted about this paper about inclusion of papers from predatory publications in meta-analyses: https://jmla.pitt.edu/ojs/jmla/article/view/491 

PPPS. 11th August 2022

A couple of days after posting this, I received a copy of "Systematic Reviews in Health Research" edited by Egger, Higgins and Davey Smith. Needless to say, the first thing I did was to look up "fraud" in the index. Although there are only a couple of pages on this, the examples are striking. 

First, a study by Nowbar et al (2014) on bone marrow stem cells for heart disease found that in a review of 133 reports, over 600 discrepancies were found, and the number of discrepancies increased with the reported effect size. There's a trail of comments on Pubpeer relating to some of the sources, e.g. https://pubpeer.com/publications/B346354468C121A468D30FDA0E295E.

Another example concerns the use of beta-blockers during surgery. A series of studies from one centre (the DECREASE trials) showing good evidence of effectiveness was investigated and found to be inadequate, with missing data and failure to follow research protocols. When these studies were omitted from a meta-analysis, the conclusion was that, far from receiving benefit from beta-blockers, patients in the treatment group were more likely to die (Bouri et al, 2014). 

 PPPPS, 18th August 2022

This comment by Jennifer Byrne was blocked by Blogger - possibly because it contained weblinks.

Anyhow, here is what she said:

I agree, reading both widely and deeply can help to identify problematic papers, and an ideal time for this to happen is when authors are writing either narrative or systematic reviews. Here's another two examples where Prof Carlo Galli and colleagues identified similar papers that may have been based on templates: https://www.mdpi.com/2304-6775/7/4/67, https://link.springer.com/article/10.1007/s11192-022-04434-2 


Wednesday 3 August 2022

Contagion of the political system


Citizens of the UK have in recent weeks watched in amazement as the current candidates for leadership of the Conservative party debate their policies. Whoever wins will replace Boris Johnson as Prime Minister, with the decision made by a few thousand members of the Conservative Party. All options were bad, and we are now down to the last two: Liz Truss and Rishi Sunak.


For those of us who are not Conservatives, and for many who are, there was immense joy at the ousting of Boris Johnson. The man seemed like a limpet, impossible to dislodge. Every week brought a new scandal that would have been more than sufficient to lead to resignation 10 years ago, yet he hung on and on. Many people thought that, after a vote of no confidence in his leadership, he would step down so that a caretaker PM could run the country while the debate over his successor took place, but the limpet is still clinging on. He’s not doing much running of the country, but that’s normal, and perhaps for the best. He’s much better at running parties than leading the Conservative party.


I have to say I had not expected much from Truss and Sunak, but even my low expectations have not been met. The country is facing immense challenges, from climate change, from coronavirus, and from escalating energy prices. These are barely mentioned: instead the focus is on reducing taxes, with the candidates now competing for just how much tax they can cut. As far as I can see, these policies will do nothing to help the poorest in society, whose benefits will shrink to pay for tax cuts; the richer you are the more tax you pay and so this is a rich person’s policy.


What has surprised me is just how ill-informed the two candidates are. The strategy seems to be to pick a niche topic of interest to Conservative voters, make up a new policy overnight and announce it the next day. So we have Rishi Sunak proposing that the solution to the crisis in the NHS is to charge people who miss doctor’s appointments. Has he thought this through? Think of the paperwork. Think of the debt collectors tasked with collecting £10 from a person with dementia. Think of the cost of all of this.  And on Education, his idea is to reintroduce selective (grammar) schools: presumably because he thinks that our regular schools are inadequate to educate intelligent children.


On Education, Liz Truss is even worse. Her idea is that all children who score top marks in their final year school examinations should get an interview to go to Oxford or Cambridge University. This is such a crazy idea that others have written at length to point out its flaws (e.g. this cogent analysis by Sam Freedman). Suffice it to say that it has a similar vibe to the Sunak grammar schools plan: it implies that only two British universities have any value. Conservatives do seem obsessed with creating divisions between haves and have-nots, but only if they can ensure their children are among the haves.


Another confused statement from Truss is that, as far as Scotland goes, she plans to ignore Nicola Sturgeon, the First Minister of Scotland and leader of the Scottish National Party. At a time when relationships between Scotland and England are particularly fraught, this insensitive statement is reminiscent of the gaffes of Boris Johnson.


Oh, and yesterday she also announced – and then quickly U-turned – an idea that would limit the pay of public sector workers in the North of England, because it was cheaper to live there.


What I find so odd about both Sunak and Truss is that they keep scoring own goals. Nobody requires them to keep coming up with new policies in niche areas.  Why don’t they just hold on to their original positions, and if asked about anything else, just agree to ‘look at’ it when in power? Johnson was always very good at promising to ‘look at’ things: when he’s not being a limpet, he’s a basilisk. The more you probe Sunak and Truss, the more their shallowness and lack of expertise show through. They’d do well to keep schtum. Or, better still, show some indication that they could, for instance, get a grip on the crisis in the NHS.


What all this demonstrates is how an incompetent and self-promoting leader causes damage far beyond their own term. Johnson’s cabinet was selected purely on one criterion: loyalty to him. The first requirement was to “believe in Brexit” – reminiscent of the historical wars between Protestants and Catholics, where the first thing you ask of a candidate is what their religion is. Among Conservative politicians, it seems that an accusation of not really being a Brexiteer is the worst thing you can say about a candidate. Indeed, that is exactly the charge that her opponents level against Truss, who made cogent arguments for remaining in the EU before the referendum. Like a Protestant required to recant their beliefs or face the flames, she is now reduced to defending Brexit in the strongest possible terms, saying that “predictions of doom have not come true”, as farmers, fishermen, and exporters go out of business, academics leave in droves, and holidaymakers sit in queues at Dover.


It's known that Johnson does not want to give up the top job. I’m starting to wonder if behind all of this is a cunning plan. The people he’s appointed to cabinet are so incompetent that maybe he hopes that, when confronted with a choice between them, the Conservative Party will decide that he looks better than either of them.