Tuesday 9 August 2022

Can systematic reviews help clean up science?


The systematic review was not turning out as Lorna had expected

Why do people take the risk of publishing fraudulent papers, when it is easy to detect the fraud? One answer is that they don’t expect to be caught. A consequence of the growth in systematic reviews is that this assumption may no longer be safe. 

In June I participated in a symposium organised by the LMU Open Science Center in Munich entitled “How paper mills publish fake science industrial-style – is there really a problem and how does it work?” The presentations are available here. I focused on the weird phenomenon of papers containing “tortured phrases”, briefly reviewed here. For a fuller account see here. These are fakes that are easy to detect, because, in the course of trying to circumvent plagiarism detection software, they change words, with often unintentionally hilarious consequences. For instance, “breast cancer” becomes “bosom peril” and “random value” becomes “irregular esteem”. Most of these papers make no sense at all – they may include recycled figures from other papers. They are typically highly technical and so to someone without expertise in the area they may seem valid, but anyone familiar with the area will realise that someone who writes “flag to commotion” instead of “signal to noise” is a hoaxer. 
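As a rough illustration of why these fakes are easy to detect, here is a minimal sketch of a phrase-lookup screen. It assumes a hand-made lookup table containing only the three examples quoted above; the function name and the table are hypothetical and are not the tool used by the researchers who study tortured phrases.

```python
import re

# Lookup table built only from the three examples quoted in the post;
# a real screening list would need to be far longer (assumption).
TORTURED_PHRASES = {
    "bosom peril": "breast cancer",
    "irregular esteem": "random value",
    "flag to commotion": "signal to noise",
}


def find_tortured_phrases(text):
    """Return (tortured phrase, expected term) pairs found in the text."""
    # Lowercase and collapse whitespace so line breaks do not hide a match.
    normalised = re.sub(r"\s+", " ", text.lower())
    return [
        (phrase, expected)
        for phrase, expected in TORTURED_PHRASES.items()
        if phrase in normalised
    ]


# Invented sample sentence, for illustration only.
sample = "The flag to commotion ratio was computed for each irregular esteem."
for phrase, expected in find_tortured_phrases(sample):
    print(f"'{phrase}' may be a disguised form of '{expected}'")
```

A lookup like this only catches phrases that someone has already spotted, so it would complement careful reading rather than replace it.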

Speakers at the symposium drew attention to other kinds of paper mill whose output is less conspicuously weird. Jennifer Byrne documented industrial-scale research fraud in papers on single gene analyses that were created by templates, and which purported to provide data on under-studied genes in human cancer models. Even an expert in the field may be hoodwinked by these. I addressed the question of “does it matter?” For the nonsense papers generated using tortured phrases, it could be argued that it doesn’t, because nobody will try to build on that research. But there are still victims: authors of these fraudulent papers may outcompete other, honest scientists for jobs and promotion, journals and publishers will suffer reputational damage, and public trust in science is harmed. But what intrigued me was that the authors of these papers may also be regarded as victims, because they will have on public record a paper that is evidently fraudulent. It seems that either they are unaware of just how crazy the paper appears, or that they assume nobody will read it anyway. 

The latter assumption may have been true a couple of decades ago, but with the growth of systematic reviews, researchers are scrutinizing many papers that previously would have been ignored. I was chatting with John Loadsman, who in his role as editor of Anaesthesia and Intensive Care has uncovered numerous cases of fraud. He observed that many paper mill outputs never get read because, just on the basis of the title or abstract, they appear trivial or uninteresting. However, when you do a systematic review, you are supposed to read everything relevant to the research question, and evaluate it, so these odd papers may come to light. 

I’ve previously blogged about the importance of systematic reviews for avoiding cherrypicking of the literature. Of course, evaluation of papers is often done poorly or not at all, in which case the fraudulent papers just pollute the literature when added to a meta-analysis. But I’m intrigued at the idea that systematic reviews might also serve the purpose of putting the spotlight on dodgy science in general, and fraudsters in particular, by forcing us to read things thoroughly. I therefore asked Twitter for examples – I asked specifically about meta-analysis but the responses covered systematic reviews more broadly, and were wide-ranging both in the types of issue that were uncovered and the subject areas. 

Twitter did not disappoint: I received numerous examples – more than I can include here. Much of what was described did not sound like the work of paper mills, but did include fraudulent data manipulation, plagiarism, duplication of data in different papers, and analytic errors. Here are some examples: 

Paper mills and template papers

Jennifer Byrne noted how she became aware of paper mills when looking for studies of a particular gene she was interested in, which was generally under-researched. Two things raised her suspicions: a sudden spike in studies of the gene, plus a series of papers that had the same structure, as if constructed from a template. Subsequently, with Cyril Labbé, who developed an automated Seek & Blastn tool to assess nucleotide sequences, she found numerous errors in the reagents and specification of genetic sequences of these repetitive papers, and it became clear that they were fraudulent. 

An example of a systematic review that discovered a startling level of inadequate and possibly fraudulent research was focused on the effect of tranexamic acid on post-partum haemorrhage: out of 26 reports, eight had sections of identical or very similar text, despite apparently coming from different trials. This is similar to what has been described for papers from paper mills, which are constructed from a template. And, as might be expected for a paper mill output, there were also numerous statistical and methodological errors, and some cases without ethical approval. (Thanks to @jd_wilko for pointing me to this example). 
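To give a sense of how overlapping text between supposedly independent reports might be screened, here is a minimal sketch using Python's standard difflib module. The trial names, the sentences and the 0.8 threshold are all invented for illustration; this is not the method the reviewers used.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(sections, threshold=0.8):
    """Flag pairs of report sections whose text similarity meets the threshold.

    `sections` maps a report label to the text of one section.
    The 0.8 cut-off is an arbitrary assumption, not an established standard.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sections.items(), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((name_a, name_b, round(ratio, 2)))
    return flagged


# Hypothetical methods sentences from three reports.
sections = {
    "trial_a": "Patients were randomly allocated to two equal groups using sealed envelopes.",
    "trial_b": "Patients were randomly allocated into two equal groups using sealed envelopes.",
    "trial_c": "Blood loss was measured by weighing pads before and after delivery.",
}

for name_a, name_b, ratio in similar_pairs(sections):
    print(f"{name_a} and {name_b} share very similar text (ratio {ratio})")
```

Some overlap is innocent, since standard methods are often described in standard words, so a flagged pair is a prompt to read both reports closely rather than evidence of fraud.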


Back in 2006, Iain Chalmers, who is generally ahead of his time, noted that systematic reviews could root out cases of plagiarism, citing the example of Asim Kurjak, whose paper on epidural analgesia in labour was heavily plagiarised.

Data duplication 

Meta-analysis can throw up cases where the same study is reported in two or more papers, with no indication that this is the same data. Although this might seem like a minor problem compared with fraud, it can be serious, because if the duplication is missed in a meta-analysis, that study will be given more weight than it should have. Ioana Cristea noted that such ‘zombie papers’ have cropped up in a meta-analysis she is currently analysing. 
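To see why a missed duplicate matters, here is a minimal sketch of fixed-effect inverse-variance pooling, in which each study is weighted by one over its squared standard error. The effect sizes and standard errors are invented for illustration, and real meta-analyses often use random-effects models instead.

```python
def pooled_estimate(effects, std_errors):
    """Fixed-effect pooled effect: inverse-variance weighted mean."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


# Hypothetical effect sizes and standard errors for three distinct studies.
effects = [0.10, 0.20, 0.80]
std_errors = [0.10, 0.10, 0.10]

correct = pooled_estimate(effects, std_errors)

# The same data with the third study unknowingly entered twice.
duplicated = pooled_estimate(effects + [0.80], std_errors + [0.10])

print(f"Pooled effect, each study counted once: {correct:.3f}")
print(f"Pooled effect, third study duplicated:  {duplicated:.3f}")
```

In this made-up case the duplicated study has the largest effect, so counting it twice pulls the pooled estimate upwards, even though no new data have been added.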

Tampering with peer review 

When a paper considered for a meta-analysis seems dubious, it raises the question of whether proper peer review procedures were followed. It helps if the journal adopts open peer review. Robin N. Kok reported a paper where the same person was listed as an author and a peer reviewer. This was eventually retracted.  

Data seem too good to be true 

This piece in Science tells the story of Qian Zhang, who published a series of studies on the impact of cartoon violence on children which on the one hand had remarkably large samples of children all at the same age, and on the other hand had similar samples across apparently different studies. Because of their enormous size, Zhang’s papers distorted any meta-analysis they were included in. 

Aaron Charlton cited another case, where serious anomalies were picked up in a study on marketing in the course of a meta-analysis. The paper was ultimately retracted 3 years after the concerns were raised, after defensive responses from some of the authors, challenging the meta-analysts. 

This case flagged by Neil O’Connell is especially useful, as it documents a range of methods used to evaluate suspect research. The dodgy work was first flagged up in a meta-analysis of cognitive behaviour therapy for chronic pain. Three papers with the same lead author, M. Monticone, obtained results that were discrepant with the rest of the literature, with much bigger effect sizes. The meta-analysts then looked at other trials by the same team and found that there was a 6-fold difference between the lower confidence limit of the Monticone studies and the upper confidence limit of all others combined. The paper also reports email exchanges with Dr Monticone that may be of interest to readers. 
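The kind of comparison described here can be sketched as follows: pool the suspect trials and the remaining trials separately, then compare the lower confidence limit of one group with the upper confidence limit of the other. The numbers are invented and the fixed-effect model is an assumption; the published analysis may have used different methods, and this sketch does not reproduce its 6-fold figure.

```python
import math


def pooled_with_ci(effects, std_errors, z=1.96):
    """Fixed-effect pooled estimate with an approximate 95% confidence interval."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    half_width = z * math.sqrt(1.0 / total)
    return estimate, estimate - half_width, estimate + half_width


# Hypothetical standardised effect sizes: one team's trials versus all others.
suspect_est, suspect_lower, suspect_upper = pooled_with_ci(
    [2.0, 2.2, 2.4], [0.2, 0.2, 0.2]
)
others_est, others_lower, others_upper = pooled_with_ci(
    [0.2, 0.3, 0.1, 0.4], [0.15, 0.15, 0.15, 0.15]
)

print(f"Suspect trials: {suspect_est:.2f} (lower limit {suspect_lower:.2f})")
print(f"Other trials:   {others_est:.2f} (upper limit {others_upper:.2f})")
if suspect_lower > others_upper:
    print(f"Intervals do not overlap; ratio {suspect_lower / others_upper:.1f}")
```

Non-overlapping intervals do not prove misconduct, but they mark out a set of trials that needs an explanation before it is allowed to influence a pooled result.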

Poor methodology 

Fiona Ramage told me that in the course of doing a preclinical systematic review and meta-analysis of nutritional neuroscience, she encountered numerous errors of basic methodology and statistics, e.g. dozens of papers where error bars were presented without indicating if they show SE or SD; studies claiming differences between groups without a direct statistical comparison. This is more likely to be due to ignorance or honest error than to malpractice, but it needs to be flagged up so that the literature is not polluted by erroneous data.

What are the consequences?

Of course, the potential of systematic reviews to detect bad science is only realised if the dodgy papers are indeed weeded out of the literature, and people who commit scientific fraud are fired. Journals and publishers have started to respond to paper mills, but, as Ivan Oransky has commented, this is a game of Whac-a-Mole, and “the process of retracting a paper remains comically clumsy, slow and opaque”. 

I was surprised that even when confronted with an obvious case of a paper that had both numerous tortured phrases and plagiarism, the response from the publisher was slow – e.g. this comically worded example is still not retracted, even though the publisher’s research integrity office acknowledged my email expressing concern over 2 months ago. But 2 months is nothing. Guillaume Cabanac recently tweeted about a "barn door" case of plagiarism that has just been retracted 20 years after it was first flagged up. When I discuss the slow responses to concerns with publishers, they invariably say that they are being kept very busy with a huge volume of material from paper mills. To which I answer, you are making immense profits, so perhaps some could be channeled into employing more people to tackle this problem. As I am fond of pointing out, I regard a publisher who leaves seriously problematic studies in the literature as analogous to a restaurateur who serves poisoned food to customers. 

Publishers may be responsible for correcting the scientific record, but it is institutional employers who need to deal with those who commit malpractice. Many institutions don’t seem to take fraud seriously. This point was made back in 2006 by Iain Chalmers, who described the lenient treatment of Asim Kurjak, and argued for public naming and shaming of those who are found guilty of scientific misconduct. Unfortunately, there’s not much evidence that his advice has been heeded. Consider this recent example of a director of a primate research lab who admitted fraud, but is still in post. (Here the fraud was highlighted by a whistleblower rather than a systematic review, but this illustrates the difficulty of tackling fraud when there are only minor consequences for fraudsters). 

Could a move towards "slow science" help? In the humanities, literary scholars pride themselves on “close reading” of texts. In science, we are often so focused on speed and concision that we tend to lose the ability to focus deeply on a text, especially if it is boring. The practice of doing a systematic review should in principle develop better skills in evaluation of individual papers, and in so doing help cleanse the literature of papers that should never have got published in the first place. John Loadsman has suggested we should not only read papers carefully, but should recalibrate ourselves to have a very high “index of suspicion” rather than embracing the default assumption that everyone is honest. 


Many thanks to everyone who sent in examples. Sorry I could not include everything. Please feel free to add other examples or reactions in the Comments – these tend to get overwhelmed with adverts for penis enlargement or (ironically) essay mills, and so are moderated, but I do check them and relevant comments will eventually appear.

PPS. Florian Naudet sent a couple of relevant links that readers might enjoy: 

Fascinating article by Fanelli et al who looked at how inclusion of retracted papers affected meta-analyses: https://www.tandfonline.com/doi/full/10.1080/08989621.2021.1947810  

And this piece by Lawrence et al shows the dangers of meta-analyses when there is insufficient scrutiny of the papers that are included: https://www.nature.com/articles/s41591-021-01535-y  

Also, Joseph Lee tweeted about this paper about inclusion of papers from predatory publications in meta-analyses: https://jmla.pitt.edu/ojs/jmla/article/view/491 

PPPS. 11th August 2022

A couple of days after posting this, I received a copy of "Systematic Reviews in Health Research" edited by Egger, Higgins and Davey Smith. Needless to say, the first thing I did was to look up "fraud" in the index. Although there are only a couple of pages on this, the examples are striking. 

First, a study by Nowbar et al (2014) on bone marrow stem cells for heart disease found that in a review of 133 reports, over 600 discrepancies were found, and the number of discrepancies increased with the reported effect size. There's a trail of comments on Pubpeer relating to some of the sources, e.g. https://pubpeer.com/publications/B346354468C121A468D30FDA0E295E.

Another example concerns the use of beta-blockers during surgery. A series of studies from one centre (the DECREASE trials) showing good evidence of effectiveness was investigated and found to be inadequate, with missing data and failure to follow research protocols. When these studies were omitted from a meta-analysis, the conclusion was that, far from receiving benefit from beta-blockers, patients in the treatment group were more likely to die (Bouri et al, 2014). 

PPPPS. 18th August 2022

This comment by Jennifer Byrne was blocked by Blogger - possibly because it contained weblinks.

Anyhow, here is what she said:

I agree, reading both widely and deeply can help to identify problematic papers, and an ideal time for this to happen is when authors are writing either narrative or systematic reviews. Here's another two examples where Prof Carlo Galli and colleagues identified similar papers that may have been based on templates: https://www.mdpi.com/2304-6775/7/4/67, https://link.springer.com/article/10.1007/s11192-022-04434-2 



  1. Great piece. Might help if systematic reviews were accorded the academic credit they deserve in research assessments, might help if there were journals promoting systematic review and which helped to improve the standards of systematic review - at present, it seems to me that many reviews that describe themselves as systematic are nothing of the sort. Maybe, in the interests of research integrity, research councils should fund such a journal, rather than leave it to commercial publishers

  2. Great post. I and others have found major issues with data duplication when analyzing the impact of single-sex education. Details for those interested: https://scholarsphere.psu.edu/resources/baed14e3-22af-42fc-aa83-114e41fdc89b

  3. Thank you ever so much for this. Been involved for some while in finding misconduct and error in studies that are actually included in systematic reviews. There is so much around, and so many examples, that keeping in touch with it all is difficult. Your blog is a great help.