The past couple of years have been momentous for some academic
publishers. As documented in a preprint this week, they have grown rapidly, largely via "special issues" of journals,
dramatically increasing the number of published articles while making
enormous profits. A recent guest post by Huanzi Zhang, however, showed that
this growth has not been without problems. Unscrupulous operators of so-called
"papermills" saw an opportunity to boost their own profits by selling
authorship slots and then placing fraudulent articles in special issues that
were controlled by complicit editors. Gradually, publishers realised they had a
problem and started to retract fraudulent articles. Since 2021, Hindawi has
retracted over 5000 articles*.
As described in Huanzi's blogpost, this has made shareholders nervous
and dented the profits of Hindawi's parent company, Wiley.
There are numerous papermills, and we only know about the less competent ones whose dodgy articles are relatively easy to detect. For a deep dive into papermills in Hindawi journals see this blogpost by the anonymous sleuth Parashorea tomentella. At least one papermill is the source of a series of articles that follow a template that I have termed the "AI gobbledegook sandwich". See for instance my comments here on an article that has yet to be retracted. For further examples, search the website PubPeer with the search term "gobbledegook sandwich".
After studying a number of these articles, my impression is that they are created as follows. You start with a genuine article. Most of these look like student projects. The topics are various, but in general they are weak on scientific content. They may be a review of an area, or if data is gathered, it is likely to be some kind of simple survey. In some cases, reference is made to a public dataset. To create a paper for submission, the following steps are taken:
· The title is changed to include terms that relate to the topic of a special issue, such as "Internet of Things" or "Big data".
· Phrases are scattered in the Abstract and Introduction mentioning these terms.
· A technical section is embedded in the middle of the original piece describing the method to be used. Typically this is full of technical equations. I suspect these are usually correct, in that they use standard formulae from areas such as machine learning, and in some cases can be traced to Wikipedia or another source. It is not uncommon to see very basic definitions, e.g. formulae for sensitivity and specificity of prediction.
· A results section is created showing figures that purport to demonstrate how the AI method has been applied to the data. This often reveals that the paper is problematic, as plots are at best unclear and at worst bear no relationship to anything that has gone before. Labels for figures and axes tend to be vague. A typical claim is that the prediction from the AI model is better than results from other, competing models. It is usually hard to work out what is being predicted from what.
· The original essay resumes for a Conclusions section, but with a sentence added to say how AI methods have been useful in improving our understanding.
· An optional additional step is to sprinkle irrelevant citations in the text: we know that papermills collect further income by selling citations, and new papers can act as vehicles for these.
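To give a sense of how elementary the "very basic definitions" mentioned in the list above are, here is a minimal Python sketch of sensitivity and specificity computed from the four cells of a confusion matrix. The function name and the counts are invented for illustration; they are not taken from any of the articles discussed.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp, fn: true positives and false negatives (cases that are truly positive)
    tn, fp: true negatives and false positives (cases that are truly negative)
    """
    sensitivity = tp / (tp + fn)  # proportion of true cases correctly detected
    specificity = tn / (tn + fp)  # proportion of non-cases correctly rejected
    return sensitivity, specificity


# Illustrative counts only (hypothetical, not real data).
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Definitions at this level take up a few lines of code, which is why padding a paper with them adds bulk but no scientific content.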
Papermills have got away with this because the content of these articles is sufficiently technical and complex that the fraud may only be detectable on close reading. Where I am confident there is fraud, I will use the term "gobbledegook sandwich" in my report on PubPeer, but there are many, many papers where my suspicions are raised, yet it would take more time than it is worth for me to comb through the article to find compelling evidence.
For a papermill, the beauty of the AI gobbledegook sandwich is that you can apply AI methods to almost any topic, and there are so many different algorithms that can be used that there is a potentially infinite number of papers that can be written according to this template. The ones I have documented cover topics as varied as educational methods, hotel management, sports, art, archaeology, Chinese medicine, music, building design, mental health and promotion of Marxist ideology. In none of these papers did the application of AI methods make any sense, and they would not get past a competent editor or reviewers, but once a complicit editor is planted in a journal, that editor can accept numerous articles.
Recently, Hindawi has ramped up its integrity operations and is employing many more staff to try and shut this particular stable door. But Hindawi is surely not the only publisher infected by this kind of fraud, and we need a solution that can be used by all journals. My simple suggestion is to focus on prevention rather than cure, by requiring that all articles that report work using AI/ML methods adopt reporting standards that are being developed for machine-learning based science, as described on this website. These standards require computational reproducibility, i.e., data and scripts must be provided so that all results can be reproduced. This would be a logical impossibility for AI gobbledegook sandwiches.
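As a rough illustration of what a reproducibility check could look like in practice, here is a minimal Python sketch that compares the results reported in an article with those obtained by rerunning the authors' scripts. Everything here is an assumption of mine: the function name, the metric names and the numbers are hypothetical, and this is not part of any published reporting standard.

```python
import math


def check_reproduced(reported, recomputed, rel_tol=1e-6):
    """Return the names of reported results that the rerun fails to reproduce.

    reported:   dict of metric name -> value claimed in the article
    recomputed: dict of metric name -> value obtained by rerunning the scripts
    """
    failures = []
    for name, claimed in reported.items():
        actual = recomputed.get(name)
        # A result fails if it is missing from the rerun or differs beyond tolerance.
        if actual is None or not math.isclose(claimed, actual, rel_tol=rel_tol):
            failures.append(name)
    return failures


# Illustrative values only (hypothetical, not from any real article).
reported = {"accuracy": 0.93, "sensitivity": 0.90}
recomputed = {"accuracy": 0.93, "sensitivity": 0.71}
print(check_reproduced(reported, recomputed))
```

A paper whose figures were never generated from the stated data has nothing to feed into the `recomputed` side of such a check, which is the point of the requirement.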
Open science practices were developed with the aim of improving the reproducibility and credibility of science, but, as I've argued elsewhere, they could also be highly effective in preventing fraud. Mandating reporting standards could be an important step which, if also accompanied by open peer review, will make life much harder for the papermillers.
*Source: a spreadsheet maintained by the anonymous sleuth Parashorea tomentella.