The past couple of years have been momentous for some academic publishers. As documented in a preprint this week, rapid growth, largely via "special issues" of journals, has allowed them to dramatically increase the number of articles they publish and, at the same time, to make enormous profits. A recent guest post by Huanzi Zhang, however, showed that this has not been without problems. Unscrupulous operators of so-called "papermills" saw an opportunity to boost their own profits by selling authorship slots and then placing fraudulent articles in special issues that were controlled by complicit editors. Gradually, publishers realised they had a problem and started to retract fraudulent articles: Hindawi alone has retracted over 5,000 articles since 2021*. As described in Huanzi's blogpost, this has made shareholders nervous and dented the profits of parent company Wiley.
There are numerous papermills, and we only know about the
less competent ones whose dodgy articles are relatively easy to detect. For a
deep dive into papermills in Hindawi journals see this blogpost by the
anonymous sleuth Parashorea tomentella.
At least one papermill is the source of a series of articles that follow a
template that I have termed the "AI gobbledegook sandwich". See for instance my comments here on an
article that has yet to be retracted.
For further examples, search the website PubPeer with the search term
"gobbledegook sandwich".
Having studied a number of these articles, I have formed an impression of how they are created. You start with a genuine article. Most of these look like student projects. The topics are various, but in general they are weak on scientific content: they may be a review of an area, or, if data are gathered, it is likely to be via some kind of simple survey. In some cases, reference is made to a public dataset. To create a paper for submission, the following steps are taken:
· The title is changed to include terms that relate to the topic of a special issue, such as "Internet of Things" or "Big data".
· Phrases mentioning these terms are scattered through the Abstract and Introduction.
· A technical section describing the method to be used is embedded in the middle of the original piece. Typically this is full of technical equations. I suspect these are usually correct, in that they use standard formulae from areas such as machine learning, and in some cases they can be traced to Wikipedia or another source. It is not uncommon to see very basic definitions, e.g. formulae for sensitivity and specificity of prediction (see the illustration after this list).
· A results section is created, showing figures that purport to demonstrate how the AI method has been applied to the data. This often reveals that the paper is problematic, as plots are at best unclear and at worst bear no relationship to anything that has gone before. Labels for figures and axes tend to be vague. A typical claim is that the prediction from the AI model is better than results from other, competing models. It is usually hard to work out what is being predicted from what.
· The original essay resumes for a Conclusions section, but with a sentence added to say how AI methods have been useful in improving our understanding.
· An optional additional step is to sprinkle irrelevant citations in the text: we know that papermills collect further income by selling citations, and new papers can act as vehicles for these.
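To give a sense of how elementary this boilerplate can be, here are the standard textbook definitions of sensitivity and specificity, written out as they might appear in such a section (TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives):

```latex
% Standard definitions of the sensitivity and specificity of a binary
% prediction model, in terms of true/false positive and negative counts.
\[
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}
\]
```

Definitions at this level belong in a textbook rather than in the methods section of a supposedly advanced research paper, which is part of what makes them suspicious.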
Papermills have got away with this because the content of these articles is sufficiently technical and complex that the fraud may only be detectable on close reading. Where I am confident there is fraud, I will use the term "gobbledegook sandwich" in my report on PubPeer, but there are many, many papers where my suspicions are raised, yet it would take more time than it is worth for me to comb through the article to find compelling evidence.
For a papermill, the beauty of the AI gobbledegook sandwich is that you can apply AI methods to almost any topic, and there are so many different algorithms available that a potentially infinite number of papers can be written to this template. The ones I have documented cover topics as diverse as educational methods, hotel management, sports, art, archaeology, Chinese medicine, music, building design, mental health and the promotion of Marxist ideology. In none of these papers did the application of AI methods make any sense, and they would not get past a competent editor or reviewers; but once a complicit editor is planted in a journal, they can accept numerous articles.
Recently, Hindawi has ramped up its integrity operations and is employing many more staff to try to shut this particular stable door. But Hindawi is surely not the only publisher infected by this kind of fraud, and we need a solution that can be used by all journals.
My simple suggestion is to focus on prevention rather than cure, by requiring that all articles reporting work that uses AI/ML methods adopt the reporting standards being developed for machine-learning-based science, as described on this website. These standards require computational reproducibility, i.e. data and scripts must be provided so that all results can be reproduced. For an AI gobbledegook sandwich this would be a logical impossibility: the results were never actually generated by the methods described, so no data and scripts could ever reproduce them.
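To make concrete what such a requirement would involve, here is a minimal sketch of a computationally reproducible analysis in Python. Everything in it (the synthetic dataset, the logistic regression model, the choice of metrics) is a hypothetical stand-in rather than anything from a real paper; the point is that a fixed seed plus shared data and code let any reader rerun the script and check that every reported number actually emerges.

```python
# Minimal sketch of a reproducible analysis: fixed seed, shared data, and a
# script that regenerates every reported metric. All names and numbers here
# are illustrative stand-ins, not taken from any actual paper.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

SEED = 42  # fixed so that every rerun gives identical results

# Stand-in for the public dataset a genuine paper would cite and deposit.
X, y = make_classification(n_samples=500, n_features=10, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=SEED
)

# Stand-in for whatever AI/ML method the paper claims to have used.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Recompute the reported metrics directly from the shared data and code,
# using the sensitivity/specificity definitions given earlier.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"sensitivity = {tp / (tp + fn):.3f}")
print(f"specificity = {tn / (tn + fp):.3f}")
```

Note that the seed is set once and passed to every step where randomness enters; without that discipline, even an honest analysis may fail to reproduce exactly.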
Open science practices were developed with the aim of improving the reproducibility and credibility of science, but, as I've argued elsewhere, they could also be highly effective in preventing fraud. Mandating reporting standards could be an important step which, if accompanied by open peer review, would make life much harder for the papermillers.
*Source: a spreadsheet maintained by the anonymous sleuth Parashorea tomentella.
N.B. Comments on this blog are moderated, so there may be a delay before they appear.