Guest post by Matt Spick
Photo: Parsadanov/Shutterstock.com
From relative obscurity, paper mills have recently moved into the spotlight of academic attention. These organisations - which sell manuscripts or citations to authors to enhance scholarly metrics - are growing so rapidly that many fields are being overwhelmed. The overall proportion of paper mill outputs was estimated at 1.5–2% of all scientific papers published in 2022, but a recent AI-screening cancer study estimated that around 10% of recent cancer manuscripts in some venues may be paper mill products, and in data-intensive fields paper mill outputs can now outnumber legitimate publications. The increased attention has also been driven by the integrity community highlighting unethical behaviours, notably in the FoSci Report 2026, and has resulted in initiatives such as United2Act. This is in addition to publishers and third parties setting up a growing number of integrity checking systems, whether for citation anomalies, duplicated images, or tortured phrases. But this is an adversarial process, and backward-looking checks will inevitably miss what happens next.
One option for paper mills is to stop selling unethical products, and shift to dual-use services instead. The dual-use business model has a long history. During the US prohibition era, manufacturing moonshine came with severe risks. To avoid these risks, and transfer both criminal intentions (mens rea) and criminal action (actus reus) to the customers, the California Vineyardist Association created a front organisation, Fruit Industries Ltd, to sell Vine-Glo. This innovative product consisted of a block of concentrated grape juice, easily dissolved in water to create a refreshing and entirely legal grape juice drink. The grape juice block also included an explicit warning to purchasers that “After dissolving the brick in a gallon of water, do not place the liquid in a jug away in the cupboard for twenty days, because then it would turn into wine.” And of course there are other examples of unethical actors shifting into legitimate business, whether in laundry services or waste management businesses. Such operations are hard to police, precisely because they overlap with entirely legal commercial interests.
The paper mill equivalents of Vine-Glo might include proprietary software tools to data dredge and p-hack large open datasets, such as the Global Burden of Disease Study or NHANES. Such tools could be used legitimately, and if videos then appeared explaining how to mass-produce p-hacked papers with them, this would be claimed as “beyond the developers’ control”. Training courses might emerge that offer primers on scientific writing to complete beginners, starting with an icebreaker at 9:00 am on day one and concluding with submission to a PubMed-indexed journal at 5:00 pm on day two. Naturally, acceptance would not be guaranteed, and the trainers would stress that this was purely a training service, not a ‘pay for authorship’ model. Proving that these practices were linked to specific examples of problematic authorship would be impossible at the individual paper level.
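To illustrate why data dredging so reliably yields publishable-looking results, here is a minimal Python sketch of the underlying arithmetic. It assumes every test is independent and that no real effect exists in any of them, which is a simplification; the function names are hypothetical and do not describe any actual tool.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent
    tests when every null hypothesis is true (simplifying assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of 'significant' results expected by chance alone."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Print how quickly chance findings accumulate as more tests are run.
    for n in (1, 20, 100, 1000):
        print(
            f"{n:>5} tests: "
            f"P(at least one false positive) = {familywise_error_rate(n):.3f}, "
            f"expected false positives = {expected_false_positives(n):.1f}, "
            f"Bonferroni threshold = {bonferroni_threshold(n):.5f}"
        )
```

Under these assumptions, running a few hundred exposure-outcome combinations against an open dataset makes at least one nominally significant finding close to certain, unless the threshold is corrected for the number of tests actually run.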
The lack of action (sadly, there is no such thing as the science police) is especially frustrating in an industry that is slow to respond even when retractions are clearly needed. This reflects a conservative culture around accusations of wrongdoing, which - partly for good reasons - prefers to tolerate higher levels of unethical outputs rather than risk wrongly criticising or punishing scientists. At the same time, the surge in publication volumes (measured in the tens of thousands across biomedical and life sciences research) makes it impossible to conclude that no problematic behaviour is occurring.
Mills turning to dual-use products would also compromise large parts of the integrity community's current toolkit: detection based on image duplication, tortured phrases and citation anomalies would largely fail, because the outputs would represent genuine analyses of real data with real (if trivial) results. In turn, prevention is likely to have to move upstream, to pre-registration before data release and gatekeeping of access.
Will service mills offering software and training replace the existing paper mills? In practice, both models are likely to continue to exist, if only because hiring, promotion and graduation decisions are all influenced by (or completely dependent upon) publication counts. As long as the traditional model remains profitable (with first-author slots advertised at up to USD 5,600, more than sufficient to grease the wheels through editor bribery elsewhere in the publication chain) and the underlying demand continues, there will be no incentive to leave money on the table. Conventional paper mills may be able to use LLMs and agentic platforms to improve the quality of their outputs (as a different form of adversarial adaptation), making them harder to detect and preserving their value proposition for unethical authors who simply want to purchase authorship. Many potential customers are likely to find the scientific publishing process too arcane, and will be happy to continue to buy an end-to-end service. Service mills seem likely to grow in importance, however, for authors who are more risk averse. At the same time - for expert users - agentic AI platforms for science may trivialise secondary outputs such as reviews and open data analyses, providing a third route to ‘enhancing’ an author’s scholarly record.
This will present a challenge to the wider scientific community, which has only begun to wake up to the problem of paper mills as manuscript factories. In reality the mills are diversifying their offerings, creating the illusion of ‘going straight’ through dual-use products, and it seems inevitable that some expert users will be able to disintermediate the paper mills completely through agentic AI. We need to recognise that all these types of output can dilute the scholarly record, decreasing the signal-to-noise ratio and delaying the translation of literature into real impact, to the disadvantage of both direct users and society as a whole.
