Friday, 21 November 2025

The dangers of using bibliometrics with polluted data

 

This week I attended a webinar organised by Clarivate on the topic of “Eugene Garfield Centenary Celebration: Past, Present and Future of Scientometrics”. This covered the early history of the late Eugene Garfield’s work as well as current developments and future trends. The historical sessions were fascinating, describing the remarkable innovations that Garfield made in his quest to understand the body of scientific information as a network. He realised that similarities between articles could be identified by shared citations, and in the 1950s he devised systems for capturing this information using punched cards. I am old enough to remember going to the library in the 1970s to consult the Science Citation Index, which could not only point me to important articles in my field, but often led me in strange directions as I stumbled upon other fascinating topics.

Garfield is known as the father of the Journal Impact Factor, which is regarded by many as an abomination that distorts publishing behaviour because of its connotations of prestige. It was, however, originally conceived as an index that would help librarians decide which journals to purchase, and only later repurposed into a metric that people used as a proxy for the status of the researchers publishing in those journals. 
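For readers unfamiliar with the metric, the calculation itself is simple: a journal’s Impact Factor for a given year is the number of citations received in that year by the items the journal published in the preceding two years, divided by the number of citable items it published in those two years. So, for example, JIF(2024) = (citations in 2024 to items published in 2022–2023) ÷ (citable items published in 2022–2023).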

I enjoyed hearing about Garfield, who sounds like a delightful and humane polymath who recognised the value of the information in indexes and found ingenious ways to synthesise it. I can recommend an archive of his work maintained by the University of Pennsylvania.

The later speakers in the webinar focused on novel developments in the use of scientometrics to evaluate research quality. Giovanni Abramo noted how Italian science had been affected by favouritism because of an exclusive reliance on subjective peer review to evaluate researchers and their institutions. His view is that use of metrics improves research evaluation by making it fairer and more objective. He noted that while metrics may not be an option in some areas of arts and humanities, for disciplines where outputs generally appear in indexed journals, bibliometrics were invaluable, concluding that “bibliometrics are to research assessment what diagnostic imaging is to medicine”, i.e. a key source of objective information.

Funnily enough, I would have agreed with him 12 years ago, when I suggested a simple bibliometric index (departmental H-index) could achieve very similar results to the complex and time-consuming peer review process adopted in the British Research Excellence Framework. At the time I was writing, I thought that Goodhart’s Law ("When a measure becomes a target, it ceases to be a good measure") wouldn’t apply to a citation-based metric, because citations were not controlled by authors, so it would be difficult to game. 
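For concreteness, the departmental H-index is just the familiar H-index computed over the pooled papers of a department: the largest h such that h of the department’s papers each have at least h citations. A minimal sketch in Python, with invented citation counts:

```python
# Departmental H-index: pool citation counts for all papers from a
# department, then find the largest h such that h papers each have
# at least h citations. The counts below are made up for illustration.
def h_index(citations):
    """Return the largest h such that h papers have >= h citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

dept_citations = [50, 33, 20, 15, 9, 8, 8, 4, 3, 2, 1, 0]  # hypothetical
print(h_index(dept_citations))  # 7
```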

It turns out I was naïve. The crudest method of gaming is excessive self-citation, but there are also citation rings (you cite my paper and I’ll cite yours). This year Maria Ángeles Oviedo-García, René Aquarius and I described a more sophisticated version, a “review mill”, where a group of Italian medics used their positions as peer reviewers to coerce citations to the group’s work. We suggested that the change in Italian research evaluation, though implemented with the best of intentions, had led to cynical gaming of peer review. One might respond that this activity, though disturbing, affects only a tiny proportion of articles and so would not have a detectable effect. Again, I would have agreed a decade ago. But now, with an explosion in publications that seems driven by publishers more focused on income than quality (see Hanson et al., 2024), and with extraordinarily lax editorial standards, that may no longer be true. The key point about review mills is that we saw evidence of their activity only because they used generic templates for their peer reviews, and these can be detected only for journals that publish open peer review, a tiny minority. The most prolific member of the review mill was a journal editor who had nearly 3000 verified peer reviews listed on Web of Science, only a handful of which we could access.
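To make the detection problem concrete: where review texts are published, even a crude similarity check can flag suspicious template reuse. The sketch below is an illustration, not the method we used in our paper; the reviews, function name and threshold are all invented, and it assumes review texts are available as plain strings.

```python
# Toy screen for template reuse: flag pairs of peer-review texts that
# overlap heavily, as expected when a generic template is recycled.
# Only feasible for the minority of journals with open peer review.
from difflib import SequenceMatcher
from itertools import combinations

def flag_template_reuse(reviews, threshold=0.8):
    """Return (i, j, similarity) for pairs of suspiciously similar reviews."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(reviews), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((i, j, round(ratio, 2)))
    return flagged

reviews = [
    "This manuscript is well written. The authors should cite recent work on X.",
    "This manuscript is well written. The authors should cite recent work on Y.",
    "The methods section lacks detail on the sample size calculation.",
]
print(flag_template_reuse(reviews))  # e.g. [(0, 1, 0.99)]
```

In practice, of course, detection is much harder than this: the threshold is arbitrary, templates can be paraphrased, and closed review means the texts are simply unavailable.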

So I fear that bibliometrics is more like a diagnostic image that has blurred together data from several patients: there is some valid information in there, but it's distorted by error.

The final presentation by Valentin Bogorov described the future of scientometrics, where AI would be harnessed to give much more fine-grained and nuanced information about societal impacts of research. But I felt he was ignoring the elephant of fraud that had lumbered into the bibliometric databases. Review mills are one issue for the validity of citation data, but paper mills are a much bigger problem. Whereas review mills rely on the self-organisation of dubious research groups to burnish their reputations, many paper mills are run by outside organisations whose sole motivation is financial profit (Parker et al., 2024). They sell authorships and citations for a price that depends on the Impact Factor of the journal; Eugene Garfield would be turning in his grave. They were first detected about 12 years ago but have multiplied like a virus and are seriously infecting whole bodies of research. Sometimes they are first recognised when a researcher with subject-domain expertise finds anomalous or fraudulent articles when trying to review the field (see, e.g., Aquarius et al., 2025).

Paper mills thrive in a warm, soggy environment, where corrupt or incompetent editors will wave through articles that contain clear breaches of scientific method, or that are evidently patched together from various plagiarised articles. The hope of publishers is that AI will provide ways to detect fraudulent papers and remove them before they enter the literature, but paper millers have proved skilled at mutating to evade detection. Unfortunately, the very areas where AI and big data seem to hold most promise, such as databases linking genes, proteins, molecules and biomarkers, are already contaminated. The fear is that the paper millers will themselves increasingly use AI to create ever more plausible articles.

I am not opposed to bibliometrics or AI in principle, but I find the optimism about their application to research evaluation concerning, especially since no mention was made of the problems that will emerge if the database interrogated by AI is polluted. Any method of evaluation will have costs, benefits and unforeseen consequences. My concern is that if we focus only on the benefits, we could end up with a system that encourages fraudsters and rewards those who are most skilled at gaming the system rather than the best scientists.
