Monday, 2 February 2026

An analysis of PubPeer comments on highly-cited retracted articles

PubPeer is sometimes discussed as if it is some kind of cesspit where people smear honest scientists with specious allegations of fraud. I'm always taken aback when I hear this, since it is totally at odds with my experience. When I conducted an analysis of PubPeer comments concerning papers from UK universities published over a two-year period, I found that all 345 of them conformed to PubPeer's guidelines, which require comments to contain only "Facts, logic and publicly verifiable information". There were examples where another commenter, sometimes an author, rebutted a comment convincingly. In other cases, the discussion concerned highly technical aspects of research, where even experts may disagree. Clearly, PubPeer comments are not infallible evidence of problems, but in my experience, they are strictly moderated and often draw attention to serious errors in published work.

The Problematic Paper Screener (PPS) is a beautiful resource that is ideal for investigating PubPeer's impact. It not only collates information on articles that have been annulled (an umbrella term coined to encompass retractions, removals, or withdrawals), but also cross-references this information with PubPeer, so you can see which articles have comments. Furthermore, it provides the citation count of each article, based on Dimensions.

The PPS lists over 134,000 annulled papers; I wanted to see what proportion of retractions/withdrawals were preceded by a PubPeer comment. To make the task tractable, I focused on articles that had at least 100 citations and were annulled between 2021 and 2025. This gave a total of 800 articles, covering all scientific disciplines. It was necessary to read the PubPeer comments for each of these, because many comments occur after retraction and serve solely to record the retraction on PubPeer. Accordingly, I coded each paper in terms of whether the first PubPeer comment preceded or followed the annulment.
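For anyone who wants to attempt something similar, the filtering and coding steps are easy to script. The sketch below (Python with pandas) is purely illustrative: the file name and column names (citations, annulment_date, first_pubpeer_comment_date) are placeholders rather than the actual PPS export format, and in practice I read the PubPeer threads by hand rather than relying on a single date field.

```python
# Illustrative sketch only: assumes a PPS export saved as CSV with
# hypothetical columns "citations", "annulment_date" and
# "first_pubpeer_comment_date" (blank where there is no comment).
import pandas as pd

pps = pd.read_csv(
    "pps_annulled_papers.csv",
    parse_dates=["annulment_date", "first_pubpeer_comment_date"],
)

# Restrict to highly-cited papers annulled between 2021 and 2025
sample = pps[
    (pps["citations"] >= 100)
    & (pps["annulment_date"].dt.year.between(2021, 2025))
].copy()

# Code each paper: did the first PubPeer comment precede the annulment?
sample["prior_pubpeer_comment"] = sample["first_pubpeer_comment_date"].notna() & (
    sample["first_pubpeer_comment_date"] < sample["annulment_date"]
)

print("N papers:", len(sample))
print("Proportion with a prior comment:",
      round(sample["prior_pubpeer_comment"].mean(), 2))
```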

[Figure: flowchart of the analysis of PPS annulled papers]

I had anticipated that around 10-20% of these annulled articles would have associated PubPeer comments; this proved to be a considerable underestimate. In fact, 58% of highly-cited papers that were annulled between 2021 and 2025 had prior PubPeer comments. Funnily enough, shortly after I'd started this analysis, I saw this comment on Slack by Achal Agrawal: "I was wondering if there is any study on what percentage of retractions happen thanks to sleuths. I have a feeling that at least around 50% of the retractions happen thanks to the work of 10 sleuths." Achal's estimate of the percentage of flagged papers was much closer to the mark than mine. But what about the number of sleuths who were responsible?

It's not possible to give more than a rough estimate of the contribution of individual commenters. Many of them use pseudonyms (some people even use a different pseudonym for each post they submit), and combinations of individuals often contributed comments on a single article. Some of the PubPeer comments had been submitted in the site's early years, when they were labelled only as "Unregistered submission" or "Peer 1", etc., so any estimate will be imperfect. The best I could do was to focus just on the first comment for each article, excluding any comments occurring after a retraction. Of those who had stable names or pseudonyms, the 10 most prolific commenters had commented on between 9 and 50 articles, accounting for 27% of all retractions in this sample. Although this is a lower proportion than Achal's estimate, it's an impressive number, especially when you bear in mind that there were many comments from unknown contributors, and the analysis focused only on articles with at least 100 citations.
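Continuing the illustrative sketch above (with the same caveats about pseudonyms and unregistered accounts), the tally of the most prolific first commenters amounts to a simple frequency count; the column name first_commenter is again a placeholder, not part of the real PPS export.

```python
# Continuation of the sketch above: "first_commenter" is a hypothetical
# column holding the name or pseudonym of the first pre-annulment
# commenter (missing where the identity was unstable or unregistered).
flagged = sample[sample["prior_pubpeer_comment"]]
top10 = flagged["first_commenter"].dropna().value_counts().head(10)

print(top10)
print("Share of the whole sample flagged first by these ten:",
      round(top10.sum() / len(sample), 2))
```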

Of course, the naysayers may reply that this just goes to show that the sleuths who comment on articles are effective in causing retractions, not that they are accurate. To that I can only reply that publishers/journals are very reluctant to retract articles: they may regard it as reputationally damaging, and be concerned about litigation from disgruntled authors. In addition, they have to go through due process, and it takes a lot of resources to make the necessary checks and modify the publication record. They don't do it lightly, and often don't do it at all, despite clear evidence of serious error in an article (see, e.g., Grey et al., 2025).

If an article is going to be retracted, it is better that it is done sooner rather than later. Monitoring PubPeer would be a good way of cleaning up a polluted literature - in the interests of all of us. Any publisher can do that for free: just ask an employee of the integrity department to check new PubPeer posts every day—about 40 minutes and you’re done. PubPeer also provides publishers with a convenient dashboard to facilitate this essential monitoring task.

It would be interesting to extend the analysis to less highly-cited papers, but this would be a huge exercise, particularly since this would include many paper-milled articles from mass retractions. I hope that my selective analysis will at least demonstrate that those who comment on problematic articles on PubPeer should be taken seriously. 

 

Post-script: 7 February 2026

One of the commentators with numbered comments below has complained that I am censoring criticism, and has revealed their identity on LinkedIn as Ryan James Jessup, JD/MPA. My bad - I usually paste a statement at the end of a blogpost explaining that comments are moderated, so there can be a delay, but that I accept non-anonymous comments that are polite and on topic. Jessup didn't take up my offer of incorporating his arguments in a section at the end of the blog, so I have accepted them and you can read them in the Comments.

I actually agree with a lot of what he says, but there are some points I disagree with, so here are my thoughts.

Points 1-2. He starts by stating that the piece confuses correlation with causation. On reflection, I think he's right. The word "role" in the title is misleading, and I have accordingly changed the title of the post from "The role of PubPeer in retractions of highly-cited articles" to "An analysis of PubPeer comments on highly-cited retracted articles".

3. He argues that the selection of highly-cited papers was done to fudge the result, because these papers are most likely to be noticed and commented on. The actual reason for selecting these papers was to focus on outputs that had had some influence; many people assume PubPeer commentators just focus on the low-hanging fruit from papermills, which nobody is going to read anyhow. There is nothing to stop Jessup or anyone else from doing their own analysis using another filter to see whether these results generalise to less highly-cited articles. It involves just a few hours of rather tedious coding. Maybe sample a random 800 articles?

4. He argues that "annulled" papers cover various categories. I am glad to be able to clarify that in the sample of 800 papers that I analysed, all were retractions.

5. He disputes that my opinion of whether PubPeer comments were factual and accurate has any value, and argues that such comments could be defamatory or otherwise falsely imply misconduct. From my experience, I reckon it would be difficult to get such material past PubPeer moderators, but if he can provide some examples, that would be helpful.

6. He says the coding method is subjective: "They read comments and decide whether the first comment preceded or followed annulment". In fact, the date of the retraction notice is provided in the PPS, so this is just a matter of checking whether the PubPeer comment (also dated) appeared before or after that date.

7. Re the identification of the "top 10 sleuths": I noted the limitations inherent in the data, so I am not sure what Jessup is complaining about here. The fact remains that a small number of individuals have been very effective in identifying issues in highly-cited articles prior to their retraction.

8. Jessup argues that I'm saying that "journals don't retract lightly, therefore PubPeer must be right". There is ample evidence for the first part of that argument. If he is aware of cases where PubPeer comments have indeed led to inappropriate retractions, then he should name them.

9-11. I do actually have some understanding of how retraction processes work in journals, but my concern is the failure of many journals/publishers to initiate the first step in the process.  I think we're in agreement that the current system for retracting articles from journals is broken. We also agree that PubPeer comments should be regarded as tips. My suggestion is simply that if publishers have a useful free source of tips, they should use it. A few of them do, but many don't seem motivated to be proactive because it just creates more work.

The prolific PubPeer commenters that I know would love it if the platform could be used primarily for civilised academic debate, as was the original intention. Unfortunately, science can't wait until the broken system is repaired; we do need to clean up a polluted literature. I would add that the idea that those who comment on PubPeer are doing it for the glory is laughable. The main reaction is to be ignored at best and abused at worst. They are unusual people who are obsessive about the need to have a reliable scientific literature.

15 comments:

  1. I think EMBO Journal and/or EMBO reports do monitor Pubpeer and act rather quickly if something is flagged. (But I also think they are an exception to the rule.)

  2. Frontiers has taken to posting an acknowledgment that they are aware of the PubPeer thread whenever it concerns one of their papers. So they are at least aware of these.

  3. This piece is a masterclass in confusing correlation with causation, then using that confusion to launder a weak argument into a policy recommendation.

  4. 1. It treats PubPeer as an engine of retractions when it’s mostly a mirror of controversy
    “58% had prior PubPeer comments” does not mean PubPeer caused anything. High-profile, highly-cited papers attract scrutiny everywhere: post-publication review, journal correspondence, institutional whispers, conference chatter, whistleblowers, competitors, and internal audits. PubPeer is one visible venue where that scrutiny can appear. Visibility is not causality.

  5. 2. The sample is engineered to inflate PubPeer’s “impact”
    Filtering to 100+ citations and annulled between 2021–2025 is not “tractable,” it’s a selection filter that concentrates exactly the papers most likely to (a) be noticed publicly and (b) eventually receive comments somewhere. Highly-cited papers are, by definition, more visible, more discussed, and more likely to generate public annotations. This is like “proving” TMZ predicts celebrity divorces by only studying A-listers who divorced.

  6. 3. “Annulled papers” is rhetorical packaging
    Collapsing retractions, withdrawals, removals, etc. into “annulled” makes the dataset sound unified while hiding heterogeneous reasons and processes. Those categories differ in cause, evidentiary standards, timelines, and institutional pathways. Bundling them lets the author claim a cleaner signal than reality supports.

  7. 4. The moderation claim is a credibility costume, not evidence
    “All 345 comments conformed to guidelines” is not the same as “accurate,” “fair,” or “non-defamatory.” “Facts, logic, publicly verifiable information” still allows:

    • cherry-picked image comparisons
    • insinuation by accumulation
    • technical nitpicking framed as misconduct
    • adversarial interpretation presented as certainty
    Also: “my experience” is not a reliability metric. If the paper’s thesis is “PubPeer is effective,” the burden is to quantify false positives, not to assert moderation.

  8. 5. The coding method is shaky and the key step is subjective
    They read comments and decide whether the “first comment preceded or followed annulment,” while admitting many comments exist solely to record retractions after the fact. That is already a sign PubPeer is often archival, not catalytic. Then they keep only the first comment and interpret that as meaningful. First comment timing is not a causal signal. It’s a timestamp.

  9. 6. The “10 sleuths” section is basically statistical fan fiction
    They admit identities are unstable, pseudonyms shift, early comments are anonymous, and multiple people contribute. Then they still produce a “top 10 commenters” estimate and treat it as “impressive.” That’s not analysis. That’s a leaderboard made out of incomplete identity data, and it’s exactly the kind of gamification that turns integrity into performance.

  10. 7. The argument relies on a myth: “journals don’t retract lightly, therefore PubPeer must be right”
    Journals retract for many reasons, including risk management, reputation management, external pressure, and institutional findings. “Reluctant to retract” is not proof that any given trigger is accurate. It’s proof that the process is slow and political. The paper uses the slowness of due process as a seal of truth for a platform that does not run due process.

  11. 8. The policy prescription is naïve and reveals the author doesn’t understand operations
    “Check PubPeer every day—40 minutes and you’re done” is laughable. Real integrity work is:

    • triage and scoping
    • evidence capture and preservation
    • author queries and response windows
    • reviewer/editor consultation
    • institutional contact
    • legal review and defamation risk
    • documentation, decision memos, and publication-record edits
    Monitoring PubPeer is not “cleaning up literature.” It’s collecting tips. Tips are cheap. The expensive part is adjudication.

  12. 9. The piece quietly normalizes a broken model: outsource enforcement to anonymous crowds, then congratulate yourself
    This is the core problem. PubPeer is not publishing governance. It’s external commentary. You can’t run integrity on vibes, screenshots, and social pressure. If you do, you get inconsistency, selective enforcement, and reputational punishment without accountable procedure.

  13. 10. What it should have concluded, but won’t
    PubPeer can be a useful intake channel. That’s it. A tip line. A signal. Not a system. If publishers want fewer retractions and less liability, they don’t need a “dashboard.” They need competent internal investigations: trained investigators, evidence standards, documented procedures, and legal oversight. Treating PubPeer monitoring as the solution is the same category error as treating headlines as law enforcement.

  14. "PubPeer can be a useful intake channel. That’s it. A tip line. A signal. Not a system. If publishers want fewer retractions and less liability, they don’t need a “dashboard.” They need competent internal investigations: trained investigators, evidence standards, documented procedures, and legal oversight." I happen to agree with these last comments. Same for sleuths in general. They cannot be treated as a system or used instead of a proper system. They are as much a sign of a completely dysfunctional system as part of a solution. In other words, in a functional system PubPeer or sleuths would not be needed. The problem is that we will never get to a functioning system with universities protecting their cheating/data-fudging cash-cows, journals not doing any retroactive or prospective screening for problematic papers, and granting agencies not doing any spot checks nor instituting rigorous quality controls for their awardees. So this is what's left: PubPeer, sleuths, a hit-and-miss system - and the majority of fishy papers still in the literature, undetected or perhaps PubPeer-flagged but not corrected or retracted.

    Replies
    1. As you'll see from my replies to Anonymous in the PS above, I also agree. But you misunderstand the dashboard. If you are a publisher, you don't need to wade through all the latest PubPeer comments on everything; the dashboard lets you selectively look at just those relating to a specific journal/institution/publisher, so it is a time-saving method for keeping an eye on things.
