Thursday, 22 August 2024

Optimizing research integrity investigations: the need for evidence

 

Last week, Caron et al (2024) published an article entitled "The PubPeer conundrum: Administrative challenges in research misconduct proceedings". The authors present a perspective on research misconduct that is not often heard: three of them are attorneys who advise higher education institutions on research misconduct matters, and the fourth has served as a Research Integrity Officer at a hospital. 

The authors conclude that the bar for research integrity investigations should be raised: a complaint would have to meet a higher evidential standard in order to progress, and a statute of limitations would provide a cutoff date beyond which older research would not usually be investigated. This amounts to saying that the current system is expensive and has bad consequences, so we should change it to do fewer investigations; this will cost less, and fewer bad consequences will happen. The tl;dr version of this blogpost is that the argument fails because, on the one hand, the authors give no indication of how frequent those bad consequences are, and, on the other hand, they ignore the consequences of failing to act.

How we handle misconduct allegations can be seen as an optimization problem; to solve it, we need two things: data on the frequency of different outcomes, and an evaluation of how serious each outcome is.

We can draw an analogy with a serious medical condition that leads to a variety of symptoms and can only be unambiguously diagnosed by an invasive procedure that is both unpleasant and expensive. In such a case, the family doctor will base the decision whether to refer for invasive testing on information such as physical symptoms or blood test results, referring the patient for specialist investigation only if the symptoms exceed some kind of threshold. 

The invasive procedure may show that the disease is really present (a true positive) or that it is absent (a false positive). Those whose symptoms do not meet the cutoff do not progress to the invasive procedure, but they may nevertheless have the disease (false negatives), or they may be free of it (true negatives). The more lenient the cutoff, the more true positives we detect, but the price we pay is an increase in the rate of false positives. Conversely, with a stringent cutoff we reduce false positives, but we also miss true cases (i.e. have more false negatives).

Optimization is not just a case of seeking to maximize correct diagnoses - it must also take into account costs and benefits of each outcome. For some common conditions, it is deemed more serious to miss a true case of disease (false negative) than to send someone for additional testing unnecessarily (false positive). Many people feel they would put up with inconvenience, embarrassment, or pain rather than miss a fatal tumour. But some well-established medical screening programmes have been queried or even abandoned on the grounds that they may do more harm than good by creating unnecessary worry or leading to unwarranted medical interventions in people who would be fine left untreated. 
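To make the trade-off concrete, here is a minimal sketch (mine, not the authors') of how a referral cutoff might be optimized once we have the two ingredients mentioned above: the frequency of each outcome and the cost we attach to it. All the distributions, prevalences and costs are invented purely for illustration.

```python
# A toy model of threshold-setting: all numbers below are invented for
# illustration, not taken from Caron et al or from any screening programme.
import numpy as np
from scipy.stats import norm

def expected_cost(cutoff, prevalence=0.1, cost_fp=1.0, cost_fn=10.0):
    """Expected cost per person screened, for a given referral cutoff."""
    # False-positive rate: healthy cases (scores ~ N(0,1)) exceeding the cutoff
    fp_rate = 1 - norm.cdf(cutoff, loc=0, scale=1)
    # False-negative rate: diseased cases (scores ~ N(2,1)) falling below it
    fn_rate = norm.cdf(cutoff, loc=2, scale=1)
    return (1 - prevalence) * fp_rate * cost_fp + prevalence * fn_rate * cost_fn

cutoffs = np.linspace(-2, 4, 601)
best = cutoffs[np.argmin([expected_cost(c) for c in cutoffs])]
print(f"Cost-minimizing cutoff: {best:.2f}")
```

In this toy example, if a missed case is judged ten times as costly as an unnecessary referral, the optimal cutoff sits low and many false positives are tolerated; set cost_fn equal to cost_fp and the optimal cutoff moves sharply upward, so far fewer cases are referred. The point is simply that the "right" bar cannot be chosen without both the frequencies and the costs.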

So, how does this analogy relate to research misconduct? The paper by Caron et al emphasizes the two-stage nature of the procedure codified in the US by the Office of Science and Technology Policy (OSTP) and mandatory for federal agencies that conduct or support research. When an allegation of research misconduct is presented to a research institution, it is rather like a patient presenting to a physician: symptoms of misconduct are described, and the research integrity officers must decide whether to proceed to a full investigation - a procedure which is both costly and stressful.

Just as patients will present with symptoms that are benign or trivial, some allegations of misconduct can readily be discounted. They may concern minor matters or be obviously motivated by malice. But there comes a point when the allegations can't be dismissed without a deeper investigation - equivalent to referring the patient for specialist testing. The complaint of Caron et al is that the bar for starting an investigation is specified by the regulator and is set too low, leading to a great deal of unnecessary investigation. They make it sound rather like the situation that arose with prostate screening in the UK: use of a rather unreliable blood test led to overdiagnosis and overtreatment; in other words, the false positive rate was far too high. The screening programme was eventually abandoned.

My difficulty with this argument is that at no point do Caron et al indicate what the false positive rate is for investigations of misconduct. They emphasize that the current procedures for investigation of misconduct are onerous, both on the institution and on the person under investigation. They note the considerable damage that can be done when a case proves to be a false positive, where an aura of untrustworthiness may hang around the accused, even if they are exonerated. Their conclusion is that the criteria for undertaking an investigation should be made more stringent. This would undoubtedly reduce the rate of false positives, but it would also decrease the true positive detection rate.

One rather puzzling aspect of Caron et al's paper was their focus on the post-publication peer review website PubPeer as the main source of allegations of research misconduct. The impression they gave is that PubPeer has opened the floodgates to accusations of misconduct, many of which have little substance, but which institutions are forced to respond to because of Office of Research Integrity (ORI) regulations. This is the opposite of what most research sleuths experience: they find it extremely difficult to get institutions to take reports of possible research misconduct seriously, even when the evidence looks strong.

Given these diametrically opposed perspectives, what is needed is hard data on how many reported cases of misconduct proceed to a full investigation, and how many are subsequently found to be justified. And, given the authors' focus on PubPeer, it would be good to see those numbers separately for allegations based on PubPeer comments versus other sources.

There's no doubt that the volume of commenting on PubPeer has increased, but the picture presented by Caron et al seems misleading in implying that most complaints involve concerns such as "a single instance of image duplication in a published paper". Most sleuths who regularly report on PubPeer know that such a single instance is unlikely to be taken seriously; they also know that a researcher who commits research misconduct is often a serial offender, with a pattern of problems across multiple papers. Caron et al note the difficulties that arise when concerns are raised about papers that were published many years ago, where it is unlikely that original data still exist. That is a valid point, but I'd be surprised if research integrity officers receive many allegations via PubPeer based solely on a single paper from years ago; older papers typically come to attention because a researcher's more recent work has come into question, prompting sleuths to look at their earlier output. I accept I could be wrong, though. I tend to focus on cases where there is little doubt that misconduct has occurred, and, like many sleuths, I find it frustrating when concerns are not taken seriously, so maybe I underestimate the volume of frivolous or unfounded allegations. If Caron et al want to win me over, they'd have to provide hard data showing how much investigative time is spent on cases that end up being dismissed.

A second question, much harder to answer, concerns the false negative rate: how often are cases of misconduct missed? The authors focus on the sad plight of the falsely accused researcher but say nothing about the negative consequences when a researcher gets away with misconduct. 

Here, the medical analogy may be extended further, because in one important respect, misconduct is less like cancer and more like an infectious disease. It affects all who work with the researcher, particularly younger researchers who will be trained to turn a blind eye to inconvenient data and to "play the game" rather than doing good research. The rot spreads even further: huge amounts of research funding are wasted by others trying to build on noncredible research, and research syntheses are corrupted by the inclusion of unreliable or even fictitious findings. In some high-stakes fields, medical practice or government policy may be influenced by fraudulent work. If we simply make it harder to investigate allegations of misconduct, we run the risk of polluting academic research. And the research community at large can develop a sense of cynicism when they see fraudsters promoted and given grants while honest researchers are neglected.

So, we have to deal with the problem that, currently, fraud pays. Indeed, it is so unlikely to be detected that, for someone with a desire to succeed uncoupled from ethical scruples, it is a more sensible strategy to make up data than to collect it. Research integrity officers may worry that they are already confronted with more accusations of misconduct than they can handle, but if institutions respond by raising the bar for misconduct investigations, rather than putting resources into tackling the problem, it will only get worse.

In the UK, universities sign up to a Concordat to Support Research Integrity which requires them to report on the number and outcome of research misconduct investigations every year. When it was first introduced, the sense was that institutions wanted to minimize the number of cases reported, as it might be a source of shame.  Now there is growing recognition that fraud is widespread, and the shame lies in failing to demonstrate a robust and efficient approach to tackling it. 


Reference

Caron, M. M., Lye, C. T., Bierer, B. E., & Barnes, M. (2024). The PubPeer conundrum: Administrative challenges in research misconduct proceedings. Accountability in Research, 1–19. https://doi.org/10.1080/08989621.2024.2390007.

Thursday, 8 August 2024

My experience as a reviewer for MDPI

 

Guest post by 

René Aquarius, PhD

Department of Neurosurgery

Radboud University Medical Center, Nijmegen, The Netherlands

 

After a recent Zoom call in which Dorothy and I discussed several research-related topics, she invited me to write a guest blogpost about my experience as a peer reviewer for MDPI. As I think transparency in research is important, I was happy to accept this invitation.  

 

In mid-November 2023, I received a request to peer-review a manuscript for a special issue on subarachnoid hemorrhage in the Journal of Clinical Medicine, published by MDPI. This blog post summarizes that process. I hope it will give some insight into the nitty-gritty of the peer-review process at MDPI.

 

I agreed to review the manuscript two days after receiving the invitation, and what I found was a study like many others in the field: a single-center, retrospective analysis of a clinical case series. I ended up recommending rejection two days after accepting the review. My biggest gripes were that the authors claimed the data were collected prospectively, yet their protocol was registered at the very end of the period in which they included patients. In addition, I discovered some important discrepancies between the protocol and the final study: the target sample size in the protocol was 50% larger than the sample actually used in the study, and the minimum age for patients also differed between the protocol and the manuscript. I also had problems with the statistical analysis, as the authors used more than 20 t-tests, which creates a high probability of Type I errors. The biggest problem was the lack of a control group, which made it impossible to establish whether changes in a physiological parameter could really predict intolerance to a certain drug in a small subset of patients.
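To illustrate why that battery of t-tests worried me: assuming, purely for illustration, 20 independent tests each run at the conventional alpha of 0.05, the chance of at least one spurious "significant" result is about 64%.

```python
# Familywise Type I error rate for multiple independent tests
# (illustrative numbers only; the actual tests in the manuscript
# were not necessarily independent)
alpha, n_tests = 0.05, 20
familywise_error = 1 - (1 - alpha) ** n_tests
print(f"P(at least one false positive across {n_tests} tests) = {familywise_error:.2f}")  # 0.64
```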

 

When filling out the reviewer form for MDPI, certain aspects struck me as peculiar. There are four options for Overall Recommendation:

  • Accept in present form
  • Accept after minor revision (correction to minor methodological errors and text editing)
  • Reconsider after major revision (control missing in some experiments)
  • Reject (article has serious flaws, additional experiments needed, research not conducted correctly)

 

Regardless of which of the last two options you select, the response is: "If we ask the authors to revise the manuscript, the revised manuscript will be sent to you for further evaluation". 

 

Although reviewer number 2 is often jokingly referred to as "the difficult one", that couldn't have been further from the truth in this case: the second reviewer liked the paper and recommended acceptance after minor revision. So, with a total of two reviews, the paper received an editorial decision of "reject, with a possibility of resubmission after extensive revisions" only one day after I handed in my peer review report.

 

The revisions were quite extensive, as you will discover below, and arrived only two days after the initial rejection. I agreed to review the revised manuscript. But before I could start, just four days after receiving the invitation, I received a message from the editorial office that my review was no longer needed because they already had enough peer reviewers for the manuscript. I politely ignored this message, because I wanted to know whether the manuscript had improved. What happened next was quite a surprise, but not in a good way. 

 

The manuscript had indeed undergone extensive revisions. The biggest change, however, was also the biggest red flag: without any explanation, the study had lost almost 20% of its participants. An additional problem was that all the issues I had raised in my previous review report remained unaddressed. I sent my new review report the same day, exactly one week after my initial rejection.

 

When I handed in my second review report, I understood why I had initially been told that my review was no longer needed. One other peer reviewer had also recommended rejection, with concerns similar to mine. Two further reviewers, however, had recommended acceptance: one after minor revisions (the English needed some improvement) and one in the present form, without any suggested revisions. This means that if I had followed the advice of MDPI's editorial office, the paper would probably have been accepted in its current form. But because my vote was now also cast and the paper had received two rejections, the editor could do little more than reject the manuscript, which happened three days after I handed in my review report.  

 

Fifteen days after I received my first invitation to review, the manuscript had already been through two full rounds of peer review by at least four different reviewers.

 

This is not where the story ends.  

 

In December, about a month later, I received an invitation to review a manuscript for the MDPI journal Geriatrics. You've guessed it by now: it was the same manuscript. It is reasonable to assume it had been moved internally through MDPI's transfer service, summarised in the figure below. I can only speculate as to why I was still attached to the manuscript as a peer reviewer; I guess somebody forgot to remove my name from it.

[Figure: MDPI's manuscript transfer service; from https://www.mdpi.com/authors/transfer-service]

The manuscript had been transformed yet again. It was now very similar to the very first version I reviewed, almost word-for-word. That also meant the number of included patients was restored to the original figure. However, the registered protocol previously mentioned in the methods section (which had given rise to some of the hardest criticisms to rebut) was now left out completely. The icing on the cake was that, for reasons that were not explained, another author had been added to the manuscript. There was no mention in this invitation of the previous reviews and rejections of the same manuscript. One might wonder whether MDPI editors were aware of this, but it would be strange if they were not, since they pride themselves on their SuSy manuscript submission system, where "editors can easily track concurrent and previous submissions from the same authors".

 

Because the same issues were still present in the manuscript, I rejected it for a third time on the same day I agreed to review it. In an accompanying message to the editor, I clearly articulated my problems with the manuscript and the review process.

 

The week after, I received a message that the editor had decided to withdraw the manuscript in consultation with the authors.

 

In late January 2024, the manuscript was published in the MDPI journal Medicina. I was no longer attached to the manuscript as a reviewer, and there was no indication on the journal's website of the name of the acting editor who accepted it. 


Note from Dorothy Bishop

Comments on this blog are moderated so there may be some delay before they appear, but legitimate, on-topic contributions are welcomed. We would be particularly interested to hear from anyone else who has experiences, good or bad, as a reviewer for MDPI journals.

 

Postscript by Dorothy Bishop: 19 Aug 2024 

Here's an example of a paper that was published with the reviews visible: two were damning and one was favourable. https://www.mdpi.com/2079-6382/9/12/868. Thanks to @LymeScience for drawing our attention to this, and for noting the important clinical consequences when those promoting an alternative, non-evidenced treatment have a "peer-reviewed" study to refer to.