BishopBlog: Optimizing research integrity investigations: the need for evidence

An article was published last week by Caron et al (2024) entitled "The PubPeer conundrum: Administrative challenges in research misconduct proceedings". The authors present a perspective on research misconduct from a viewpoint that is not often heard: three of them are attorneys who advise higher education institutions on research misconduct matters, and the other has served as a Research Integrity Officer at a hospital.

The authors conclude that the bar for research integrity investigations should be raised, requiring a complaint to reach a higher evidential standard in order to progress, and using a statute of limitations to provide a cutoff date beyond which older research would not usually be investigated. This amounts to saying that the current system is expensive and has bad consequences, so let's change it to do fewer investigations - this will cost less and fewer bad consequences will happen. The tl;dr version of this blogpost is that the argument fails because on the one hand the authors give no indication of the frequency of bad consequences, and on the other hand, they ignore the consequences of failing to act.

How we handle misconduct allegations can be seen as an optimization problem; to solve it, we need two things: data on frequency of different outcomes, and an evaluation of how serious different outcomes are.

We can draw an analogy with a serious medical condition that leads to a variety of symptoms, and which can only be unambiguously diagnosed by an invasive procedure which is both unpleasant and expensive. In such a case, the family doctor will base the decision whether to refer for invasive testing on information such as physical symptoms or blood test results, and refer the patient for specialist investigations only if the symptoms exceed some kind of threshold.

The invasive procedure may confirm that the disease is really present, a true positive, or that it is absent, a false positive. Those whose symptoms do not meet a cutoff do not progress to the invasive procedure, but may nevertheless have the disease, i.e., false negatives, or they may be free from the disease, true negatives. The more lenient the cutoff, the more true positives, but the price we pay will be to increase the rate of false positives. Conversely, with a stringent cutoff, we will reduce false positives, but will also miss true cases (i.e. have more false negatives).
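To make the four outcomes concrete, here is a minimal sketch in Python. The ten patients, their symptom scores and the two cutoffs are all invented for illustration; the only point is to show how moving the cutoff trades false positives against false negatives.

```python
def confusion_counts(cases, cutoff):
    """Count outcomes when every patient scoring >= cutoff is referred.

    cases: list of (symptom_score, has_disease) pairs.
    """
    tp = sum(1 for score, ill in cases if score >= cutoff and ill)
    fp = sum(1 for score, ill in cases if score >= cutoff and not ill)
    fn = sum(1 for score, ill in cases if score < cutoff and ill)
    tn = sum(1 for score, ill in cases if score < cutoff and not ill)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical data: (symptom score, whether the disease is really present).
cases = [(1, False), (2, False), (3, False), (3, True), (4, False),
         (5, True), (6, False), (7, True), (8, True), (9, True)]

lenient = confusion_counts(cases, cutoff=3)  # refer almost everyone
strict = confusion_counts(cases, cutoff=7)   # refer only clear-cut cases

print("lenient:", lenient)  # {'TP': 5, 'FP': 3, 'FN': 0, 'TN': 2}
print("strict: ", strict)   # {'TP': 3, 'FP': 0, 'FN': 2, 'TN': 5}
```

With these made-up numbers, the lenient cutoff catches every true case at the price of three unnecessary referrals, whereas the strict cutoff avoids all unnecessary referrals but misses two true cases.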

Optimization is not just a case of seeking to maximize correct diagnoses - it must also take into account costs and benefits of each outcome. For some common conditions, it is deemed more serious to miss a true case of disease (false negative) than to send someone for additional testing unnecessarily (false positive). Many people feel they would put up with inconvenience, embarrassment, or pain rather than miss a fatal tumour. But some well-established medical screening programmes have been queried or even abandoned on the grounds that they may do more harm than good by creating unnecessary worry or leading to unwarranted medical interventions in people who would be fine left untreated.
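The role of costs can be sketched in the same way. In the toy Python example below, the patient data and the cost weights are assumptions made up for illustration, and `total_cost` and `best_cutoff` are hypothetical helper names. The sketch simply picks the cutoff that minimizes the summed cost of false positives and false negatives.

```python
# Hypothetical data: (symptom score, whether the disease is really present).
CASES = [(1, False), (2, False), (3, False), (3, True), (4, False),
         (5, True), (6, False), (7, True), (8, True), (9, True)]


def total_cost(cases, cutoff, cost_fp, cost_fn):
    """Summed cost of errors when patients scoring >= cutoff are referred."""
    fp = sum(1 for score, ill in cases if score >= cutoff and not ill)
    fn = sum(1 for score, ill in cases if score < cutoff and ill)
    return fp * cost_fp + fn * cost_fn


def best_cutoff(cases, cost_fp, cost_fn, cutoffs=range(1, 11)):
    """Return the cutoff with the lowest total cost (first one if tied)."""
    return min(cutoffs, key=lambda c: total_cost(cases, c, cost_fp, cost_fn))


# Assumed weights: a missed case is ten times worse than a needless referral.
print(best_cutoff(CASES, cost_fp=1, cost_fn=10))  # 3

# Assumed weights: a needless referral is ten times worse than a missed case.
print(best_cutoff(CASES, cost_fp=10, cost_fn=1))  # 7
```

The same data give quite different "optimal" cutoffs depending on how the two kinds of error are weighted, which is why both the frequency of each outcome and its seriousness need to be known before anyone can say where the bar should be set.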

So, how does this analogy relate to research misconduct? The paper by Caron et al emphasizes the two-stage nature of the procedure that is codified in the US by the Office of Science and Technology Policy (OSTP), which is mandatory for federal agencies that conduct or support research. When an allegation of research misconduct is presented to a research institution, it is rather like a patient presenting themselves to a physician: symptoms of misconduct are described, and the research integrity officers must decide whether to proceed to a full investigation - a procedure which is both costly and stressful.

Just as patients will present with symptoms that are benign or trivial, some allegations of misconduct can readily be discounted. They may concern minor matters or be obviously motivated by malice. But there comes a point when the allegations can't be dismissed without a deeper investigation - equivalent to referring the patient for specialist testing. The complaint of Caron et al is that the bar for starting an investigation is specified by the regulator, and is set too low, leading to a great deal of unnecessary investigation. They make it sound rather like the situation that arose with prostate screening in the UK, where use of a rather unreliable blood test led to overdiagnosis and overtreatment; in other words, the false positive rate was far too high. The screening programme was eventually abandoned.

My first difficulty with this argument is that at no point do Caron et al indicate what the false positive rate is for investigations of misconduct. They emphasize that the current procedures for investigation of misconduct are onerous, both on the institution and on the person under investigation. They note the considerable damage that can be done when a case proves to be a false positive, where an aura of untrustworthiness may hang around the accused, even if they are exonerated. Their conclusion is that the criteria for undertaking an investigation should be made more stringent. This would undoubtedly reduce the rate of false positives, but it would also decrease the true positive detection rate.

One rather puzzling aspect of Caron et al's paper was their focus on the post-publication peer review website PubPeer as the main source of allegations of research misconduct. The impression they gave is that PubPeer has opened the floodgates to accusations of misconduct, many of which have little substance, but which the institutions are forced to respond to because of ORI regulations. This is the opposite of what most research sleuths experience, which is that it is extremely difficult to get institutions to take reports of possible research misconduct seriously, even when the evidence looks strong.

Given these diametrically opposed perspectives, what is needed is hard data on how many reported cases of misconduct proceed to a full investigation, and how many subsequently are found to be justified. And, given the authors' focus on PubPeer, it would be good to see those numbers for allegations that are based on PubPeer comments versus other sources.

There's no doubt that the volume of commenting on PubPeer has increased, but the picture presented by Caron et al seems misleading in implying that most complaints involve concerns such as "a single instance of image duplication in a published paper". Most sleuths who regularly report on PubPeer know that such a single instance is unlikely to be taken seriously; they also know that a researcher who commits research misconduct is often a serial offender, with a pattern of problems across multiple papers. Caron et al note the difficulties that arise when concerns are raised about papers that were published many years ago, where it is unlikely that original data still exist. That is a valid point, but I'd be surprised if research integrity officers receive many allegations via PubPeer based solely on a single paper from years ago; the reason that older papers come to attention is typically because a researcher's more recent work has come into question, which triggers a sleuth to look at other cases. I accept I could be wrong, though. I tend to focus on cases where there is little doubt that misconduct has occurred, and, like many sleuths, I find it frustrating when concerns are not taken seriously, so maybe I underestimate the volume of frivolous or unfounded allegations. If Caron et al want to win me over, they'd have to provide hard data showing how much investigative time is spent on cases that end up being dismissed.

Second, and much harder to estimate, what is the false negative rate: how often are cases of misconduct missed? The authors focus on the sad plight of the falsely accused researcher but say nothing about the negative consequences when a researcher gets away with misconduct.

Here, the medical analogy may be extended further, because in one important respect, misconduct is less like cancer and more like an infectious disease. It affects all who work with the researcher, particularly younger researchers who will be trained to turn a blind eye to inconvenient data and to "play the game" rather than doing good research. The rot spreads even further: huge amounts of research funding are wasted by others trying to build on noncredible research, and research syntheses are corrupted by the inclusion of unreliable or even fictitious findings. In some high-stakes fields, medical practice or government policy may be influenced by fraudulent work. If we simply make it harder to investigate allegations of misconduct, we run the risk of polluting academic research. And the research community at large can develop a sense of cynicism when they see fraudsters promoted and given grants while honest researchers are neglected.

So, we have to deal with the problem that, currently, fraud pays. Indeed, it is so unlikely to be detected that, for someone with a desire to succeed uncoupled from ethical scruples, it is a more sensible strategy to make up data than to collect it. Research integrity officers may worry now that they are confronted with more accusations of misconduct than they can handle, but if institutions focus on raising the bar for misconduct investigations, rather than putting resources into tackling the problem, it will only get worse.

In the UK, universities sign up to a Concordat to Support Research Integrity which requires them to report on the number and outcome of research misconduct investigations every year. When it was first introduced, the sense was that institutions wanted to minimize the number of cases reported, as it might be a source of shame. Now there is growing recognition that fraud is widespread, and the shame lies in failing to demonstrate a robust and efficient approach to tackling it.

Reference

Caron, M. M., Lye, C. T., Bierer, B. E., & Barnes, M. (2024). The PubPeer conundrum: Administrative challenges in research misconduct proceedings. Accountability in Research, 1–19. https://doi.org/10.1080/08989621.2024.2390007

3 comments:

Leonid Schneider, 22 August 2024 at 14:48
Observe these two statements from Caron et al 2024:
"Barbara E. Bierer, MD, formerly served as senior vice president of research and the Research Integrity Officer at the Brigham and Women’s Hospital and often serves as a committee member in research misconduct proceedings."
"More recently, one scientist submitted a letter to PubPeer requesting that certain comments made in “bad faith” be removed from PubPeer that the scientist claims have been made to harass him and his colleagues (Joelving 2023). Such use of PubPeer for pure harassment may not be an experience unique to this one scientist, and the functioning of PubPeer would certainly seem to allow for such malicious use of its platform."

The reference goes to Retraction Watch's criticism of my reporting about the issues with Joseph Loscalzo's papers (https://forbetterscience.com/2023/10/23/joe-loscalzos-drag-show/).
Loscalzo is professor at Brigham & Women's. Bierer just admitted that she dismissed all evidence as "pure harassment".

Leonid Schneider, 22 August 2024 at 14:52
Also, Sholto David noted an omission in COI statement: "In April 2024 Mark Barnes was acting research integrity officer for MD Anderson."

Leonid Schneider, 28 August 2024 at 14:43
"If Caron et al want to win me over, they'd have to provide hard data showing how much investigative time is spent on cases that end up being dismissed."
Most cases get dismissed, from my experience. Because paper too old, raw data unavailable, or lead author found not responsible, or scientific conclusions are unaffected, or the notifier gets charged with malicious slander.

Thursday, 22 August 2024
