Tuesday, 5 December 2023

Low-level lasers. Part 2. Erchonia and the universal panacea

 

 


In my last blogpost, I looked at a study that claimed continuing improvement of autism symptoms after eight 5-minute sessions in which a low-level laser was pointed at the head.  The data were so extreme that I became interested in Erchonia, the company that sponsored the study, and in Regulatory Insight, Inc., whose statistician failed to notice anything odd.  In exploring Erchonia's research corpus, I found that they have investigated the use of their low-level laser products for a remarkable range of conditions. A search of clinicaltrials.gov with the keyword Erchonia produced 47 records, describing studies of pain (chronic back pain, post-surgical pain, and foot pain), body contouring (circumference reduction, cellulite treatment), sensorineural hearing loss, Alzheimer's disease, hair loss, acne and toenail fungus. After excluding the trials on autism described in my previous post, fourteen of the records described randomised controlled trials in which an active laser was compared with a placebo device that looked the same, with both patient and researcher kept in the dark about which device was which until the data were analysed. As with the autism study, the research designs specified on clinicaltrials.gov for these RCTs looked strong, with statistician Elvira Cawthon from Regulatory Insight involved in data analysis.

As shown in Figure 1, where results are reported for RCTs, they have been spectacular in virtually all cases. The raw data are mostly not available, and in general the plotted data look less extreme than in the autism trial covered in last week's post, but the pattern is nonetheless consistent: over half of the active group meet the cutoff for improvement, whereas fewer than half (typically 25% or less) of the placebo group do so. 

FIGURE 1: Proportions in active treated group vs placebo group meeting preregistered criterion for improvement (Error bars show SE)*

I looked for results from mainstream science against which to benchmark the Erchonia findings.  I found a big review of behavioural and pharmaceutical interventions for obesity by the US Agency for Healthcare Research and Quality (LeBlanc et al, 2018). Figures 7 and 13 show results for binary outcomes - the relative risk of losing 5% or more of body weight over a 12-month period, i.e. the proportion of treated individuals who met this criterion divided by the proportion of controls. In 38 trials of behavioural interventions, the mean RR was 1.94 [95% CI, 1.70 to 2.22]. For 31 pharmaceutical interventions, the effect varied with the specific medication, with RR ranging from 1.18 to 3.86; only two pharmaceutical comparisons had RR in excess of 3.0. By contrast, for five trials of body contouring or cellulite reduction from Erchonia, the RRs ranged from 3.6 to 18.0.  Now, it is important to note that this is not comparing like with like: the people in the Erchonia trials were typically not clinically obese - they were mostly women seeking cosmetic improvements to their appearance.  So you could, and I am sure many would, argue it's an unfair comparison. If anyone knows of another literature that might provide a better benchmark, please let me know. The point is that the effect sizes reported by Erchonia are enormous relative to the kinds of effects typically seen with other treatments focused on weight reduction.
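
For readers who want to see the arithmetic behind these relative risks, here is a minimal sketch in R. The group sizes and counts are invented purely for illustration; they are not taken from any Erchonia trial.

```r
# Relative risk of meeting an improvement criterion: proportion of the treated
# group meeting it divided by the proportion of controls doing so.
# The numbers below are made up for illustration only.
n_treat <- 35; k_treat <- 28   # hypothetical treated group: 28/35 improve
n_ctrl  <- 35; k_ctrl  <- 7    # hypothetical placebo group: 7/35 improve

p_treat <- k_treat / n_treat   # 0.8
p_ctrl  <- k_ctrl / n_ctrl     # 0.2
rr <- p_treat / p_ctrl         # relative risk = 4.0

# Approximate large-sample 95% CI, computed on the log-RR scale
se_log_rr <- sqrt(1/k_treat - 1/n_treat + 1/k_ctrl - 1/n_ctrl)
ci <- exp(log(rr) + c(-1.96, 1.96) * se_log_rr)
c(RR = rr, lower = ci[1], upper = ci[2])
```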

If we look more generally at the other results obtained with low-level lasers, we can compare them to an overview of the effectiveness of common medications (Leucht et al, 2015). These authors presented results from a huge review of different therapies, with effect sizes represented as standardized mean differences (SMD - familiar to psychologists as Cohen's d). I converted the Erchonia results into this metric*, and found that across all the studies of pain relief shown in Figure 1, the average SMD was 1.30, with a range from 0.87 to 1.77. This contrasts with Leucht et al's estimated effect size of 1.06 for oxycodone plus paracetamol, and 0.83 for sumatriptan for migraine.  So if we are to believe the results, they indicate that the effect of Erchonia low-level lasers is as good as or better than the most effective pharmaceutical medications we have for pain relief or weight loss. I'm afraid I remain highly sceptical.
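
There is more than one way to put a binary outcome onto the SMD scale; the script referenced in the footnote has the details of the conversion I actually used. The R sketch below shows one standard approach - converting the log odds ratio to Cohen's d (Chinn, 2000) - with invented proportions, purely to illustrate the kind of arithmetic involved.

```r
# One common way to express a binary outcome as an SMD (Cohen's d):
# convert the odds ratio to d via ln(OR) * sqrt(3)/pi (Chinn, 2000).
# The proportions below are made up purely to show the calculation.
p_treat <- 0.60   # hypothetical proportion improving, active laser
p_ctrl  <- 0.20   # hypothetical proportion improving, placebo

odds_ratio <- (p_treat / (1 - p_treat)) / (p_ctrl / (1 - p_ctrl))  # = 6
d <- log(odds_ratio) * sqrt(3) / pi                                 # ~0.99
d
```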

I would not have dreamed of looking at Erchonia's track record if it were not for their impossibly good results in the Leisman et al autism trial that I discussed in the previous blogpost.  When I looked in more detail, I was reminded of the kinds of claims made for alternative treatments for children's learning difficulties, where parents are drawn in by slick websites promising scientifically proven interventions and glowing testimonials from satisfied customers. Back in 2012 I blogged about how to evaluate "neuroscientific" interventions for dyslexia.  Most of the points I made there apply to the world of "photobiomodulation" therapies, including the need to be wary when a provider claims that a single method is effective for a whole host of different conditions.  

Erchonia products are sold worldwide and seem popular with alternative health practitioners. For instance, in Stockport, Manchester, you can attend a chiropractic clinic where Zerona laser treatment will remove "stubborn body fat". In London there is a podiatry centre that reassures you: "There are numerous papers which show that cold laser affects the activity of cells and chemicals within the cell. It has been shown that cold laser can encourage the formation of stem cells which are key building blocks in tissue reparation. It also affects chemicals such as cytochrome c and causes a cascade of reactions which stimulates the healing. There is much research to show that cold laser affects healing and there are now several very good class 1 studies to show that laser can be effective." But when I looked for details of these "very good class 1 studies" they were nowhere to be found. In particular, it was hard to find research by scientists without vested interests in the technology.  

Of all the RCTs that I found, just two were conducted at reputable universities. One of them, on hearing loss (NCT01820416), was conducted at the University of Iowa, but terminated prematurely because an interim analysis showed no clinically or statistically significant effects (Goodman et al., 2013).  This contrasts sharply with NCT00787189, which had the dramatic results reported in Figure 1 (not, as far as I know, published outside of clinicaltrials.gov). The other university-based study was the autism study based in Boston described in my previous post: again, with unpublished, unimpressive results posted on clinicaltrials.gov.

This suggests it is important when evaluating novel therapies to have results from studies that are independent of those promoting the therapy. But, sadly, this is easier to recommend than to achieve. Running a trial takes a lot of time and effort: why would anyone do this if they thought it likely that the intervention would not work and the postulated mechanism of action was unproven? There would be a strong risk that you'd end up putting in effort that would end in a null result, which would be hard to publish. And you'd be unlikely to convince those who believed in the therapy - they would no doubt say you had the wrong wavelength of light, or insufficient duration of therapy, and so on.  

I suspect the response by those who believe in the power of low-level lasers will be that I am demonstrating prejudice in my reluctance to accept the evidence they provide of dramatic benefits. But, quite simply, if low-level laser treatment were so remarkably effective in melting fat and decreasing pain, surely it would quickly have been publicised through word of mouth from satisfied customers. Many of us are willing to subject our bodies to all kinds of punishments in a quest to be thin and/or pain-free. If this could be done simply and efficiently without the need for drugs, wouldn't this method have taken over the world?

*Summary files (Erchonia_proportions4.csv) and script (Erchonia_proportions_for_blog.R) are on Github, here.

Saturday, 25 November 2023

Low-level lasers. Part 1. Shining a light on an unconventional treatment for autism


 

'Light enters, then a miracle happens, and good things come out!' (Quirk & Whelan, 2011*)



I'm occasionally asked to investigate weird interventions for children's neurodevelopmental conditions, and recently I've found myself immersed in the world of low-level laser treatments. The material I've dug up is not new - it's been around for some years, but has not been on my radar until now. 

A starting point is this 2018 press statement by Erchonia, a firm that makes low-level laser devices for quasi-medical interventions. 

They had tested a device that was supposed to reduce irritability in autistic children by applying low-level laser light to the temporal and posterior regions of the head (see Figure 1) in 5-minute sessions twice a week for 4 weeks.

Figure 1: sites of stimulation by low-level laser

 The study, which was reported here, was carefully designed as a randomized controlled trial. Half the children received a placebo intervention. Placebo and active laser devices were designed to look identical and both emitted light, and neither the child nor the person administering the treatment knew whether the active or placebo light was being used.

According to Erchonia “The results are so strong, nobody can argue them.” (sic). Alas, their confidence turned out to be misplaced.

The rationale given by Leisman et al (with my annotations in yellow in square brackets) is as follows: "LLLT promotes cell and neuronal repair (Dawood and Salman 2013) [This article is about wound healing, not neurons] and brain network rearrangement (Erlicher et al. 2002) [This is a study of rat cells in a dish] in many neurologic disorders identified with lesions in the hubs of default mode networks (Buckner et al. 2008) [This paper does not mention lasers]. LLLT facilitates a fast-track wound-healing (Dawood and Salman 2013) as mitochondria respond to light in the red and near-infrared spectrum (Quirk and Whelan 2011*) [review of near-infrared irradiation photobiomodulation that notes inadequate knowledge of mechanisms - see cartoon]. On the other hand, Erlicher et al. (2002) have demonstrated that weak light directs the leading edge of growth cones of a nerve [cells in a dish]. Therefore, when a light beam is positioned in front of a nerve’s leading edge, the neuron will move in the direction of the light and grow in length (Black et al. 2013 [rat cells in a dish]; Quirk and Whelan 2011). Nerve cells appear to thrive and grow in the presence of low-energy light, and we think that the effect seen here is associated with the rearrangement of connectivity."

I started out by looking at the registration of the trial on ClinicalTrials.gov. This included a very thorough document that detailed a protocol and analysis plan, but there were some puzzling inconsistencies; I documented them here on PubPeer, and subsequently a much more detailed critique was posted there by Florian Naudet and André Gillibert. Among other things, there was confusion about where the study was done. The registration document said it was done in Nazareth, Israel, which is where the first author, Gerry Leisman, was based. But it also said that the PI was Calixto Machado, who is based in Havana, Cuba.

Elvira Cawthon, from Regulatory Insight, Inc., Tennessee, was mentioned on the protocol as clinical consultant and study monitor. The role of the study monitor is specified as follows: 

"The study Monitor will assure that the investigator is executing the protocol as outlined and intended. This includes insuring that a signed informed consent form has been attained from each subject’s caregiver prior to commencing the protocol, that the study procedure protocol is administered as specified, and that all study evaluations and measurements are taken using the specified methods and correctly and fully recorded on the appropriate clinical case report forms."

This does not seem ideal, given that the study monitor was in Tennessee, and the study was conducted in either Nazareth or Havana. Accordingly, I contacted Ms Cawthon, who replied: 

"I can confirm that I performed statistical analysis on data from the clinical study you reference that was received from paper CRFs from Dr. Machado following completion of the trial. I was not directly involved in the recruitment, treatment, or outcomes assessment of the subjects whose data was recorded on those CRFs. I have not reviewed any of the articles you referenced below so I cannot attest to whether the data included was based on the analyses that I performed or not or comment on any of the discrepancies without further evaluation at this time."

I had copied Drs Leisman and Machado into my query, and Dr Leisman also replied. He stated:

"I am the senior author of the paper pertaining to a trial of low-level laser therapy in autism spectrum disorder.... I take full responsibility for the publication indicated above and vouch for having personally supervised the implementation of the project whose results were published under the following citation:

Leisman, G. Machado, C., Machado, Y, Chinchilla-Acosta, M. Effects of Low-Level Laser Therapy in Autism Spectrum Disorder. Advances in Experimental Medicine and Biology 2018:1116:111-130. DOI:10.1007/5584_2018_234. The publication is referenced in PubMed as: PMID: 29956199.

I hold a dual appointment at the University of Haifa and at the University of the Medical Sciences of Havana with the latter being "Professor Invitado" by the Ministry of Health of the Republic of Cuba. Ms. Elvira Walls served as the statistical consultant on this project."

However, Dr Leisman denied any knowledge of subsequent publications of follow-up data by Dr Machado. I asked if I could see the data from the Leisman et al study, and he provided a link to a data file on ResearchGate, the details of which I have put on PubPeer.

Alas, the data were amazing, but not in a good way. The main data came from five subscales of the Aberrant Behavior Checklist (ABC)**, which can be combined into a Global score. (There were a handful of typos in the dataset for the Global score, which I have corrected in the following analysis.) For the placebo group, 15 of 19 children obtained exactly the same Global score on all 4 sessions. Note that there is no restriction of range for this scale: reported scores range from 9 to 154. This pattern was also seen in the five individual subscales. You might think that is to be expected if the placebo intervention is ineffective, but that's not the case. Questionnaire measures such as the one used here are never totally stable. In part this is because children's behaviour fluctuates. But even if the behaviour is constant, you expect to see some variability in responses, depending on how the rater interprets the scale of measurement. Furthermore, when study participants are selected because they have extreme scores on a measure, the tendency is for scores to improve on later testing - a phenomenon known as regression to the mean. Such unchanging scores are out of line with anything I have ever come across in the intervention literature. If we turn to the treated group, we see that 20 of 21 children showed a progressive decline in Global scores (i.e. improvement), with each measurement improving on the previous one over the 4 sessions. This again is just not credible, because we'd expect some fluctuation in children's behaviour as well as variable ratings due to measurement error. These results were judged to be abnormal in a further commentary by Gillibert and Naudet on PubPeer. They also noted that the statistical distribution of scores was highly improbable, with far more even than odd numbers.
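
To give a feel for how unlikely four identical ratings in a row are, here is a rough simulation in R. The assumptions (stable true scores, an error SD of 3 points, integer rounding) are mine, chosen only for illustration - they are not estimates from the ABC literature - but even this modest amount of rating noise makes four identical Global scores a rarity for a single child, let alone for 15 children out of 19.

```r
set.seed(1)
# Rough simulation: how often does a rater give exactly the same integer
# Global score on four occasions, if ratings carry even modest error?
# The error SD of 3 points (on a scale running up to 154) is an assumption
# for illustration, not an estimate from the ABC literature.
n_children <- 19
n_sims <- 10000
error_sd <- 3

identical_all4 <- replicate(n_sims, {
  true_score <- runif(n_children, 30, 120)   # stable underlying scores
  ratings <- sapply(1:4, function(i) round(true_score + rnorm(n_children, 0, error_sd)))
  mean(apply(ratings, 1, function(x) length(unique(x)) == 1))
})
mean(identical_all4)   # expected proportion of children with 4 identical scores
```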

Although Dr Machado has been copied into my correspondence, he has not responded to queries. Remember, he was PI for the study in Cuba, and he is first author on a follow-up study from which Dr Leisman dissociated himself. Indeed, I subsequently found that there were no fewer than three follow-up reports, all appearing in a strange journal whose DOIs did not appear to be genuine: 

Machado, C., Machado, Y., Chinchilla, M., & Machado, Yazmina. (2019a). Follow-up assessment of autistic children 6 months after finishing low lever (sic) laser therapy. Internet Journal of Neurology, 21(1). https://doi.org/10.5580/IJN.54101 (available from https://ispub.com/IJN/21/1/54101).

Machado, C., Machado, Y., Chinchilla, M., & Machado, Yazmina. (2019b). Twelve months follow-up comparison between autistic children vs. Initial placebo (treated) groups. Internet Journal of Neurology, 21(2). https://doi.org/10.5580/IJN.54812 (available from https://ispub.com/IJN/21/2/54812).

Machado, C., Machado, Y., Chinchilla, M., & Machado, Yazmina. (2020). Follow-up assessment of autistic children 12 months after finishing low lever (sic) laser therapy. Internet Journal of Neurology, 21(2). https://doi.org/10.5580/IJN.54809 (available from https://ispub.com/IJN/21/2/54809)

The 2019a paper starts by talking of a study of anatomic and functional brain connectivity in 21 children, but then segues into an extended follow-up (6 months) of the 21 treated and 19 placebo children from the Leisman et al study. The Leisman et al study is mentioned but not adequately referenced. Remarkably, all the original participants took part in the follow-up. The same trend as before continued: the placebo group stagnated, whereas the treated group continued to improve up to 6 months later, even though they received no further active treatment after the initial 4-week period. The 2020 abstract reported a further follow-up to 12 months. The huge group difference was sustained (see Figure 2). Three of the treated group were now reported as scoring in the normal range on a measure of clinical impairment. 

Figure 2. Chart 1 from Machado et al 2020
 

In the 2019b paper, it is reported that, after the stunning success of the initial phase of the study, the placebo group were offered the intervention, and all took part, whereupon they proceeded to make an almost identical amount of remarkable progress on all five subscales, as well as the global scale (see Figure 3). We might expect the 'baseline' scores of the cross-over group to correspond to the scores reported at the final follow-up (as placebo group prior to cross-over) but they don't. 

Figure 3: Chart 2 of Machado et al 2019b

I checked for other Erchonia studies on clinicaltrials.gov. Another study, virtually identical except for the age range, was registered in 2020 with Dr Leon Morales-Quezada of Spaulding Rehabilitation Hospital, Boston as Principal Investigator.  Comments in the documents suggest this was conducted after Erchonia failed to get the desired FDA approval. Although I have not found a published report of this second trial, I found a recruitment advertisement, which confusingly cites the NCT registration number of the 2013 study. Some summary results are included on clinicaltrials.gov, and they are strikingly different from the Leisman et al trial, with no indication of any meaningful difference between active and placebo groups in the final outcome measure, and both groups showing some improvement. I have requested fuller data from Elvira Cawthon (listed as results point of contact) with cc. to Dr Morales-Quezada and will update this post if I hear back.

It would appear that at one level this is a positive story, because it shows the regulatory system working. We do not know why the FDA rejected Erchonia's request for 510(k) market clearance, but the fact that they did so might indicate that they were unimpressed by the data provided by Leisman and Machado. The fact that Machado et al reported their three follow-up studies in what appears to be an unregistered journal suggests they had difficulty persuading regular journals that the findings were legitimate. If eight 5-minute sessions with a low-level laser pointed at the head really could dramatically improve the function of children with autism 12 months later, one would imagine that Nature, Cell and Science would be scrambling to publish the articles. On the other hand, any device that has the potential to stimulate neuronal growth might also ring alarm bells in terms of potential for harm.

Use of low-level lasers to treat autism is only part of the story. Questions remain about the role of Regulatory Insight, Inc., whose statistician apparently failed to notice anything strange about the data from the first autism study. In another post, I plan to look at cases where the same organisation was involved in monitoring and analysing trials of Erchonia laser devices for other conditions such as cellulite, pain, and hearing loss.

Notes

* Quirk, B. J., & Whelan, H. T. (2011). Near-infrared irradiation photobiomodulation: The need for basic science. Photomedicine and Laser Surgery, 29(3), 143–144. https://doi.org/10.1089/pho.2011.3014. This article states "clinical uses of NIR-PBM have been studied in such diverse areas as wound healing, oral mucositis, and retinal toxicity. In addition, NIR-PBM is being considered for study in connection with areas such as aging and neural degenerative diseases (Parkinson's disease in particular). One thing that is missing in all of these pre-clinical and clinical studies is a proper investigation into the basic science of the NIR-PBM phenomenon. Although there is much discussion of the uses of NIR, there is very little on how it actually works. As far as explaining what really happens, we are basically left to resort to saying 'light enters, then a miracle happens, and good things come out!' Clearly, this is insufficient, if for no other reason than our own intellectual curiosity." 

**Aman, M. G., Singh, N. N., Stewart, A. W., & Field, C. J. (1985). The aberrant behavior checklist: A behavior rating scale for the assessment of treatment effects. American Journal of Mental Deficiency, 89(5), 485–491. N. B. this is different from the Autism Behavior Checklist which is a commonly used autism assessment. 

Sunday, 19 November 2023

Defence against the dark arts: a proposal for a new MSc course

 


Since I retired, an increasing amount of my time has been taken up with investigating scientific fraud. In recent months, I've become convinced of two things: first, fraud is a far more serious problem than most scientists recognise, and second, we cannot continue to leave the task of tackling it to volunteer sleuths. 

If you ask a typical scientist about fraud, they will usually tell you it is extremely rare, and that it would be a mistake to damage confidence in science because of the activities of a few unprincipled individuals. Asked to name fraudsters they may, depending on their age and discipline, mention Paolo Macchiarini, John Darsee, Elizabeth Holmes or Diederik Stapel, all high profile, successful individuals, who were brought down when unambiguous evidence of fraud was uncovered. Fraud has been around for years, as documented in an excellent book by Horace Judson (2004), and yet, we are reassured, science is self-correcting, and has prospered despite the activities of the occasional "bad apple". The problem with this argument is that, on the one hand, we only know about the fraudsters who get caught, and on the other hand, science is not prospering particularly well - numerous published papers produce results that fail to replicate and major discoveries are few and far between (Harris, 2017). We are swamped with scientific publications, but it is increasingly hard to distinguish the signal from the noise. In my view, it is getting to the point where in many fields it is impossible to build a cumulative science, because we lack a solid foundation of trustworthy findings. And it's getting worse and worse.

My gloomy prognosis is partly engendered by a consideration of a very different kind of fraud: the academic paper mill. In contrast to the lone fraudulent scientist who fakes data to achieve career advancement, the paper mill is an industrial-scale operation, where vast numbers of fraudulent papers are generated, and placed in peer-reviewed journals with authorship slots being sold to willing customers. This process is facilitated in some cases by publishers who encourage special issues, which are then taken over by "guest editors" who work for a paper mill. Some paper mill products are very hard to detect: they may be created from a convincing template with just a few details altered to make the article original. Others are incoherent nonsense, with spectacularly strange prose emerging when "tortured phrases" are inserted to evade plagiarism detectors.

You may wonder whether it matters if a proportion of the published literature is nonsense: surely any credible scientist will just ignore such material? Unfortunately, it's not so simple. First, it is likely that the paper mill products that are detected are just the tip of the iceberg - a clever fraudster will modify their methods to evade detection. Second, many fields of science attempt to synthesise findings using big data approaches, automatically combing the literature for studies with specific keywords and then creating databases, e.g. of genotypes and phenotypes. If these contain a large proportion of fictional findings, then attempts to use these databases to generate new knowledge will be frustrated. Similarly, in clinical areas, there is growing concern that systematic reviews that are supposed to synthesise evidence to get at the truth instead lead to confusion because a high proportion of studies are fraudulent. A third and more indirect negative consequence of the explosion in published fraud is that those who have committed fraud can rise to positions of influence and eminence on the back of their misdeeds. They may become editors, with the power to publish further fraudulent papers in return for money, and if promoted to professorships they will train a whole new generation of fraudsters, while being careful to sideline any honest young scientists who want to do things properly. I fear in some institutions this has already happened.

To date, the response of the scientific establishment has been wholly inadequate. There is little attempt to proactively check for fraud: science is still regarded as a gentlemanly pursuit where we should assume everyone has honourable intentions. Even when evidence of misconduct is strong, it can take months or years for a paper to be retracted. As whistleblower Raphaël Levy asked on his blog: Is it somebody else's problem to correct the scientific literature? There is dawning awareness that our methods for hiring and promotion might encourage misconduct, but getting institutions to change is a very slow business, not least because those in positions of power succeeded in the current system, and so think it must be optimal.

The task of unmasking fraud is largely left to hobbyists and volunteers, a self-styled army of "data sleuths", who are mostly motivated by anger at seeing science corrupted and the bad guys getting away with it. They have developed expertise in spotting certain kinds of fraud, such as image manipulation and improbable patterns in data, and they have also uncovered webs of bad actors who have infiltrated many corners of science. One might imagine that the scientific establishment would be grateful that someone is doing this work, but the usual response to a sleuth who finds evidence of malpractice is to ignore them, brush the evidence under the carpet, or accuse them of vexatious behaviour. Publishers and academic institutions are both at fault in this regard.

If I'm right, this relaxed attitude to the fraud epidemic is a disaster-in-waiting. There are a number of things that need to be done urgently. One is to change research culture so that rewards go to those whose work is characterised by openness and integrity, rather than those who get large grants and flashy publications. Another is for publishers to act far more promptly to investigate complaints of malpractice and issue retractions where appropriate. Both of these things are beginning to happen, slowly. But there is a third measure that I think should be taken as soon as possible, and that is to train a generation of researchers in fraud busting. We owe a huge debt of gratitude to the data sleuths, but the scale of the problem is such that we need the equivalent of a police force rather than a volunteer band. Here are some of the topics that an MSc course could cover:

  • How to spot dodgy datasets
  • How to spot manipulated figures
  • Textual characteristics of fraudulent articles
  • Checking scientific credentials
  • Checking publisher credentials/identifying predatory publishers
  • How to raise a complaint when fraud is suspected
  • How to protect yourself from legal attacks
  • Cognitive processes that lead individuals to commit fraud
  • Institutional practices that create perverse incentives
  • The other side of the coin: "Merchants of doubt" whose goal is to discredit science

I'm sure there's much more that could be added and would be glad of suggestions. 

Now, of course, the question is what you could do with such a qualification. If my predictions are right, then individuals with such expertise will increasingly be in demand in academic institutions and publishing houses, to help ensure the integrity of the work they produce and publish. I also hope that there will be growing recognition of the need for more formal structures to be set up to investigate scientific fraud and take action when it is discovered: graduates of such a course would be exactly the kind of employees needed in such an organisation.

It might be argued that this is a hopeless endeavour. In Harry Potter and the Half-Blood Prince (Rowling, 2005) Professor Snape tells his pupils:

 "The Dark Arts, are many, varied, ever-changing, and eternal. Fighting them is like fighting a many-headed monster, which, each time a neck is severed, sprouts a head even fiercer and cleverer than before. You are fighting that which is unfixed, mutating, indestructible."

This is a pretty accurate description of what is involved in tackling scientific fraud. But Snape does not therefore conclude that action is pointless. On the contrary, he says: 

"Your defences must therefore be as flexible and inventive as the arts you seek to undo."

I would argue that any university that wants to be ahead of the field in this enterprise could show flexibility and inventiveness by starting up a postgraduate course to train the next generation of fraud-busting wizards. 

Bibliography

Bishop, D. V. M. (2023). Red flags for papermills need to go beyond the level of individual articles: A case study of Hindawi special issues. https://osf.io/preprints/psyarxiv/6mbgv
Boughton, S. L., Wilkinson, J., & Bero, L. (2021). When beauty is but skin deep: Dealing with problematic studies in systematic reviews. Cochrane Database of Systematic Reviews, 5. https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.ED000152/full
 Byrne, J. A., & Christopher, J. (2020). Digital magic, or the dark arts of the 21st century—How can journals and peer reviewers detect manuscripts and publications from paper mills? FEBS Letters, 594(4), 583–589. https://doi.org/10.1002/1873-3468.13747
Cabanac, G., Labbé, C., & Magazinov, A. (2021). Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals (arXiv:2107.06751). arXiv. https://doi.org/10.48550/arXiv.2107.06751
Carreyrou, J. (2019). Bad Blood: Secrets and Lies in a Silicon Valley Startup. Pan Macmillan.
COPE & STM. (2022). Paper mills: Research report from COPE & STM. Committee on Publication Ethics and STM. https://doi.org/10.24318/jtbG8IHL 
Culliton, B. J. (1983). Coping with fraud: The Darsee Case. Science (New York, N.Y.), 220(4592), 31–35. https://doi.org/10.1126/science.6828878 
Grey, S., & Bolland, M. (2022, August 18). Guest Post—Who Cares About Publication Integrity? The Scholarly Kitchen. https://scholarlykitchen.sspnet.org/2022/08/18/guest-post-who-cares-about-publication-integrity/ 
Hanson, M., Gómez Barreiro, P., Crosetto, P., & Brockington, D. (2023). The strain on scientific publishing. arXiv preprint. https://arxiv.org/ftp/arxiv/papers/2309/2309.15884.pdf 
Harris, R. (2017). Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions (1st edition). Basic Books.

Judson, H. F. (2004). The Great Betrayal: Fraud in Science. Orlando, FL: Harcourt.

Lévy, R. (2022, December 15). Is it somebody else’s problem to correct the scientific literature? Rapha-z-Lab. https://raphazlab.wordpress.com/2022/12/15/is-it-somebody-elses-problem-to-correct-the-scientific-literature/
 Moher, D., Bouter, L., Kleinert, S., Glasziou, P., Sham, M. H., Barbour, V., Coriat, A.-M., Foeger, N., & Dirnagl, U. (2020). The Hong Kong Principles for assessing researchers: Fostering research integrity. PLOS Biology, 18(7), e3000737. https://doi.org/10.1371/journal.pbio.3000737
 Oreskes, N., & Conway, E. M. (2010). Merchants of Doubt: How a handful of scientists obscured the truth on issues from tobacco smoke to global warming. Bloomsbury Press.
 Paterlini, M. (2023). Paolo Macchiarini: Disgraced surgeon is sentenced to 30 months in prison. BMJ, 381, p1442. https://doi.org/10.1136/bmj.p1442  
Rowling, J. K. (2005) Harry Potter and the Half-Blood Prince. Bloomsbury, London. ‎ ISBN: 9780747581086
Smith, R. (2021, July 5). Time to assume that health research is fraudulent until proven otherwise? The BMJ. https://blogs.bmj.com/bmj/2021/07/05/time-to-assume-that-health-research-is-fraudulent-until-proved-otherwise/
Stapel, D. (2016). Faking Science: A true story of academic fraud. Translated by Nicholas J. Brown. http://nick.brown.free.fr/stapel
Stroebe, W., Postmes, T., & Spears, R. (2012). Scientific misconduct and the myth of self-correction in science. Perspectives on Psychological Science, 7(6), 670–688. https://doi.org/10.1177/1745691612460687
 

Note: On-topic comments are welcome but are moderated to avoid spam, so there may be a delay before they appear.

Thursday, 12 October 2023

When privacy rules protect fraudsters

 

 
I was recently contacted with what I thought was a simple request: could I check the Oxford University Gazette to confirm that a person, X, had undergone an oral examination (viva) for a doctorate a few years ago. The request came indirectly from a third party, Y, via a colleague who knew that on the one hand I was interested in scientific fraud, and on the other hand, that I was based at Oxford.

My first thought was that this was a rather cumbersome way of checking someone's credentials. For a start, as Y had discovered, you can consult the on-line University Gazette only if you have an official affiliation with the university. In theory, when someone has a viva, the internal examiner notifies the University Gazette, which announces details in advance so that members of the university can attend if they so wish. In practice, it is vanishingly rare for an audience to turn up, and the formal notification to the Gazette may get overlooked.

But why, I wondered, didn't Y just check the official records of Oxford University listing names and dates of degrees? Well, to my surprise, it turned out that you can't do that. The university website is clear that to verify someone's qualifications you need to meet two conditions. First, the request can only be made by "employers, prospective employers, other educational institutions, funding bodies or recognised voluntary organisations". Second, "the student's permission ... should be acquired prior to making any verification request".

Anyhow, I found evidence online that X had been a graduate student at the university, but when I checked the Gazette I could find no mention of X having had an oral examination. The other source of evidence would be the University Library where there should be a copy of the thesis for all higher degrees. I couldn't find it in the catalogue. I suggested that Y might check further but they were already ahead of me, and had confirmed with the librarian that no thesis had been deposited in that name.

Now, I have no idea whether X is fraudulently claiming to have an Oxford doctorate, but I'm concerned that it is so hard for a private individual to validate someone's credentials. As far as I can tell, the justification comes from data protection regulations, which control what information organisations can hold about individuals. This is not an Oxford-specific interpretation of rules - I checked a few other UK universities, and the same processes apply.

Having said that, Y pointed out to me that there is a precedent for Oxford University to provide information when there is media interest in a high-profile case: in response to a freedom of information request, they confirmed that Ferdinand Marcos Jr did not have the degree he was claiming.

There will always be tension between openness and the individual's right to privacy, but the way the rules are interpreted means that anyone could claim they had a degree from a UK university and it would be impossible to check. Is there a solution? I'm no lawyer, but I would have thought it should be trivial to require that, on receipt of a degree, the student is asked to give signed permission for their name, degree and date of degree to be recorded on a publicly searchable database. I can't see a downside to this, and going forward it would save a lot of administrative time spent dealing with verification requests.

Something like this does seem to work outside Europe. I only did a couple of spot checks, but found this for York University (Ontario):

"It is the University's policy to make information about the degrees or credentials conferred by the University and the dates of conferral routinely available. In order to protect our alumni information as much as possible, YU Verify will give users a result only if the search criteria entered matches a unique record. The service will not display a list of names which may match criteria and allow you to select."

And for Macquarie University, Australia, there is exactly the kind of searchable website that I'd assumed Oxford would have.

I'd be interested if anyone can think of unintended bad consequences of this approach. I had a bit of to-and-fro on Twitter about this with someone who argued that it was best to keep as much information as possible out of the public domain. I remain unconvinced: academic qualifications are important for providing someone with credentials as an expert, and if we make it easy for anyone to pretend to have a degree from a prestigious institution, I think the potential for harm is far greater than any harms caused by lack of privacy. Or have I missed something? 

 N.B. Comments on the blog are moderated so may only appear after a delay.


P.S. Some thoughts via Mastodon from Martin Vueilleme on potential drawbacks of a directory: 

Far fetched, but I could see the following reasons:

- You live in an oppressive country that targets academics, intellectuals
- Hiding your university helps prevent stalkers (or other predators) from getting further information on you
- Hiding your university background to fit in a group
- Your thesis is on a sensitive topic or a topic forbidden from being studied where you live
- Hiding your university degree because you were technically not allowed to get it (eg women)

My (DB) response is that, in terms of balancing these risks against the risk of fraudsters benefiting from a lack of checking, the case for the open directory is strengthened, as the risks listed above seem very slight for UK universities (at least for now!). The other cost/benefit analysis concerns finances, where an open directory would also seem superior: it costs money to maintain the directory, but that has to be done anyhow, whereas currently there are additional costs for the staff employed to respond to verification requests.

Monday, 2 October 2023

Spitting out the AI Gobbledegook sandwich: a suggestion for publishers

 


The past couple of years have been momentous for some academic publishers. As documented in a preprint this week, they have dramatically increased the number of articles they publish, largely via "special issues" of journals, and at the same time made enormous profits. A recent guest post by Huanzi Zhang, however, showed this has not been without problems. Unscrupulous operators of so-called "papermills" saw an opportunity to boost their own profits by selling authorship slots and then placing fraudulent articles in special issues that were controlled by complicit editors. Gradually, publishers realised they had a problem and started to retract fraudulent articles. To date, Hindawi has retracted over 5000 articles since 2021*.  As described in Huanzi's blogpost, this has made shareholders nervous and dented the profits of parent company Wiley. 

 

There are numerous papermills, and we only know about the less competent ones whose dodgy articles are relatively easy to detect. For a deep dive into papermills in Hindawi journals see this blogpost by the anonymous sleuth Parashorea tomentella.  At least one papermill is the source of a series of articles that follow a template that I have termed the "AI gobbledegook sandwich".  See for instance my comments here on an article that has yet to be retracted. For further examples, search the website PubPeer with the search term "gobbledegook sandwich". 

 

After studying a number of these articles, my impression is that they are created as follows. You start with a genuine article. Most of these look like student projects. The topics are various, but in general they are weak on scientific content. They may be a review of an area, or, if data are gathered, they are likely to come from some kind of simple survey.  In some cases, reference is made to a public dataset. To create a paper for submission, the following steps are taken:

 

·      The title is changed to include terms that relate to the topic of a special issue, such as "Internet of Things" or "Big data".

·      Phrases are scattered in the Abstract and Introduction mentioning these terms.

·      A technical section is embedded in the middle of the original piece describing the method to be used.  Typically this is full of technical equations. I suspect these are usually correct, in that they use standard formulae from areas such as machine learning, and in some cases can be traced to Wikipedia or another source.  It is not uncommon to see very basic definitions, e.g. formulae for sensitivity and specificity of prediction (of the kind sketched just after this list).

·      A results section is created showing figures that purport to demonstrate how the AI method has been applied to the data. This often reveals that the paper is problematic, as plots are at best unclear and at worst bear no relationship to anything that has gone before.  Labels for figures and axes tend to be vague. A typical claim is that the prediction from the AI model is better than results from other, competing models. It is usually hard to work out what is being predicted from what.

·      The original essay resumes for a Conclusions section, but with a sentence added to say how AI methods have been useful in improving our understanding.

·      An optional additional step is to sprinkle irrelevant citations in the text: we know that papermills collect further income by selling citations, and new papers can act as vehicles for these.
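
For readers unfamiliar with those "very basic definitions", this is the level of content involved - a few lines of R suffice (the function name here is my own, purely for illustration):

```r
# Sensitivity and specificity of a binary prediction, from the confusion matrix.
sens_spec <- function(predicted, actual) {
  tp <- sum(predicted == 1 & actual == 1)   # true positives
  fn <- sum(predicted == 0 & actual == 1)   # false negatives
  tn <- sum(predicted == 0 & actual == 0)   # true negatives
  fp <- sum(predicted == 1 & actual == 0)   # false positives
  c(sensitivity = tp / (tp + fn),           # proportion of true cases detected
    specificity = tn / (tn + fp))           # proportion of non-cases correctly rejected
}
sens_spec(predicted = c(1, 1, 0, 0, 1), actual = c(1, 0, 0, 0, 1))
```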


Papermills have got away with this because the content of these articles is sufficiently technical and complex that the fraud may only be detectable on close reading. Where I am confident there is fraud, I will use the term "Gobbledegook sandwich" in my report on PubPeer, but there are many, many papers where my suspicions are raised yet it would take more time than it is worth for me to comb through the article to find compelling evidence.

 

For a papermill, the beauty of the AI gobbledegook sandwich is that you can apply AI methods to almost any topic, and there are so many different algorithms available that there is a potentially infinite number of papers that can be written according to this template.  The ones I have documented cover topics as diverse as educational methods, hotel management, sports, art, archaeology, Chinese medicine, music, building design, mental health and the promotion of Marxist ideology. In none of these papers did the application of AI methods make any sense, and they would not get past a competent editor or reviewers, but once a complicit editor is planted in a journal, they can accept numerous articles. 

 

Recently, Hindawi has ramped up its integrity operations and is employing many more staff to try and shut this particular stable door.  But Hindawi is surely not the only publisher infected by this kind of fraud, and we need a solution that can be used by all journals. My simple suggestion is to focus on prevention rather than cure, by requiring that all articles that report work using AI/ML methods adopt reporting standards that are being developed for machine-learning based science, as described on this website.  This requires computational reproducibility, i.e., data and scripts must be provided so that all results can be reproduced.  This would be a logical impossibility for AI gobbledegook sandwiches.

 

Open science practices were developed with the aim of improving the reproducibility and credibility of science, but, as I've argued elsewhere, they could also be highly effective in preventing fraud.  Mandating reporting standards could be an important step which, if accompanied by open peer review, would make life much harder for the papermillers.



*Source is spreadsheet maintained by the anonymous sleuth Parashorea tomentella

 

N.B. Comments on this blog are moderated, so there may be a delay before they appear. 






Monday, 4 September 2023

Polyunsaturated fatty acids and children's cognition: p-hacking and the canonisation of false facts

One of my favourite articles is a piece by Nissen et al (2016) called "Publication bias and the canonization of false facts". In it, the authors model how false information can masquerade as overwhelming evidence, if, over cycles of experimentation, positive results are more likely to be published than null ones. But their article is not just about publication bias: they go on to show how p-hacking magnifies this effect, because it leads to a false positive rate that is much higher than the nominal rate (typically .05).

I was reminded of this when looking at some literature on polyunsaturated fatty acids and children's cognition. This was a topic I'd had a passing interest in years ago when fish oil was being promoted for children with dyslexia and ADHD. I reviewed the literature back in 2008 for a talk at the British Dyslexia Association (slides here). What was striking then was that, whilst there were studies claiming positive effects of dietary supplements, they all obtained different findings. It looked suspicious to me, as if authors would keep looking in their data, and divide it up every way possible, in order to find something positive to report – in other words, p-hacking seemed rife in this field.

My interest in this area was piqued more recently simply because I was looking at articles that had been flagged up because they contained "tortured phrases". These are verbal expressions that seem to have been selected to avoid plagiarism detectors: they are often unintentionally humorous, because attempts to generate synonyms misfire. For instance, in this article by Khalid et al, published in Taylor and Francis' International Journal of Food Properties we are told: 

"Parkinson’s infection is a typical neurodegenerative sickness. The mix of hereditary and natural variables might be significant in delivering unusual protein inside explicit neuronal gatherings, prompting cell brokenness and later demise" 

And, regarding autism: 

"Chemical imbalance range problem is a term used to portray various beginning stage social correspondence issues and tedious sensorimotor practices identified with a solid hereditary part and different reasons."

The paper was interesting, though, for another reason. It contained a table summarising results from ten randomized controlled trials of polyunsaturated fatty acid supplementation in pregnant women and young children. This was not a systematic review, and it was unclear how the studies had been selected. As I documented on PubPeer,  there were errors in the descriptions of some of the studies, and the interpretation was superficial. But as I checked over the studies, I was also struck by the fact that all studies concluded with a claim of a positive finding, even when the planned analyses gave null results. But, as with the studies I'd looked at in 2008, no two studies found the same thing. All the indicators were that this field is characterised by a mixture of p-hacking and hype, which creates the impression that the benefits of dietary supplementation are well-established, when a more dispassionate look at the evidence suggests considerable scepticism is warranted.

Three questionable research practices were prominent. The first is testing a large number of 'primary research outcomes' without any correction for multiple comparisons. Three of the papers cited by Khalid did this, and they are marked in Table 1 below with "hmm" in the main result column. Two of them argued against using a method such as Bonferroni correction:

"Owing to the exploratory nature of this study, we did not wish to exclude any important relationships by using stringent correction factors for multiple analyses, and we recognised the potential for a type 1 error." (Dunstan et al, 2008)

"Although multiple comparisons are inevitable in studies of this nature, the statistical corrections that are often employed to address this (e.g. Bonferroni correction) infer that multiple relationships (even if consistent and significant) detract from each other, and deal with this by adjustments that abolish any findings without extremely significant levels (P values). However, it has been validly argued that where there are consistent, repeated, coherent and biologically plausible patterns, the results ‘reinforce’ rather than detract from each other (even if P values are significant but not very large)" (Meldrum et al, 2012)
While it is correct that Bonferroni correction is overconservative with correlated outcome measures, there are other methods for protecting the analysis from inflated type I error that should be applied in such cases (Bishop, 2023).
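
For anyone wondering what those alternatives look like in practice, R's p.adjust() function implements several of them. The p-values below are invented purely for illustration:

```r
# Illustrative only: ten made-up p-values from ten outcome measures.
p <- c(0.003, 0.012, 0.021, 0.034, 0.048, 0.11, 0.19, 0.27, 0.41, 0.62)

# Bonferroni is the most conservative; Holm controls the same family-wise
# error rate but is uniformly more powerful; Benjamini-Hochberg controls
# the false discovery rate instead.
round(cbind(raw        = p,
            bonferroni = p.adjust(p, method = "bonferroni"),
            holm       = p.adjust(p, method = "holm"),
            BH         = p.adjust(p, method = "BH")), 3)
```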

The second practice is conducting subgroup analyses: the initial analysis finds nothing, so a way is found to divide up the sample to find a subgroup that does show the effect. There is a nice paper by Peto that explains the dangers of doing this. The third practice is looking for correlations between variables rather than main effects of intervention: with sufficient variables, it is always possible to find something 'significant' if you don't employ any correction for multiple comparisons. This inflation of false positives by correlational analysis is a well-recognised problem in the field of neuroscience (e.g. Vul et al., 2008).

Given that such practices were normative in my own field of psychology for many years, I suspect that those who adopt them here are unaware of how serious a risk they run of finding spurious positive results. For instance, if you compare two groups on ten unrelated outcome measures, then the probability that something will give you a 'significant' p-value below .05 is not 5% but 40%. (The probability that none of the 10 results is significant is .95^10, which is .6. So the probability that at least one is below .05 is 1-.6 = .4). Dividing a sample into subgroups in the hope of finding something 'significant' is another way to multiply the rate of false positive findings. 
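
The arithmetic is easy to check in R, and a quick simulation (here with two groups of 20 and ten independent, truly null outcome measures - numbers chosen just for illustration) gives much the same answer:

```r
# Probability that at least one of 10 independent tests of true null
# hypotheses comes out 'significant' at alpha = .05
1 - 0.95^10   # = 0.401

# Simulation check: 10 outcome measures, no true group difference
set.seed(1)
sim <- replicate(10000, {
  pvals <- replicate(10, t.test(rnorm(20), rnorm(20))$p.value)
  any(pvals < 0.05)
})
mean(sim)     # close to 0.40
```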

In many fields, p-hacking is virtually impossible to detect because authors will selectively report their 'significant' findings, so the true false positive rate can't be estimated. In randomised controlled trials, the situation is a bit better, provided the study has been registered on a trial registry – this is now standard practice, precisely because it's recognised as an important way to avoid, or at least increase detection of, analytic flexibility and outcome switching. Accordingly, I catalogued, for the 10 studies reviewed by Khalid et al, how many found a significant effect of intervention on their planned, primary outcome measure, and how many focused on other results. The results are depressing. Flexible analyses are universal. Some authors emphasised the provisional nature of findings from exploratory analyses, but many did not. And my suspicion is that, even if the authors add a word of caution, those citing the work will ignore it.  


Table 1: Reporting outcomes for 10 studies cited by Khalid et al (2022)

Khalid #   Register   N      Main result*   Subgrp   Correlatn   Abs -ve   Abs +ve
41         yes        86     NS             yes      no          no        yes
42         no         72     hmm            no       no          no        yes
43         no         420    hmm            no       no          yes       yes
44         yes        90     NS             no       yes         yes       yes
45         no         90     yes            no       yes         NA        yes
46         yes        150    hmm            no       no          yes       yes
47         yes        175    NS             no       yes         yes       yes
48         no         107    NS             yes      no          yes       yes
49         yes        1094   NS             yes      no          yes       yes
50         no         27     yes            no       no          yes       yes

Key: Main result coded as NS (nonsignificant), yes (significant) or hmm (not significant if Bonferroni corrected); Subgrp and Correlatn coded yes or no depending on whether post hoc subgroup or correlational analyses conducted. Abs -ve coded yes if negative results reported in abstract, no if not, and NA if no negative results obtained. Abs +ve coded yes if positive results mentioned in abstract.

I don't know if the Khalid et al review will have any effect – it is so evidently flawed that I hope it will be retracted. But the problems it reveals are not just a feature of the odd rogue review: there is a systemic problem with this area of science, whereby the desire to find positive results, coupled with questionable research practices and publication bias, has led to the construction of a huge edifice of evidence on extremely shaky foundations. The resulting waste of researcher time and funding that comes from pursuing phantom findings is a scandal that can only be addressed by researchers prioritising rigour, honesty and scholarship over fast and flashy science.