Friday 9 February 2024

The world of Poor Things at MDPI journals

At the weekend, the Observer ran a piece by Robin McKie entitled "‘The situation has become appalling’: fake scientific papers push research credibility to crisis point". I was one of those interviewed for the article, describing my concerns about a flood of dodgy papers that was polluting the scientific literature.

Two days later I received an email from the editorial office of MDPI publishers with the header "[Children] (IF: 2.4, ISSN 2227-9067): Good Paper Sharing on the Topic of" (sic) that began:

Greetings from the Children Editorial Office!

We recently collected 10 highly cited papers in our journal related to Childhood Autism. And we sincerely invite you to visit and read these papers, because you are an excellent expert in this field of study.

Who could resist such a flattering invitation? MDPI is one of those publishers that appears to be encouraging publication of low-quality work, with a massive growth in special issues where papers are published with remarkably rapid turnaround times. Only last week it was revealed that the journal is affected by fake peer review that appears to be generated by AI. So I was curious to take a look.

The first article, by Frolli et al (2022a), was weird. It reported a comparison of two types of intervention designed to improve emotion recognition in children with autism, one of which used virtual reality. The first red flag was the sample size: two groups each of 30 children, all originally from the city of Caserta. I checked Wikipedia, which told me the population of Caserta was around 76,000 in 2017. Recruiting participants for intervention studies is typically slow and laborious, and this is a remarkable sample size to recruit from such a small region. But credibility is then stretched to breaking point on hearing that the selection criteria required that the children were all aged between 9 and 10 years and had IQs of 97 or above. No researcher in their right mind would impose unnecessary constraints on recruitment, and both the age and IQ criteria are far tighter than would usually be adopted. I wondered whether there might be a typo in this account, but we then hear that the IQ range of the sample is indeed remarkably narrow:

"The first experimental group (Gr1) was composed of 30 individuals with a mean age of 9.3 (SD 0.63) and a mean IQ of 103.00 (SD 1.70). ...... The second experimental group (Gr2) was composed of 30 individuals with a mean age of 9.4 (SD 0.49) and mean IQ of 103.13 (SD 2.04)...."

Most samples for studies using Wechsler IQ scales have SD of at least 8, even if cutoffs are applied as selection criteria, so this is unbelievably low.
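As a rough check on that figure, here is a minimal sketch of the SD one would expect after applying a lower IQ cutoff. It assumes, purely for illustration, that the population being sampled follows the Wechsler standardisation (normal, mean 100, SD 15); the function name `sd_after_lower_cutoff` is hypothetical, and a clinical population may well be distributed differently.

```python
from statistics import NormalDist


def sd_after_lower_cutoff(cutoff, mean=100.0, sd=15.0):
    """SD of a normal distribution truncated below at `cutoff`.

    Assumption (not from the paper): scores are normal with the given
    mean and SD before the cutoff is applied.
    """
    std = NormalDist()  # standard normal
    alpha = (cutoff - mean) / sd
    # Inverse Mills ratio for a lower truncation point
    lam = std.pdf(alpha) / (1.0 - std.cdf(alpha))
    variance_factor = 1.0 + alpha * lam - lam ** 2
    return sd * variance_factor ** 0.5


# Cutoff of 97, as in the selection criterion described above
expected_sd = sd_after_lower_cutoff(97)
print(f"Expected SD with IQ >= 97: {expected_sd:.1f}")
```

Under these assumptions the expected SD comes out at roughly 9.6, several times larger than the reported values of 1.70 and 2.04.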

This dubious paper prompted me to look at others by the first author. It was rather like pulling a thread on a hole in a sweater - things started to unravel fast. A paper published by Frolli et al (2023a) in the MDPI journal Behavioral Sciences claimed to have studied eighty 18-year-olds recruited from four different high schools. The selection criteria were again unbelievably stringent: IQ assessed on the WAIS-IV fell between 95 and 105 "to ensure that participants fell within the average range of intellectual functioning, minimizing the impact of extreme cognitive variations on our analyses". The lower bound of the IQ range selected here corresponds to a z-score of -0.33, or the 37th percentile. If the population of students covered the full range of IQ, then only around 25% would meet the criterion (between the 37th and 63rd centiles), so to obtain a sample of 80 it would be necessary to test over 300 potential participants. Furthermore, there are IQ screening tests that can be used in this circumstance that are relatively quick to administer, but the WAIS-IV is not one of them. We are told all participants were given the full test, which requires individual administration by a qualified psychologist and takes around one hour to complete. So who did all this testing, and where? The article states: "The data were collected and analyzed at the FINDS Neuropsychiatry Outpatient Clinic by licensed psychologists in collaboration with the University of International Studies of Rome (UNINT)." So we are supposed to believe that hundreds of 18-year-olds trekked to a neuropsychiatry outpatient clinic for a full IQ screening which most of them would not have passed. I cannot imagine a less efficient way of conducting such a study. I could not find any mention of compensation for participants, which is perhaps unsurprising as the research received no external funding. All of this is described as happening remarkably fast, with ethics approval in January 2023, and submission of the article in October 2023.
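The screening arithmetic above can be reproduced in a few lines. This is a sketch that assumes the students' IQs follow the standard normal model (mean 100, SD 15); the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Assumption: IQ in the student population is normal with mean 100, SD 15
iq = NormalDist(mu=100, sigma=15)
lower, upper = 95, 105
target_sample = 80

z_lower = (lower - iq.mean) / iq.stdev
pct_lower = iq.cdf(lower) * 100   # percentile of the lower bound
pct_upper = iq.cdf(upper) * 100   # percentile of the upper bound
fraction_eligible = iq.cdf(upper) - iq.cdf(lower)

# Number of students who would need the full test to yield 80 eligible ones
needed = math.ceil(target_sample / fraction_eligible)

print(f"z for IQ {lower}: {z_lower:.2f}")
print(f"Percentile range: {pct_lower:.0f} to {pct_upper:.0f}")
print(f"Fraction eligible: {fraction_eligible:.2f}")
print(f"Students to test: {needed}")
```

At around one hour per WAIS-IV administration, the number of students to test is also a rough estimate of the psychologist hours required just for screening.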

Another paper in Children in 2023 focused on ADHD, and again reported recruiting two groups of 30 children for an intervention that lasted 5 months (Frolli et al., 2023b). The narrow IQ selection criteria were again used, with WISC-IV IQs in the range 95-105, and the mean IQs were 96.48 (SD = 1.09) and 98.44 (SD = 1.12) for groups 1 and 2 respectively. Again, the research received no external funding. The report of ethics approval is scanty: "The study was conducted in accordance with the Declaration of Helsinki. The study was approved by the Ethics Committee and the Academic Senate of the University of International Studies of Rome."

The same first author published a paper on the impact of COVID-19 on cognitive development and executive functioning in adolescents in 2021 (Frolli et al, 2021). I have not gone over it in detail, but a quick scan revealed some very odd statistical reporting. There were numerous F-ratios, but they were all negative, which is impossible, as F is a ratio between two positive numbers. Furthermore, the reported p-values and degrees of freedom didn't always correspond to the F-ratio, even if the sign was ignored.
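To see why a negative F-ratio cannot occur, it helps to compute one by hand. The sketch below implements a textbook one-way ANOVA on made-up numbers (the data and the function name `one_way_f` are hypothetical and have nothing to do with the paper): both the between-group and within-group mean squares are built from squared deviations, so their ratio cannot fall below zero.

```python
from statistics import fmean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Both sums of squares are sums of squared terms, hence non-negative
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three groups, for illustration only
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
f_ratio, df1, df2 = one_way_f(groups)
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

Reordering the groups, or reversing the direction of the effect, changes nothing about the sign: the result is the same non-negative number.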

At this point I was running out of steam, but a quick look at Frolli et al (2022b) on Executive Functions and Foreign Language Learning suggested yet more problems, with the sentence "Significance at the level of 5% (α < 0.001) has been accepted" featuring at least twice. It is hard to believe that a human being wrote this sentence, or that any human author, editor or reviewer read it without comment.

If anyone is interested in pulling at other related threads, I suspect it would be of interest to look at articles accepted for a Special Issue of the MDPI journal Disabilities co-edited by Frolli.

In his brilliant film Poor Things, Yorgos Lanthimos distorts familiar objects and places just enough to be disturbing. Lisbon looks like what I imagine Lisbon would be in the Victorian age, except that the colours are unusually vivid, there are strange flying cars in the sky, and nobody seems concerned at the central character wandering around only partially clothed (see, e.g., this review).  The combined impression is that MDPI publishes papers from that universe, where everything looks superficially like genuine science but with jarring features that tell you something is amiss. The difference is that Poor Things has a happy ending.


Frolli, A.; Ricci, M.C.; Di Carmine, F.; Lombardi, A.; Bosco, A.; Saviano, E.; Franzese, L. The Impact of COVID-19 on Cognitive Development and Executive Functioning in Adolescents: A First Exploratory Investigation. Brain Sci. 2021, 11, 1222.

Frolli, A.; Savarese, G.; Di Carmine, F.; Bosco, A.; Saviano, E.; Rega, A.; Carotenuto, M.; Ricci, M.C. Children on the Autism Spectrum and the Use of Virtual Reality for Supporting Social Skills. Children 2022a, 9, 181.

Frolli, A.; Cerciello, F.; Esposito, C.; Ciotola, S.; De Candia, G.; Ricci, M.C.; Russo, M.G. Executive Functions and Foreign Language Learning. Pediatr. Rep. 2022b, 14, 450-456.

Frolli, A.; Cerciello, F.; Ciotola, S.; Ricci, M.C.; Esposito, C.; Sica, L.S. Narrative Approach and Mentalization. Behav. Sci. 2023a, 13, 994.

Frolli, A.; Cerciello, F.; Esposito, C.; Ricci, M.C.; Laccone, R.P.; Bisogni, F. Universal Design for Learning for Children with ADHD. Children 2023b, 10, 1350.


  1. Thinking about this major problem of poor-quality papers, would it be possible for someone to make a 'Journal Integrity Score' under which a random sample of papers in each journal gets checked for sense and statistics, and journals with a low score are blacklisted? Having a metric feels like it might be the only way to get journals to up their game and to reward the good journals that do things properly.

  2. There are estimated to be (IIRC) about 20,000 academic journals. We might want to read and evaluate a sample of 30 papers from each of those to get an idea of how good they are. I generally take about two hours per paper, so that's 1.2 million person-hours of people with PhDs. $100 million should cover it.

  3. Open peer review could clear up many of these drawbacks.

  4. Thank you for looking into these paper mills! It is very much needed!