
Tuesday, 26 August 2025

Gold standard science isn't gold standard if it's applied selectively. Part 1: Firearms injuries

In May, the White House produced a report called "Restoring Gold Standard Science", which was followed by an NIH plan to implement this policy. The initial report pointed to the well-attested problems with reproducibility in science, and to high-profile cases of research fraud, and recommended nine requirements for Gold Standard science. It must be: 
(i) reproducible;
(ii) transparent; 
(iii) communicative of error and uncertainty; 
(iv) collaborative and interdisciplinary; 
(v) skeptical of its findings and assumptions; 
(vi) structured for falsifiability of hypotheses; 
(vii) subject to unbiased peer review; 
(viii) accepting of negative results as positive outcomes; and 
(ix) without conflicts of interest. 
This is a clever move by the Trump administration, because it's hard to make a coherent case against such a policy without appearing to question whether science should be done to high standards. Nevertheless, many scientists are concerned. The main issue is the mismatch between the lofty ambitions stated in the plan and what the government is actually doing: defunding science, stopping grants, firing competent people and appointing incompetent cronies in their place (see, e.g., this blogpost). This leads to suspicion of the motives of those behind Gold Standard Science, which is seen as an attempt to weaponise science policy in order to attack research that the administration doesn't like. 
This is credible given recent history. Take the requirement for transparency. Back in 2016, Stephan Lewandowsky and I wrote an opinion piece for Nature entitled Don't Let Transparency Damage Science, noting that politicians who didn't like climate science or tobacco research were tying researchers up in red tape with spurious demands for data. In 2018, I blogged about a proposal for Strengthening Transparency in Regulatory Science by the US Environmental Protection Agency (EPA) that stated that policy should only be based on research that has openly available public data. This would allow the government to dismiss regulations concerning substances such as asbestos or pesticides, where the data were gathered long before open data was a thing. Similarly, if we were to argue that a result must be shown to be reproducible before it can influence policy, then politicians could justify ignoring inconvenient findings from studies that are not easy to reproduce, such as those involving long time-scales or complex methods. 
Doing science well is much harder than doing it badly; it takes time and expertise to pre-register a study, to work out the best protocol, to design an analysis to reduce bias, and to make data and scripts open and useable. I'm strongly in favour of all of those things, but, like many others, I am suspicious that demands for adherence to the highest standards may be used selectively to impede or even terminate research that the administration doesn't like. 
I was accordingly interested to see how Gold Standard Science was referenced in plans for government-funded research in this mammoth report from the US Senate Committee on Appropriations, which was posted on July 31st 2025, a couple of months after the Gold Standard Science document was written. Pages 105-172 cover the National Institutes of Health and discuss funding for numerous health conditions. I could find just one paragraph mentioning the importance of open data or pre-registration, and it was this one: 
Firearm Injury and Mortality Prevention.—The Committee provides $12,500,000 to conduct research on firearm injury and mortality prevention. Given violence and suicide have a number of causes, the Committee recommends NIH take a comprehensive approach to studying these underlying causes and evidence-based methods of prevention of injury, including crime prevention. All grantees under this section will be required to fulfill requirements around open data, open code, pre-registration of research projects, and open access to research articles consistent with the National Science Foundation’s open science principles. The Director is to report to the Committees within 30 days of enactment of this act on implementation schedules and procedures for grant awards, which strive to ensure that such awards support ideologically and politically unbiased research projects. 
I found this odd. Surely, if the Bhattacharya plan is to be believed, the statements about compliance with open science principles should apply to all the research done by NIH? Yet a search of the document finds that only this paragraph (repeated in two sections) makes any mention of such practice, and this happens to concern a topic, firearms injuries, that is a contentious political issue. 
So I'm all in favour of Gold Standard Science as described in the Bhattacharya plan, but let's see these principles applied even-handedly, and not just to research that might give uncomfortable results.

Saturday, 19 October 2024

Bishopblog catalogue (updated 19 October 2024)

Source: http://www.weblogcartoons.com/2008/11/23/ideas/

Those of you who follow this blog may have noticed a lack of thematic coherence. I write about whatever is exercising my mind at the time, which can range from technical aspects of statistics to the design of bathroom taps. I decided it might be helpful to introduce a bit of order into this chaotic melange, so here is a catalogue of posts by topic.

Language impairment, dyslexia and related disorders
The common childhood disorders that have been left out in the cold (1 Dec 2010) What's in a name? (18 Dec 2010) Neuroprognosis in dyslexia (22 Dec 2010) Where commercial and clinical interests collide: Auditory processing disorder (6 Mar 2011) Auditory processing disorder (30 Mar 2011) Special educational needs: will they be met by the Green paper proposals? (9 Apr 2011) Is poor parenting really to blame for children's school problems? (3 Jun 2011) Early intervention: what's not to like? (1 Sep 2011) Lies, damned lies and spin (15 Oct 2011) A message to the world (31 Oct 2011) Vitamins, genes and language (13 Nov 2011) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Phonics screening: sense and sensibility (3 Apr 2012) What Chomsky doesn't get about child language (3 Sept 2012) Data from the phonics screen (1 Oct 2012) Auditory processing disorder: schisms and skirmishes (27 Oct 2012) High-impact journals (Action video games and dyslexia: critique) (10 Mar 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) Raising awareness of language learning impairments (26 Sep 2013) Good and bad news on the phonics screen (5 Oct 2013) What is educational neuroscience? (25 Jan 2014) Parent talk and child language (17 Feb 2014) My thoughts on the dyslexia debate (20 Mar 2014) Labels for unexplained language difficulties in children (23 Aug 2014) International reading comparisons: Is England really doing so poorly? (14 Sep 2014) Our early assessments of schoolchildren are misleading and damaging (4 May 2015) Opportunity cost: a new red flag for evaluating interventions (30 Aug 2015) The STEP Physical Literacy programme: have we been here before? (2 Jul 2017) Prisons, developmental language disorder, and base rates (3 Nov 2017) Reproducibility and phonics: necessary but not sufficient (27 Nov 2017) Developmental language disorder: the need for a clinically relevant definition (9 Jun 2018) Changing terminology for children's language disorders (23 Feb 2020) Developmental Language Disorder (DLD) in relation to DSM5 (29 Feb 2020) Why I am not engaging with the Reading Wars (30 Jan 2022)

Autism
Autism diagnosis in cultural context (16 May 2011) Are our ‘gold standard’ autism diagnostic instruments fit for purpose? (30 May 2011) How common is autism? (7 Jun 2011) Autism and hypersystematising parents (21 Jun 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) The ‘autism epidemic’ and diagnostic substitution (4 Jun 2012) How wishful thinking is damaging Peta's cause (9 June 2014) NeuroPointDX's blood test for Autism Spectrum Disorder ( 12 Jan 2019) Biomarkers to screen for autism (again) (6 Dec 2022)

Developmental disorders/paediatrics
The hidden cost of neglected tropical diseases (25 Nov 2010) The National Children's Study: a view from across the pond (25 Jun 2011) The kids are all right in daycare (14 Sep 2011) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Changing the landscape of psychiatric research (11 May 2014) The sinister side of French psychoanalysis revealed (15 Oct 2019) A desire for clickbait can hinder an academic journal's reputation (4 Oct 2022) Polyunsaturated fatty acids and children's cognition: p-hacking and the canonisation of false facts (4 Sep 2023)

Genetics
Where does the myth of a gene for things like intelligence come from? (9 Sep 2010) Genes for optimism, dyslexia and obesity and other mythical beasts (10 Sep 2010) The X and Y of sex differences (11 May 2011) Review of How Genes Influence Behaviour (5 Jun 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Genes, brains and lateralisation (22 Dec 2012) Genetic variation and neuroimaging (11 Jan 2013) Have we become slower and dumber? (15 May 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) Incomprehensibility of much neurogenetics research ( 1 Oct 2016) A common misunderstanding of natural selection (8 Jan 2017) Sample selection in genetic studies: impact of restricted range (23 Apr 2017) Pre-registration or replication: the need for new standards in neurogenetic studies (1 Oct 2017) Review of 'Innate' by Kevin Mitchell ( 15 Apr 2019) Why eugenics is wrong (18 Feb 2020)

Neuroscience
Neuroprognosis in dyslexia (22 Dec 2010) Brain scans show that… (11 Jun 2011)  Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Neuronal migration in language learning impairments (2 May 2012) Sharing of MRI datasets (6 May 2012) Genetic variation and neuroimaging (1 Jan 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) What is educational neuroscience? ( 25 Jan 2014) Changing the landscape of psychiatric research (11 May 2014) Incomprehensibility of much neurogenetics research ( 1 Oct 2016)

Reproducibility
Accentuate the negative (26 Oct 2011) Novelty, interest and replicability (19 Jan 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013) Who's afraid of open data? (15 Nov 2015) Blogging as post-publication peer review (21 Mar 2013) Research fraud: More scrutiny by administrators is not the answer (17 Jun 2013) Pressures against cumulative research (9 Jan 2014) Why does so much research go unpublished? (12 Jan 2014) Replication and reputation: Whose career matters? (29 Aug 2014) Open code: not just data and publications (6 Dec 2015) Why researchers need to understand poker (26 Jan 2016) Reproducibility crisis in psychology (5 Mar 2016) Further benefit of registered reports (22 Mar 2016) Would paying by results improve reproducibility? (7 May 2016) Serendipitous findings in psychology (29 May 2016) Thoughts on the Statcheck project (3 Sep 2016) When is a replication not a replication? (16 Dec 2016) Reproducible practices are the future for early career researchers (1 May 2017) Which neuroimaging measures are useful for individual differences research? (28 May 2017) Prospecting for kryptonite: the value of null results (17 Jun 2017) Pre-registration or replication: the need for new standards in neurogenetic studies (1 Oct 2017) Citing the research literature: the distorting lens of memory (17 Oct 2017) Reproducibility and phonics: necessary but not sufficient (27 Nov 2017) Improving reproducibility: the future is with the young (9 Feb 2018) Sowing seeds of doubt: how Gilbert et al's critique of the reproducibility project has played out (27 May 2018) Preprint publication as karaoke (26 Jun 2018) Standing on the shoulders of giants, or slithering around on jellyfish: Why reviews need to be systematic (20 Jul 2018) Matlab vs open source: costs and benefits to scientists and society (20 Aug 2018) Responding to the replication crisis: reflections on Metascience 2019 (15 Sep 2019) Manipulated images: hiding in plain sight (13 May 2020) Frogs or termites: gunshot or cumulative science? (6 Jun 2020) Open data: We know what's needed - now let's make it happen (27 Mar 2021) A proposal for data-sharing that discourages p-hacking (29 Jun 2022) Can systematic reviews help clean up science (9 Aug 2022) Polyunsaturated fatty acids and children's cognition: p-hacking and the canonisation of false facts (4 Sep 2023)

Statistics
Book review: biography of Richard Doll (5 Jun 2010) Book review: the Invisible Gorilla (30 Jun 2010) The difference between p < .05 and a screening test (23 Jul 2010) Three ways to improve cognitive test scores without intervention (14 Aug 2010) A short nerdy post about the use of percentiles (13 Apr 2011) The joys of inventing data (5 Oct 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Causal models of developmental disorders: the perils of correlational data (24 Jun 2012) Data from the phonics screen (1 Oct 2012)Moderate drinking in pregnancy: toxic or benign? (1 Nov 2012) Flaky chocolate and the New England Journal of Medicine (13 Nov 2012) Interpreting unexpected significant results (7 June 2013) Data analysis: Ten tips I wish I'd known earlier (18 Apr 2014) Data sharing: exciting but scary (26 May 2014) Percentages, quasi-statistics and bad arguments (21 July 2014) Why I still use Excel ( 1 Sep 2016) Sample selection in genetic studies: impact of restricted range (23 Apr 2017) Prospecting for kryptonite: the value of null results (17 Jun 2017) Prisons, developmental language disorder, and base rates (3 Nov 2017) How Analysis of Variance Works (20 Nov 2017) ANOVA, t-tests and regression: different ways of showing the same thing (24 Nov 2017) Using simulations to understand the importance of sample size (21 Dec 2017) Using simulations to understand p-values (26 Dec 2017) One big study or two small studies? ( 12 Jul 2018) Time to ditch relative risk in media reports (23 Jan 2020)

Journalism/science communication
Orwellian prize for scientific misrepresentation (1 Jun 2010) Journalists and the 'scientific breakthrough' (13 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Orwellian prize for journalistic misrepresentation: an update (29 Jan 2011) Academic publishing: why isn't psychology like physics? (26 Feb 2011) Scientific communication: the Comment option (25 May 2011)  Publishers, psychological tests and greed (30 Dec 2011) Time for academics to withdraw free labour (7 Jan 2012) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Communicating science in the age of the internet (13 Jul 2012) How to bury your academic writing (26 Aug 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013)  A short rant about numbered journal references (5 Apr 2013) Schizophrenia and child abuse in the media (26 May 2013) Why we need pre-registration (6 Jul 2013) On the need for responsible reporting of research (10 Oct 2013) A New Year's letter to academic publishers (4 Jan 2014) Journals without editors: What is going on? (1 Feb 2015) Editors behaving badly? (24 Feb 2015) Will Elsevier say sorry? (21 Mar 2015) How long does a scientific paper need to be? (20 Apr 2015) Will traditional science journals disappear? (17 May 2015) My collapse of confidence in Frontiers journals (7 Jun 2015) Publishing replication failures (11 Jul 2015) Psychology research: hopeless case or pioneering field? (28 Aug 2015) Desperate marketing from J. Neuroscience ( 18 Feb 2016) Editorial integrity: publishers on the front line ( 11 Jun 2016) When scientific communication is a one-way street (13 Dec 2016) Breaking the ice with buxom grapefruits: Pratiques de publication and predatory publishing (25 Jul 2017) Should editors edit reviewers? ( 26 Aug 2018) Corrigendum: a word you may hope never to encounter (3 Aug 2019) Percent by most prolific author score and editorial bias (12 Jul 2020) PEPIOPs – prolific editors who publish in their own publications (16 Aug 2020) Faux peer-reviewed journals: a threat to research integrity (6 Dec 2020) Time to ditch relative risk in media reports (23 Jan 2020) Time for publishers to consider the rights of readers as well as authors (13 Mar 2021) Universities vs Elsevier: who has the upper hand? (14 Nov 2021) Book Review. Fiona Fox: Beyond the Hype (12 Apr 2022) We need to talk about editors (6 Sep 2022) So do we need editors? (11 Sep 2022) Reviewer-finding algorithms: the dangers for peer review (30 Sep 2022) A desire for clickbait can hinder an academic journal's reputation (4 Oct 2022) What is going on in Hindawi special issues? (12 Oct 2022) New Year's Eve Quiz: Dodgy journals special (31 Dec 2022) A suggestion for e-Life (20 Mar 2023) Papers affected by misconduct: Erratum, correction or retraction? (11 Apr 2023) Is Hindawi “well-positioned for revitalization?” (23 Jul 2023) The discussion section: Kill it or reform it? (14 Aug 2023) Spitting out the AI Gobbledegook sandwich: a suggestion for publishers (2 Oct 2023) The world of Poor Things at MDPI journals (Feb 9 2024) Some thoughts on eLife's New Model: One year on (Mar 27 2024) Does Elsevier's negligence pose a risk to public health? 
(Jun 20 2024) Collapse of scientific standards at MDPI journals: a case study (Jul 23 2024) My experience as a reviewer for MDPI (Aug 8 2024) Optimizing research integrity investigations: the need for evidence (Aug 22 2024) Now you see it, now you don't: the strange world of disappearing Special Issues at MDPI (Sep 4 2024) Prodding the behemoth with a stick (Sep 14 2024) Using PubPeer to screen editors (Sep 24 2024) An open letter regarding Scientific Reports (Oct 16 2024)

Social Media
A gentle introduction to Twitter for the apprehensive academic (14 Jun 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) Will I still be tweeting in 2013? (2 Jan 2012) Blogging in the service of science (10 Mar 2012) Blogging as post-publication peer review (21 Mar 2013) The impact of blogging on reputation ( 27 Dec 2013) WeSpeechies: A meeting point on Twitter (12 Apr 2014) Email overload ( 12 Apr 2016) How to survive on Twitter - a simple rule to reduce stress (13 May 2018)

Academic life
An exciting day in the life of a scientist (24 Jun 2010) How our current reward structures have distorted and damaged science (6 Aug 2010) The challenge for science: speech by Colin Blakemore (14 Oct 2010) When ethics regulations have unethical consequences (14 Dec 2010) A day working from home (23 Dec 2010) Should we ration research grant applications? (8 Jan 2011) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Should we ever fight lies with lies? (19 Jun 2011) How to survive in psychological research (13 Jul 2011) So you want to be a research assistant? (25 Aug 2011) NHS research ethics procedures: a modern-day Circumlocution Office (18 Dec 2011) The REF: a monster that sucks time and money from academic institutions (20 Mar 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) Journal impact factors and REF2014 (19 Jan 2013)  An alternative to REF2014 (26 Jan 2013) Postgraduate education: time for a rethink (9 Feb 2013)  Ten things that can sink a grant proposal (19 Mar 2013)Blogging as post-publication peer review (21 Mar 2013) The academic backlog (9 May 2013)  Discussion meeting vs conference: in praise of slower science (21 Jun 2013) Why we need pre-registration (6 Jul 2013) Evaluate, evaluate, evaluate (12 Sep 2013) High time to revise the PhD thesis format (9 Oct 2013) The Matthew effect and REF2014 (15 Oct 2013) The University as big business: the case of King's College London (18 June 2014) Should vice-chancellors earn more than the prime minister? (12 July 2014)  Some thoughts on use of metrics in university research assessment (12 Oct 2014) Tuition fees must be high on the agenda before the next election (22 Oct 2014) Blaming universities for our nation's woes (24 Oct 2014) Staff satisfaction is as important as student satisfaction (13 Nov 2014) Metricophobia among academics (28 Nov 2014) Why evaluating scientists by grant income is stupid (8 Dec 2014) Dividing up the pie in relation to REF2014 (18 Dec 2014)  Shaky foundations of the TEF (7 Dec 2015) A lamentable performance by Jo Johnson (12 Dec 2015) More misrepresentation in the Green Paper (17 Dec 2015) The Green Paper’s level playing field risks becoming a morass (24 Dec 2015) NSS and teaching excellence: wrong measure, wrongly analysed (4 Jan 2016) Lack of clarity of purpose in REF and TEF ( 2 Mar 2016) Who wants the TEF? ( 24 May 2016) Cost benefit analysis of the TEF ( 17 Jul 2016)  Alternative providers and alternative medicine ( 6 Aug 2016) We know what's best for you: politicians vs. experts (17 Feb 2017) Advice for early career researchers re job applications: Work 'in preparation' (5 Mar 2017) Should research funding be allocated at random? (7 Apr 2018) Power, responsibility and role models in academia (3 May 2018) My response to the EPA's 'Strengthening Transparency in Regulatory Science' (9 May 2018) More haste less speed in calls for grant proposals ( 11 Aug 2018) Has the Society for Neuroscience lost its way? ( 24 Oct 2018) The Paper-in-a-Day Approach ( 9 Feb 2019) Benchmarking in the TEF: Something doesn't add up ( 3 Mar 2019) The Do It Yourself conference ( 26 May 2019) A call for funders to ban institutions that use grant capture targets (20 Jul 2019) Research funders need to embrace slow science (1 Jan 2020) Should I stay or should I go: When debate with opponents should be avoided (12 Jan 2020) Stemming the flood of illegal external examiners (9 Feb 2020) What can scientists do in an emergency shutdown? 
(11 Mar 2020) Stepping back a level: Stress management for academics in the pandemic (2 May 2020)
TEF in the time of pandemic (27 Jul 2020) University staff cuts under the cover of a pandemic: the cases of Liverpool and Leicester (3 Mar 2021) Some quick thoughts on academic boycotts of Russia (6 Mar 2022) When there are no consequences for misconduct (16 Dec 2022) Open letter to CNRS (30 Mar 2023) When privacy rules protect fraudsters (Oct 12, 2023) Defence against the dark arts: a proposal for a new MSc course (Nov 19, 2023) An (intellectually?) enriching opportunity for affiliation (Feb 2 2024) Just make it stop! When will we say that further research isn't needed? (Mar 24 2024) Are commitments to open data policies worth the paper they are written on? (May 26 2024) Whistleblowing, research misconduct, and mental health (Jul 1 2024)

Celebrity scientists/quackery
Three ways to improve cognitive test scores without intervention (14 Aug 2010) What does it take to become a Fellow of the RSM? (24 Jul 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) How to become a celebrity scientific expert (12 Sep 2011) The kids are all right in daycare (14 Sep 2011)  The weird world of US ethics regulation (25 Nov 2011) Pioneering treatment or quackery? How to decide (4 Dec 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Why most scientists don't take Susan Greenfield seriously (26 Sept 2014) NeuroPointDX's blood test for Autism Spectrum Disorder ( 12 Jan 2019) Low-level lasers. Part 1. Shining a light on an unconventional treatment for autism (Nov 25, 2023) Low-level lasers. Part 2. Erchonia and the universal panacea (Dec 5, 2023)

Women
Academic mobbing in cyberspace (30 May 2010) What works for women: some useful links (12 Jan 2011) The burqua ban: what's a liberal response (21 Apr 2011) C'mon sisters! Speak out! (28 Mar 2012) Psychology: where are all the men? (5 Nov 2012) Should Rennard be reinstated? (1 June 2014) How the media spun the Tim Hunt story (24 Jun 2015)

Politics and Religion
Lies, damned lies and spin (15 Oct 2011) A letter to Nick Clegg from an ex liberal democrat (11 Mar 2012) BBC's 'extensive coverage' of the NHS bill (9 Apr 2012) Schoolgirls' health put at risk by Catholic view on vaccination (30 Jun 2012) A letter to Boris Johnson (30 Nov 2013) How the government spins a crisis (floods) (1 Jan 2014) The alt-right guide to fielding conference questions (18 Feb 2017) We know what's best for you: politicians vs. experts (17 Feb 2017) Barely a good word for Donald Trump in Houses of Parliament (23 Feb 2017) Do you really want another referendum? Be careful what you wish for (12 Jan 2018) My response to the EPA's 'Strengthening Transparency in Regulatory Science' (9 May 2018) What is driving Theresa May? ( 27 Mar 2019) A day out at 10 Downing St (10 Aug 2019) Voting in the EU referendum: Ignorance, deceit and folly ( 8 Sep 2019) Harry Potter and the Beast of Brexit (20 Oct 2019) Attempting to communicate with the BBC (8 May 2020) Boris bingo: strategies for (not) answering questions (29 May 2020) Linking responsibility for climate refugees to emissions (23 Nov 2021) Response to Philip Ball's critique of scientific advisors (16 Jan 2022) Boris Johnson leads the world ....in the number of false facts he can squeeze into a session of PMQs (20 Jan 2022) Some quick thoughts on academic boycotts of Russia (6 Mar 2022) Contagion of the political system (3 Apr 2022)When there are no consequences for misconduct (16 Dec 2022)

Humour and miscellaneous
Orwellian prize for scientific misrepresentation (1 Jun 2010) An exciting day in the life of a scientist (24 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Parasites, pangolins and peer review (26 Nov 2010) A day working from home (23 Dec 2010) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Scientific communication: the Comment option (25 May 2011) How to survive in psychological research (13 Jul 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) The bewildering bathroom challenge (19 Jul 2012) Are Starbucks hiding their profits on the planet Vulcan? (15 Nov 2012) Forget the Tower of Hanoi (11 Apr 2013) How do you communicate with a communications company? (30 Mar 2014) Noah: A film review from 32,000 ft (28 July 2014) The rationalist spa (11 Sep 2015) Talking about tax: weasel words (19 Apr 2016) Controversial statues: remove or revise? (22 Dec 2016) The alt-right guide to fielding conference questions (18 Feb 2017) My most popular posts of 2016 (2 Jan 2017) An index of neighbourhood advantage from English postcode data (15 Sep 2018) Working memories: A brief review of Alan Baddeley's memoir (13 Oct 2018) New Year's Eve Quiz: Dodgy journals special (31 Dec 2022)

Monday, 2 October 2023

Spitting out the AI Gobbledegook sandwich: a suggestion for publishers

 


The past couple of years have been momentous for some academic publishers. As documented in a preprint this week, they have dramatically increased the number of published articles, largely via "special issues" of journals, and at the same time made enormous profits. A recent guest post by Huanzi Zhang, however, showed that this growth has not been without problems. Unscrupulous operators of so-called "papermills" saw an opportunity to boost their own profits by selling authorship slots and placing fraudulent articles in special issues controlled by complicit editors. Gradually, publishers realised they had a problem and started to retract fraudulent articles: Hindawi has retracted over 5,000 articles since 2021*. As described in Huanzi's blogpost, this has made shareholders nervous and dented the profits of parent company Wiley. 

 

There are numerous papermills, and we only know about the less competent ones whose dodgy articles are relatively easy to detect. For a deep dive into papermills in Hindawi journals see this blogpost by the anonymous sleuth Parashorea tomentella.  At least one papermill is the source of a series of articles that follow a template that I have termed the "AI gobbledegook sandwich".  See for instance my comments here on an article that has yet to be retracted. For further examples, search the website PubPeer with the search term "gobbledegook sandwich". 

 

Having studied a number of these articles, I have formed an impression of how they are created. You start with a genuine article. Most of these look like student projects: the topics are various, but in general they are weak on scientific content. They may be a review of an area, or, if data is gathered, it is likely to be some kind of simple survey. In some cases, reference is made to a public dataset. To create a paper for submission, the following steps are taken:

 

·      The title is changed to include terms that relate to the topic of a special issue, such as "Internet of Things" or "Big data".

·      Phrases mentioning these terms are scattered through the Abstract and Introduction.

·      A technical section describing the method to be used is embedded in the middle of the original piece. Typically this is full of equations. I suspect these are usually correct, in that they use standard formulae from areas such as machine learning, and in some cases they can be traced to Wikipedia or another source. It is not uncommon to see very basic definitions, e.g. the formulae for sensitivity and specificity of prediction.

·      A results section is created showing figures that purport to demonstrate how the AI method has been applied to the data. This often reveals that the paper is problematic, as plots are at best unclear and at worst bear no relationship to anything that has gone before.  Labels for figures and axes tend to be vague. A typical claim is that the prediction from the AI model is better than results from other, competing models. It is usually hard to work out what is being predicted from what.

·      The original essay resumes for a Conclusions section, but with a sentence added to say how AI methods have been useful in improving our understanding.

·      An optional additional step is to sprinkle irrelevant citations in the text: we know that papermills collect further income by selling citations, and new papers can act as vehicles for these.


Papermills have got away with this because the content of these articles is sufficiently technical and complex that the fraud may only be detectable on close reading. Where I am confident there is fraud, I will use the term "Gobbledegook sandwich" in my report on PubPeer; but there are many, many papers where my suspicions are raised, yet it would take more time than it is worth to comb through the article for compelling evidence.

 

For a papermill, the beauty of the AI gobbledegook sandwich is that AI methods can be applied to almost any topic, and there are so many different algorithms available that a potentially infinite number of papers can be written to this template. The ones I have documented cover topics as diverse as educational methods, hotel management, sports, art, archaeology, Chinese medicine, music, building design, mental health and the promotion of Marxist ideology. In none of these papers did the application of AI methods make any sense, and they would not get past a competent editor or reviewers; but once a complicit editor is planted in a journal, that editor can accept numerous such articles. 

 

Recently, Hindawi has ramped up its integrity operations and is employing many more staff to try to shut this particular stable door. But Hindawi is surely not the only publisher infected by this kind of fraud, and we need a solution that can be used by all journals. My simple suggestion is to focus on prevention rather than cure, by requiring that all articles reporting work that uses AI/ML methods adopt the reporting standards being developed for machine-learning-based science, as described on this website. These standards require computational reproducibility, i.e. data and scripts must be provided so that all results can be reproduced. That would be a logical impossibility for an AI gobbledegook sandwich.

 

Open science practices were developed with the aim of improving the reproducibility and credibility of science, but, as I've argued elsewhere, they could also be highly effective in preventing fraud. Mandating reporting standards could be an important step which, if accompanied by open peer review, would make life much harder for the papermillers.



*Source: a spreadsheet maintained by the anonymous sleuth Parashorea tomentella

 

N.B. Comments on this blog are moderated, so there may be a delay before they appear. 






Monday, 4 September 2023

Polyunsaturated fatty acids and children's cognition: p-hacking and the canonisation of false facts

One of my favourite articles is a piece by Nissen et al (2016) called "Publication bias and the canonization of false facts". In it, the authors model how false information can come to masquerade as overwhelming evidence if, over repeated cycles of experimentation, positive results are more likely to be published than null ones. But their article is not just about publication bias: they go on to show how p-hacking magnifies this effect, because it leads to a false positive rate that is much higher than the nominal rate (typically .05).
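To make the mechanism concrete, here is a toy simulation in Python. It is a minimal sketch of my own, not the model from Nissen et al: a claim that is in fact false is tested repeatedly; positive (i.e. false positive) results are always published, null results only sometimes; and readers update their belief as though the nominal 5% error rate applied, even when p-hacking has inflated the real rate. All parameter values (power, the publication rate for null results, the number of experiments) are arbitrary choices for illustration.

```python
import random

def community_belief(alpha_actual, alpha_nominal=0.05, power=0.8,
                     pub_neg=0.2, n_experiments=200, prior=0.1, seed=1):
    """Toy sketch (not Nissen et al's actual model) of belief in a FALSE claim.
    alpha_actual  : the real per-study false positive rate (inflated by p-hacking)
    alpha_nominal : the error rate readers assume when updating their belief
    power         : assumed probability of a positive result if the claim were true
    pub_neg       : probability that a null result gets published (positives always are)
    """
    random.seed(seed)
    odds = prior / (1 - prior)                      # prior odds that the claim is true
    for _ in range(n_experiments):
        positive = random.random() < alpha_actual   # claim is false, so any positive is spurious
        published = positive or random.random() < pub_neg
        if published:
            if positive:
                odds *= power / alpha_nominal                 # reader sees a 'significant' result
            else:
                odds *= (1 - power) / (1 - alpha_nominal)     # reader sees a published null
    return odds / (1 + odds)                        # back to a probability

print(community_belief(alpha_actual=0.05))   # honest testing at the nominal rate
print(community_belief(alpha_actual=0.30))   # p-hacked: real error rate far above nominal
```

With the honest error rate, the community's belief in the false claim typically collapses towards zero; with the inflated rate it typically climbs towards certainty, which is the canonisation effect described in the paper.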

I was reminded of this when looking at some literature on polyunsaturated fatty acids and children's cognition. This was a topic I'd had a passing interest in years ago when fish oil was being promoted for children with dyslexia and ADHD. I reviewed the literature back in 2008 for a talk at the British Dyslexia Association (slides here). What was striking then was that, whilst there were studies claiming positive effects of dietary supplements, they all obtained different findings. It looked suspicious to me, as if authors would keep looking in their data, and divide it up every way possible, in order to find something positive to report – in other words, p-hacking seemed rife in this field.

My interest in this area was piqued more recently simply because I was looking at articles that had been flagged up for containing "tortured phrases". These are verbal expressions that seem to have been selected to avoid plagiarism detectors: they are often unintentionally humorous, because attempts to generate synonyms misfire. For instance, in this article by Khalid et al, published in Taylor and Francis' International Journal of Food Properties, we are told: 

"Parkinson’s infection is a typical neurodegenerative sickness. The mix of hereditary and natural variables might be significant in delivering unusual protein inside explicit neuronal gatherings, prompting cell brokenness and later demise" 

And, regarding autism: 

"Chemical imbalance range problem is a term used to portray various beginning stage social correspondence issues and tedious sensorimotor practices identified with a solid hereditary part and different reasons."

The paper was interesting, though, for another reason. It contained a table summarising results from ten randomized controlled trials of polyunsaturated fatty acid supplementation in pregnant women and young children. This was not a systematic review, and it was unclear how the studies had been selected. As I documented on PubPeer, there were errors in the descriptions of some of the studies, and the interpretation was superficial. But as I checked over the studies, I was also struck by the fact that every study concluded with a claim of a positive finding, even when the planned analyses gave null results. Yet, as with the studies I'd looked at in 2008, no two studies found the same thing. All the indicators were that this field is characterised by a mixture of p-hacking and hype, which creates the impression that the benefits of dietary supplementation are well-established, when a more dispassionate look at the evidence suggests considerable scepticism is warranted.

Three questionable research practices were prominent. The first is testing a large number of 'primary research outcomes' without any correction for multiple comparisons. Three of the papers cited by Khalid did this; they are marked with "hmm" in the Main result column of Table 1 below. Two of them explicitly argued against using a method such as Bonferroni correction:

"Owing to the exploratory nature of this study, we did not wish to exclude any important relationships by using stringent correction factors for multiple analyses, and we recognised the potential for a type 1 error." (Dunstan et al, 2008)

"Although multiple comparisons are inevitable in studies of this nature, the statistical corrections that are often employed to address this (e.g. Bonferroni correction) infer that multiple relationships (even if consistent and significant) detract from each other, and deal with this by adjustments that abolish any findings without extremely significant levels (P values). However, it has been validly argued that where there are consistent, repeated, coherent and biologically plausible patterns, the results ‘reinforce’ rather than detract from each other (even if P values are significant but not very large)" (Meldrum et al, 2012)
While it is correct that Bonferroni correction is overconservative with correlated outcome measures, there are other methods for protecting the analysis from inflated type I error that should be applied in such cases (Bishop, 2023).
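To make this concrete, here is a small Python illustration using ten hypothetical p-values (not taken from any of the studies discussed here). Bonferroni and Holm both control the family-wise error rate, with Holm uniformly at least as powerful; Benjamini-Hochberg controls the false discovery rate, a more lenient criterion. This is only a sketch of the options and is not intended to reproduce the specific methods recommended in Bishop (2023).

```python
# Hypothetical p-values for ten outcome measures from a single trial
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.0052, 0.008, 0.012, 0.03, 0.045, 0.20, 0.35, 0.60, 0.80]

print(sum(p < 0.05 for p in pvals), "of 10 'significant' with no correction")
for method in ("bonferroni", "holm", "fdr_bh"):
    reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method}: {reject.sum()} of 10 survive correction")
```

With these made-up values, six of the ten outcomes look 'significant' uncorrected, one survives Bonferroni, two survive Holm, and four survive false discovery rate control.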

The second practice is conducting subgroup analyses: the initial analysis finds nothing, so a way is found to divide up the sample until a subgroup emerges that does show the effect. There is a nice paper by Peto that explains the dangers of doing this. The third practice is looking for correlations between variables rather than main effects of the intervention: with enough variables, it is always possible to find something 'significant' if you don't employ any correction for multiple comparisons. This inflation of false positives by correlational analysis is a well-recognised problem in the field of neuroscience (e.g. Vul et al., 2008).

Given that such practices were normative in my own field of psychology for many years, I suspect that those who adopt them here are unaware of how serious a risk they run of finding spurious positive results. For instance, if you compare two groups on ten unrelated outcome measures, the probability that something will give you a 'significant' p-value below .05 is not 5% but 40%. (The probability that none of the 10 results is significant is .95^10, which is approximately .6, so the probability that at least one is below .05 is 1 - .6 = .4.) Dividing a sample into subgroups in the hope of finding something 'significant' is another way to multiply the rate of false positive findings. 
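For readers who want to check this, here is a short Python sketch: the analytic value, plus a simulation of many two-group comparisons with no true effect on any of ten outcome measures. The group size of 20 and the assumption of independent, normally distributed measures are arbitrary choices for illustration; correlated outcomes would give a somewhat lower, but still inflated, rate.

```python
import numpy as np
from scipy.stats import ttest_ind

alpha, n_outcomes, n_per_group, n_sims = 0.05, 10, 20, 20000

# Analytic: chance of at least one false positive among 10 independent tests
print(round(1 - (1 - alpha) ** n_outcomes, 3))           # 0.401

# Simulation: two groups drawn from the same distribution, 10 outcome measures each
rng = np.random.default_rng(2023)
a = rng.normal(size=(n_sims, n_per_group, n_outcomes))
b = rng.normal(size=(n_sims, n_per_group, n_outcomes))
pvalues = ttest_ind(a, b, axis=1).pvalue                 # shape (n_sims, n_outcomes)
print((pvalues < alpha).any(axis=1).mean())              # proportion of 'trials' with >=1 hit, close to 0.40
```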

In many fields, p-hacking is virtually impossible to detect because authors will selectively report their 'significant' findings, so the true false positive rate can't be estimated. In randomised controlled trials, the situation is a bit better, provided the study has been registered on a trial registry – this is now standard practice, precisely because it's recognised as an important way to avoid, or at least increase detection of, analytic flexibility and outcome switching. Accordingly, I catalogued, for the 10 studies reviewed by Khalid et al, how many found a significant effect of intervention on their planned, primary outcome measure, and how many focused on other results. The results are depressing. Flexible analyses are universal. Some authors emphasised the provisional nature of findings from exploratory analyses, but many did not. And my suspicion is that, even if the authors add a word of caution, those citing the work will ignore it.  


Table 1: Reporting outcomes for 10 studies cited by Khalid et al (2022)

Khalid #   Register   N      Main result*   Subgrp   Correlatn   Abs -ve   Abs +ve
41         yes        86     NS             yes      no          no        yes
42         no         72     hmm            no       no          no        yes
43         no         420    hmm            no       no          yes       yes
44         yes        90     NS             no       yes         yes       yes
45         no         90     yes            no       yes         NA        yes
46         yes        150    hmm            no       no          yes       yes
47         yes        175    NS             no       yes         yes       yes
48         no         107    NS             yes      no          yes       yes
49         yes        1094   NS             yes      no          yes       yes
50         no         27     yes            no       no          yes       yes
Key: Main result coded as NS (nonsignificant), yes (significant) or hmm (not significant if Bonferroni corrected); Subgrp and Correlatn coded yes or no depending on whether post hoc subgroup or correlational analyses conducted. Abs -ve coded yes if negative results reported in abstract, no if not, and NA if no negative results obtained. Abs +ve coded yes if positive results mentioned in abstract.

I don't know if the Khalid et al review will have any effect – it is so evidently flawed that I hope it will be retracted. But the problems it reveals are not just a feature of the odd rogue review: there is a systemic problem with this area of science, whereby the desire to find positive results, coupled with questionable research practices and publication bias, has led to the construction of a huge edifice of evidence on extremely shaky foundations. The resulting waste of researcher time and funding that comes from pursuing phantom findings is a scandal that can only be addressed by researchers prioritising rigour, honesty and scholarship over fast and flashy science.