Thursday 12 October 2023

When privacy rules protect fraudsters

 

 
I was recently contacted with what I thought was a simple request: could I check the Oxford University Gazette to confirm that a person, X, had undergone an oral examination (viva) for a doctorate a few years ago. The request came indirectly from a third party, Y, via a colleague who knew that on the one hand I was interested in scientific fraud, and on the other hand, that I was based at Oxford.

My first thought was that this was a rather cumbersome way of checking someone's credentials. For a start, as Y had discovered, you can consult the on-line University Gazette only if you have an official affiliation with the university. In theory, when someone has a viva, the internal examiner notifies the University Gazette, which announces details in advance so that members of the university can attend if they so wish. In practice, it is vanishingly rare for an audience to turn up, and the formal notification to the Gazette may get overlooked.

But why, I wondered, didn't Y just check the official records of Oxford University listing names and dates of degrees? Well, to my surprise, it turned out that you can't do that. The university website is clear that to verify someone's qualifications you need to meet two conditions. First, the request can only be made by "employers, prospective employers, other educational institutions, funding bodies or recognised voluntary organisations". Second, "the student's permission ... should be acquired prior to making any verification request".

Anyhow, I found evidence online that X had been a graduate student at the university, but when I checked the Gazette I could find no mention of X having had an oral examination. The other source of evidence would be the University Library where there should be a copy of the thesis for all higher degrees. I couldn't find it in the catalogue. I suggested that Y might check further but they were already ahead of me, and had confirmed with the librarian that no thesis had been deposited in that name.

Now, I have no idea whether X is fraudulently claiming to have an Oxford doctorate, but I'm concerned that it is so hard for a private individual to validate someone's credentials. As far as I can tell, the justification comes from data protection regulations, which control what information organisations can hold about individuals. This is not an Oxford-specific interpretation of rules - I checked a few other UK universities, and the same processes apply.

Having said that, Y pointed out to me that there is a precedent for Oxford University to provide information when there is media interest in a high-profile case: in response to a freedom of information request, they confirmed that Ferdinand Marcus Jr did not have the degree he was claiming.

There will always be tension between openness and the individual's right to privacy, but the way the rules are interpreted mean that anyone could claim they had a degree from a UK university and it would be impossible to check this. Is there a solution? I'm no lawyer, but I would have thought it should be trivial to require that on receipt of a degree, the student is asked to give signed permission for their name, degree and date of degree to be recorded on a publicly searchable database. I can't see a downside to this, and going forward it would save a lot of administrative time dealing with verification requests.

Something like this does seem to work outside Europe. I only did a couple of spot checks, but found this for York University (Ontario):

"It is the University's policy to make information about the degrees or credentials conferred by the University and the dates of conferral routinely available. In order to protect our alumni information as much as possible, YU Verify will give users a result only if the search criteria entered matches a unique record. The service will not display a list of names which may match criteria and allow you to select."

And for Macquarie University, Australia, there is exactly the kind of searchable website that I'd assumed Oxford would have.

I'd be interested if anyone can think of unintended bad consequences of this approach. I had a bit of to-and-fro on Twitter about this with someone who argued that it was best to keep as much information as possible out of the public domain. I remain unconvinced: academic qualifications are important for providing someone with credentials as an expert, and if we make it easy for anyone to pretend to have a degree from a prestigious institution, I think the potential for harm is far greater than any harms caused by lack of privacy. Or have I missed something? 

 N.B. Comments on the blog are moderated so may only appear after a delay.


P.S. Some thoughts via Mastodon from Martin Vueilleme on potential drawback of directory: 

Far fetched, but I could see the following reasons:

- You live in an oppressive country that targets academics, intellectuals
- Hiding your university helps prevent stalkers (or other predators) from getting further information on you
- Hiding your university background to fit in a group
- Your thesis is on a sensitive topic or a topic forbidden from being studied where you live
- Hiding your university degree because you were technically not allowed to get it (eg women)

My (DB) response is that I think that in terms of balancing probabilities of risks against the risk of fraudsters benefiting from lack of checking, the case for the open directory is strengthened, as these risks seem very slight for UK universities (at least for now!). And the other cost/benefit analysis is of finances, where an open directory would seem superior; i.e. it costs to maintain the directory, but that has to be done anyhow, Currently there are extra costs for people who are employed to respond to requests for validation.

Tuesday 3 October 2023

Bishopblog catalogue (updated 4 October 2023)

Source: http://www.weblogcartoons.com/2008/11/23/ideas/

Those of you who follow this blog may have noticed a lack of thematic coherence. I write about whatever is exercising my mind at the time, which can range from technical aspects of statistics to the design of bathroom taps. I decided it might be helpful to introduce a bit of order into this chaotic melange, so here is a catalogue of posts by topic.

Language impairment, dyslexia and related disorders
The common childhood disorders that have been left out in the cold (1 Dec 2010) What's in a name? (18 Dec 2010) Neuroprognosis in dyslexia (22 Dec 2010) Where commercial and clinical interests collide: Auditory processing disorder (6 Mar 2011) Auditory processing disorder (30 Mar 2011) Special educational needs: will they be met by the Green paper proposals? (9 Apr 2011) Is poor parenting really to blame for children's school problems? (3 Jun 2011) Early intervention: what's not to like? (1 Sep 2011) Lies, damned lies and spin (15 Oct 2011) A message to the world (31 Oct 2011) Vitamins, genes and language (13 Nov 2011) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Phonics screening: sense and sensibility (3 Apr 2012) What Chomsky doesn't get about child language (3 Sept 2012) Data from the phonics screen (1 Oct 2012) Auditory processing disorder: schisms and skirmishes (27 Oct 2012) High-impact journals (Action video games and dyslexia: critique) (10 Mar 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) Raising awareness of language learning impairments (26 Sep 2013) Good and bad news on the phonics screen (5 Oct 2013) What is educational neuroscience? (25 Jan 2014) Parent talk and child language (17 Feb 2014) My thoughts on the dyslexia debate (20 Mar 2014) Labels for unexplained language difficulties in children (23 Aug 2014) International reading comparisons: Is England really do so poorly? (14 Sep 2014) Our early assessments of schoolchildren are misleading and damaging (4 May 2015) Opportunity cost: a new red flag for evaluating interventions (30 Aug 2015) The STEP Physical Literacy programme: have we been here before? (2 Jul 2017) Prisons, developmental language disorder, and base rates (3 Nov 2017) Reproducibility and phonics: necessary but not sufficient (27 Nov 2017) Developmental language disorder: the need for a clinically relevant definition (9 Jun 2018) Changing terminology for children's language disorders (23 Feb 2020) Developmental Language Disorder (DLD) in relaton to DSM5 (29 Feb 2020) Why I am not engaging with the Reading Wars (30 Jan 2022)

Autism
Autism diagnosis in cultural context (16 May 2011) Are our ‘gold standard’ autism diagnostic instruments fit for purpose? (30 May 2011) How common is autism? (7 Jun 2011) Autism and hypersystematising parents (21 Jun 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) The ‘autism epidemic’ and diagnostic substitution (4 Jun 2012) How wishful thinking is damaging Peta's cause (9 June 2014) NeuroPointDX's blood test for Autism Spectrum Disorder ( 12 Jan 2019) Biomarkers to screen for autism (again) (6 Dec 2022)

Developmental disorders/paediatrics
The hidden cost of neglected tropical diseases (25 Nov 2010) The National Children's Study: a view from across the pond (25 Jun 2011) The kids are all right in daycare (14 Sep 2011) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Changing the landscape of psychiatric research (11 May 2014) The sinister side of French psychoanalysis revealed (15 Oct 2019) A desire for clickbait can hinder an academic journal's reputation (4 Oct 2022) Polyunsaturated fatty acids and children's cognition: p-hacking and the canonisation of false facts (4 Sep 2023)

Genetics
Where does the myth of a gene for things like intelligence come from? (9 Sep 2010) Genes for optimism, dyslexia and obesity and other mythical beasts (10 Sep 2010) The X and Y of sex differences (11 May 2011) Review of How Genes Influence Behaviour (5 Jun 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Moderate drinking in pregnancy: toxic or benign? (21 Nov 2012) Genes, brains and lateralisation (22 Dec 2012) Genetic variation and neuroimaging (11 Jan 2013) Have we become slower and dumber? (15 May 2013) Overhyped genetic findings: the case of dyslexia (16 Jun 2013) Incomprehensibility of much neurogenetics research ( 1 Oct 2016) A common misunderstanding of natural selection (8 Jan 2017) Sample selection in genetic studies: impact of restricted range (23 Apr 2017) Pre-registration or replication: the need for new standards in neurogenetic studies (1 Oct 2017) Review of 'Innate' by Kevin Mitchell ( 15 Apr 2019) Why eugenics is wrong (18 Feb 2020)

Neuroscience
Neuroprognosis in dyslexia (22 Dec 2010) Brain scans show that… (11 Jun 2011)  Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Neuronal migration in language learning impairments (2 May 2012) Sharing of MRI datasets (6 May 2012) Genetic variation and neuroimaging (1 Jan 2013) The arcuate fasciculus and word learning (11 Aug 2013) Changing children's brains (17 Aug 2013) What is educational neuroscience? ( 25 Jan 2014) Changing the landscape of psychiatric research (11 May 2014) Incomprehensibility of much neurogenetics research ( 1 Oct 2016)

Reproducibility
Accentuate the negative (26 Oct 2011) Novelty, interest and replicability (19 Jan 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013) Who's afraid of open data? (15 Nov 2015) Blogging as post-publication peer review (21 Mar 2013) Research fraud: More scrutiny by administrators is not the answer (17 Jun 2013) Pressures against cumulative research (9 Jan 2014) Why does so much research go unpublished? (12 Jan 2014) Replication and reputation: Whose career matters? (29 Aug 2014) Open code: note just data and publications (6 Dec 2015) Why researchers need to understand poker ( 26 Jan 2016) Reproducibility crisis in psychology ( 5 Mar 2016) Further benefit of registered reports ( 22 Mar 2016) Would paying by results improve reproducibility? ( 7 May 2016) Serendipitous findings in psychology ( 29 May 2016) Thoughts on the Statcheck project ( 3 Sep 2016) When is a replication not a replication? (16 Dec 2016) Reproducible practices are the future for early career researchers (1 May 2017) Which neuroimaging measures are useful for individual differences research? (28 May 2017) Prospecting for kryptonite: the value of null results (17 Jun 2017) Pre-registration or replication: the need for new standards in neurogenetic studies (1 Oct 2017) Citing the research literature: the distorting lens of memory (17 Oct 2017) Reproducibility and phonics: necessary but not sufficient (27 Nov 2017) Improving reproducibility: the future is with the young (9 Feb 2018) Sowing seeds of doubt: how Gilbert et al's critique of the reproducibility project has played out (27 May 2018) Preprint publication as karaoke ( 26 Jun 2018) Standing on the shoulders of giants, or slithering around on jellyfish: Why reviews need to be systematic ( 20 Jul 2018) Matlab vs open source: costs and benefits to scientists and society ( 20 Aug 2018) Responding to the replication crisis: reflections on Metascience 2019 (15 Sep 2019) Manipulated images: hiding in plain sight (13 May 2020) Frogs or termites: gunshot or cumulative science? ( 6 Jun 2020) Open data: We know what's needed - now let's make it happen (27 Mar 2021) A proposal for data-sharing the discourages p-hacking (29 Jun 2022) Can systematic reviews help clean up science (9 Aug 2022)Polyunsaturated fatty acids and children's cognition: p-hacking and the canonisation of false facts (4 Sep 2023)  

Statistics
Book review: biography of Richard Doll (5 Jun 2010) Book review: the Invisible Gorilla (30 Jun 2010) The difference between p < .05 and a screening test (23 Jul 2010) Three ways to improve cognitive test scores without intervention (14 Aug 2010) A short nerdy post about the use of percentiles (13 Apr 2011) The joys of inventing data (5 Oct 2011) Getting genetic effect sizes in perspective (20 Apr 2012) Causal models of developmental disorders: the perils of correlational data (24 Jun 2012) Data from the phonics screen (1 Oct 2012)Moderate drinking in pregnancy: toxic or benign? (1 Nov 2012) Flaky chocolate and the New England Journal of Medicine (13 Nov 2012) Interpreting unexpected significant results (7 June 2013) Data analysis: Ten tips I wish I'd known earlier (18 Apr 2014) Data sharing: exciting but scary (26 May 2014) Percentages, quasi-statistics and bad arguments (21 July 2014) Why I still use Excel ( 1 Sep 2016) Sample selection in genetic studies: impact of restricted range (23 Apr 2017) Prospecting for kryptonite: the value of null results (17 Jun 2017) Prisons, developmental language disorder, and base rates (3 Nov 2017) How Analysis of Variance Works (20 Nov 2017) ANOVA, t-tests and regression: different ways of showing the same thing (24 Nov 2017) Using simulations to understand the importance of sample size (21 Dec 2017) Using simulations to understand p-values (26 Dec 2017) One big study or two small studies? ( 12 Jul 2018) Time to ditch relative risk in media reports (23 Jan 2020)

Journalism/science communication
Orwellian prize for scientific misrepresentation (1 Jun 2010) Journalists and the 'scientific breakthrough' (13 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Orwellian prize for journalistic misrepresentation: an update (29 Jan 2011) Academic publishing: why isn't psychology like physics? (26 Feb 2011) Scientific communication: the Comment option (25 May 2011)  Publishers, psychological tests and greed (30 Dec 2011) Time for academics to withdraw free labour (7 Jan 2012) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) Time for neuroimaging (and PNAS) to clean up its act (5 Mar 2012) Communicating science in the age of the internet (13 Jul 2012) How to bury your academic writing (26 Aug 2012) High-impact journals: where newsworthiness trumps methodology (10 Mar 2013)  A short rant about numbered journal references (5 Apr 2013) Schizophrenia and child abuse in the media (26 May 2013) Why we need pre-registration (6 Jul 2013) On the need for responsible reporting of research (10 Oct 2013) A New Year's letter to academic publishers (4 Jan 2014) Journals without editors: What is going on? (1 Feb 2015) Editors behaving badly? (24 Feb 2015) Will Elsevier say sorry? (21 Mar 2015) How long does a scientific paper need to be? (20 Apr 2015) Will traditional science journals disappear? (17 May 2015) My collapse of confidence in Frontiers journals (7 Jun 2015) Publishing replication failures (11 Jul 2015) Psychology research: hopeless case or pioneering field? (28 Aug 2015) Desperate marketing from J. Neuroscience ( 18 Feb 2016) Editorial integrity: publishers on the front line ( 11 Jun 2016) When scientific communication is a one-way street (13 Dec 2016) Breaking the ice with buxom grapefruits: Pratiques de publication and predatory publishing (25 Jul 2017) Should editors edit reviewers? ( 26 Aug 2018) Corrigendum: a word you may hope never to encounter (3 Aug 2019) Percent by most prolific author score and editorial bias (12 Jul 2020) PEPIOPs – prolific editors who publish in their own publications (16 Aug 2020) Faux peer-reviewed journals: a threat to research integrity (6 Dec 2020) Time to ditch relative risk in media reports (23 Jan 2020) Time for publishers to consider the rights of readers as well as authors (13 Mar 2021) Universities vs Elsevier: who has the upper hand? (14 Nov 2021) Book Review. Fiona Fox: Beyond the Hype (12 Apr 2022) We need to talk about editors (6 Sep 2022) So do we need editors? (11 Sep 2022) Reviewer-finding algorithms: the dangers for peer review (30 Sep 2022) A desire for clickbait can hinder an academic journal's reputation (4 Oct 2022) What is going on in Hindawi special issues? (12 Oct 2022) New Year's Eve Quiz: Dodgy journals special (31 Dec 2022) A suggestion for e-Life (20 Mar 2023) Papers affected by misconduct: Erratum, correction or retraction? (11 Apr 2023) Is Hindawi “well-positioned for revitalization?” (23 Jul 2023) The discussion section: Kill it or reform it? (14 Aug 2023) Spitting out the AI Gobbledegook sandwich: a suggestion for publishers (2 Oct 2023)

Social Media
A gentle introduction to Twitter for the apprehensive academic (14 Jun 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) Will I still be tweeting in 2013? (2 Jan 2012) Blogging in the service of science (10 Mar 2012) Blogging as post-publication peer review (21 Mar 2013) The impact of blogging on reputation ( 27 Dec 2013) WeSpeechies: A meeting point on Twitter (12 Apr 2014) Email overload ( 12 Apr 2016) How to survive on Twitter - a simple rule to reduce stress (13 May 2018)

Academic life
An exciting day in the life of a scientist (24 Jun 2010) How our current reward structures have distorted and damaged science (6 Aug 2010) The challenge for science: speech by Colin Blakemore (14 Oct 2010) When ethics regulations have unethical consequences (14 Dec 2010) A day working from home (23 Dec 2010) Should we ration research grant applications? (8 Jan 2011) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Should we ever fight lies with lies? (19 Jun 2011) How to survive in psychological research (13 Jul 2011) So you want to be a research assistant? (25 Aug 2011) NHS research ethics procedures: a modern-day Circumlocution Office (18 Dec 2011) The REF: a monster that sucks time and money from academic institutions (20 Mar 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) Journal impact factors and REF2014 (19 Jan 2013)  An alternative to REF2014 (26 Jan 2013) Postgraduate education: time for a rethink (9 Feb 2013)  Ten things that can sink a grant proposal (19 Mar 2013)Blogging as post-publication peer review (21 Mar 2013) The academic backlog (9 May 2013)  Discussion meeting vs conference: in praise of slower science (21 Jun 2013) Why we need pre-registration (6 Jul 2013) Evaluate, evaluate, evaluate (12 Sep 2013) High time to revise the PhD thesis format (9 Oct 2013) The Matthew effect and REF2014 (15 Oct 2013) The University as big business: the case of King's College London (18 June 2014) Should vice-chancellors earn more than the prime minister? (12 July 2014)  Some thoughts on use of metrics in university research assessment (12 Oct 2014) Tuition fees must be high on the agenda before the next election (22 Oct 2014) Blaming universities for our nation's woes (24 Oct 2014) Staff satisfaction is as important as student satisfaction (13 Nov 2014) Metricophobia among academics (28 Nov 2014) Why evaluating scientists by grant income is stupid (8 Dec 2014) Dividing up the pie in relation to REF2014 (18 Dec 2014)  Shaky foundations of the TEF (7 Dec 2015) A lamentable performance by Jo Johnson (12 Dec 2015) More misrepresentation in the Green Paper (17 Dec 2015) The Green Paper’s level playing field risks becoming a morass (24 Dec 2015) NSS and teaching excellence: wrong measure, wrongly analysed (4 Jan 2016) Lack of clarity of purpose in REF and TEF ( 2 Mar 2016) Who wants the TEF? ( 24 May 2016) Cost benefit analysis of the TEF ( 17 Jul 2016)  Alternative providers and alternative medicine ( 6 Aug 2016) We know what's best for you: politicians vs. experts (17 Feb 2017) Advice for early career researchers re job applications: Work 'in preparation' (5 Mar 2017) Should research funding be allocated at random? (7 Apr 2018) Power, responsibility and role models in academia (3 May 2018) My response to the EPA's 'Strengthening Transparency in Regulatory Science' (9 May 2018) More haste less speed in calls for grant proposals ( 11 Aug 2018) Has the Society for Neuroscience lost its way? ( 24 Oct 2018) The Paper-in-a-Day Approach ( 9 Feb 2019) Benchmarking in the TEF: Something doesn't add up ( 3 Mar 2019) The Do It Yourself conference ( 26 May 2019) A call for funders to ban institutions that use grant capture targets (20 Jul 2019) Research funders need to embrace slow science (1 Jan 2020) Should I stay or should I go: When debate with opponents should be avoided (12 Jan 2020) Stemming the flood of illegal external examiners (9 Feb 2020) What can scientists do in an emergency shutdown? (11 Mar 2020) Stepping back a level: Stress management for academics in the pandemic (2 May 2020)
TEF in the time of pandemic (27 Jul 2020) University staff cuts under the cover of a pandemic: the cases of Liverpool and Leicester (3 Mar 2021) Some quick thoughts on academic boycotts of Russia (6 Mar 2022) When there are no consequences for misconduct (16 Dec 2022) Open letter to CNRS (30 Mar 2023)

Celebrity scientists/quackery
Three ways to improve cognitive test scores without intervention (14 Aug 2010) What does it take to become a Fellow of the RSM? (24 Jul 2011) An open letter to Baroness Susan Greenfield (4 Aug 2011) Susan Greenfield and autistic spectrum disorder: was she misrepresented? (12 Aug 2011) How to become a celebrity scientific expert (12 Sep 2011) The kids are all right in daycare (14 Sep 2011)  The weird world of US ethics regulation (25 Nov 2011) Pioneering treatment or quackery? How to decide (4 Dec 2011) Psychoanalytic treatment for autism: Interviews with French analysts (23 Jan 2012) Neuroscientific interventions for dyslexia: red flags (24 Feb 2012) Why most scientists don't take Susan Greenfield seriously (26 Sept 2014) NeuroPointDX's blood test for Autism Spectrum Disorder ( 12 Jan 2019)

Women
Academic mobbing in cyberspace (30 May 2010) What works for women: some useful links (12 Jan 2011) The burqua ban: what's a liberal response (21 Apr 2011) C'mon sisters! Speak out! (28 Mar 2012) Psychology: where are all the men? (5 Nov 2012) Should Rennard be reinstated? (1 June 2014) How the media spun the Tim Hunt story (24 Jun 2015)

Politics and Religion
Lies, damned lies and spin (15 Oct 2011) A letter to Nick Clegg from an ex liberal democrat (11 Mar 2012) BBC's 'extensive coverage' of the NHS bill (9 Apr 2012) Schoolgirls' health put at risk by Catholic view on vaccination (30 Jun 2012) A letter to Boris Johnson (30 Nov 2013) How the government spins a crisis (floods) (1 Jan 2014) The alt-right guide to fielding conference questions (18 Feb 2017) We know what's best for you: politicians vs. experts (17 Feb 2017) Barely a good word for Donald Trump in Houses of Parliament (23 Feb 2017) Do you really want another referendum? Be careful what you wish for (12 Jan 2018) My response to the EPA's 'Strengthening Transparency in Regulatory Science' (9 May 2018) What is driving Theresa May? ( 27 Mar 2019) A day out at 10 Downing St (10 Aug 2019) Voting in the EU referendum: Ignorance, deceit and folly ( 8 Sep 2019) Harry Potter and the Beast of Brexit (20 Oct 2019) Attempting to communicate with the BBC (8 May 2020) Boris bingo: strategies for (not) answering questions (29 May 2020) Linking responsibility for climate refugees to emissions (23 Nov 2021) Response to Philip Ball's critique of scientific advisors (16 Jan 2022) Boris Johnson leads the world ....in the number of false facts he can squeeze into a session of PMQs (20 Jan 2022) Some quick thoughts on academic boycotts of Russia (6 Mar 2022) Contagion of the political system (3 Apr 2022)When there are no consequences for misconduct (16 Dec 2022)

Humour and miscellaneous Orwellian prize for scientific misrepresentation (1 Jun 2010) An exciting day in the life of a scientist (24 Jun 2010) Science journal editors: a taxonomy (28 Sep 2010) Parasites, pangolins and peer review (26 Nov 2010) A day working from home (23 Dec 2010) The one hour lecture (11 Mar 2011) The expansion of research regulators (20 Mar 2011) Scientific communication: the Comment option (25 May 2011) How to survive in psychological research (13 Jul 2011) Your Twitter Profile: The Importance of Not Being Earnest (19 Nov 2011) 2011 Orwellian Prize for Journalistic Misrepresentation (29 Jan 2012) The ultimate email auto-response (12 Apr 2012) Well, this should be easy…. (21 May 2012) The bewildering bathroom challenge (19 Jul 2012) Are Starbucks hiding their profits on the planet Vulcan? (15 Nov 2012) Forget the Tower of Hanoi (11 Apr 2013) How do you communicate with a communications company? ( 30 Mar 2014) Noah: A film review from 32,000 ft (28 July 2014) The rationalist spa (11 Sep 2015) Talking about tax: weasel words ( 19 Apr 2016) Controversial statues: remove or revise? (22 Dec 2016) The alt-right guide to fielding conference questions (18 Feb 2017) My most popular posts of 2016 (2 Jan 2017) An index of neighbourhood advantage from English postcode data ( 15 Sep 2018) Working memories: A brief review of Alan Baddeley's memoir ( 13 Oct 2018) New Year's Eve Quiz: Dodgy journals special (31 Dec 2022)

Monday 2 October 2023

Spitting out the AI Gobbledegook sandwich: a suggestion for publishers

 


The past couple of years have been momentous for some academic publishers. As documented in a preprint this week, after rapid growth, largely via "special issues" of journals, they have dramatically increased the number of published articles, and at the same time made enormous profits. A recent guest post by Huanzi Zhang, however, showed this has not been without problems. Unscrupulous operators of so-called "papermills" saw an opportunity to boost their own profits by selling authorship slots and then placing fraudulent articles in special issues that were controlled by complicit editors. Gradually, publishers realised they had a problem and started to retract fraudulent articles. To date, Hindawi has retracted over 5000 articles since 2021*.  As described in Huanzi's blogpost, this has made shareholders nervous and dented the profits of parent company Wiley. 

 

There are numerous papermills, and we only know about the less competent ones whose dodgy articles are relatively easy to detect. For a deep dive into papermills in Hindawi journals see this blogpost by the anonymous sleuth Parashorea tomentella.  At least one papermill is the source of a series of articles that follow a template that I have termed the "AI gobbledegook sandwich".  See for instance my comments here on an article that has yet to be retracted. For further examples, search the website PubPeer with the search term "gobbledegook sandwich". 

 

After studying a number of these articles, my impression is that they are created as follows. You start with a genuine article. Most of these look like student projects. The topics are various, but in general they are weak on scientific content. They may be a review of an area, or if data is gathered, it is likely to be some kind of simple survey.  In some cases, reference is made to a public dataset. To create a paper for submission, the following steps are taken:

 

·      The title is changed to include terms that relate to the topic of a special issue, such as "Internet of Things" or "Big data".

·      Phrases are scattered in the Abstract and Introduction mentioning these terms.

·      A technical section is embedded in the middle of the original piece describing the method to be used.  Typically this is full of technical equations. I suspect these are usually correct, in that they use standard formulae from areas such as machine learning, and in some cases can be traced to Wikipedia or another source.  It is not uncommon to see very basic definitions, e.g. formulae for sensitivity and specificity of prediction.

·      A results section is created showing figures that purport to demonstrate how the AI method has been applied to the data. This often reveals that the paper is problematic, as plots are at best unclear and at worst bear no relationship to anything that has gone before.  Labels for figures and axes tend to be vague. A typical claim is that the prediction from the AI model is better than results from other, competing models. It is usually hard to work out what is being predicted from what.

·      The original essay resumes for a Conclusions section, but with a sentence added to say how AI methods have been useful in improving our understanding.

·      An optional additional step is to sprinkle irrelevant citations in the text: we know that papermills collect further income by selling citations, and new papers can act as vehicles for these.


Papermills have got away with this, because the content of these articles is sufficiently technical and complex that the fraud may only be detectable on close reading. Where I am confident there is fraud, I will use the term "Gobbledegook sandwich" in my report on PubPeer, but there are many, many papers where my suspicions are raised but it would take more time than it is worth for me to comb through the article to find compelling evidence.

 

For a papermill, the beauty of the AI gobbledegook sandwich is that you can apply AI methods to almost any topic, and there are so many different algorithms that can be used that there is a potentially infinite number of papers that can be written according to this template.  The ones I have documented include topics ranging from educational methods, hotel management, sports, art, archaeology, Chinese medicine, music, building design, mental health and promotion of Marxist ideology. In none of these papers did the application of AI methods make any sense, and they would not get past a competent editor or reviewers, but once a complicit editor is planted in a journal, they can accept numerous articles. 

 

Recently, Hindawi has ramped up its integrity operations and is employing many more staff to try and shut this particular stable door.  But Hindawi is surely not the only publisher infected by this kind of fraud, and we need a solution that can be used by all journals. My simple suggestion is to focus on prevention rather than cure, by requiring that all articles that report work using AI/ML methods adopt reporting standards that are being developed for machine-learning based science, as described on this website.  This requires computational reproducibility, i.e., data and scripts must be provided so that all results can be reproduced.  This would be a logical impossibility for AI gobbledegook sandwiches.

 

Open science practices were developed with the aim of improving reproducibility and credibility of science, but, as I've argued elsewhere, they could be highly effective in preventing fraud.  Mandating reporting standards could be an important step, which, if accompanied also by open peer review, will make life of the papermillers much harder.



*Source is spreadsheet maintained by the anonymous sleuth Parashorea tomentella

 

N.B. Comments on this blog are moderated, so there may be a delay before they appear.