Monday, 21 July 2014

Percentages, quasi statistics and bad arguments


Percentages have been much in the news lately. First, we have a PLOS One paper by John Ioannidis and his colleagues which noted that less than one per cent of all publishing scientists in the period from 1996 to 2011 published something in each and every year of this 16-year period.

Then there was a trailer for a wonderfully silly forthcoming film, Lucy, in which Scarlett Johansson suffers from a drug overdose that leads her to learn Chinese in an hour and develop an uncanny ability to make men fall over by merely pouting at them. Morgan Freeman plays a top neuroscientist who explains that whereas the rest of us use a mere ten per cent of our brain capacity, Johansson's character has access to a full hundred per cent of her brain.

And today I've just read an opinion piece in Prospect Magazine by the usually clear-thinking philosopher, A. C. Grayling, which states: "Neuropsychology tells us that more than ninety per cent of mental computation happens below the level of awareness."

Examples like these can be used to demonstrate just how careful you need to be when interpreting percentages. There are two issues. For a start, a percentage is uninterpretable unless you know the origin of the denominator (i.e., the total number of cases that the percentage is based on).  I'm sure the paper by Ioannidis and colleagues is competently conducted, but the result seems far less surprising when you realise that the 'less than one per cent' figure was obtained using a denominator based on all authors mentioned on all papers during the target period. As Ioannidis et al noted, this will include a miscellaneous bunch of people, including those who are unsuccessful at gaining research funding or in getting papers published, those taking career breaks, people who are trainees or research assistants, those working in disciplines where it is normal to publish infrequently, and those who fit in  research activity around clinical responsibilities. Presumably it also includes those who have died, retired, or left the field in the study period.
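To make the denominator point concrete, here is a minimal sketch in Python. The counts are invented purely for illustration and are not taken from the Ioannidis paper; the category labels and the `percentage` helper are likewise my own assumptions. The sketch holds the numerator fixed and shows how much the resulting percentage depends on who is counted in the denominator.

```python
def percentage(numerator: int, denominator: int) -> float:
    """Return numerator expressed as a percentage of denominator."""
    if denominator <= 0:
        raise ValueError("a percentage needs a positive denominator")
    return 100.0 * numerator / denominator


# Hypothetical counts, invented for illustration only (not from the paper).
continuous_publishers = 1_500  # people who published in every year

# Three hypothetical ways of deciding who belongs in the denominator.
denominators = {
    "every author named on any paper": 200_000,
    "authors active throughout the whole period": 20_000,
    "researchers with continuous funding": 5_000,
}

# Same numerator each time; only the denominator changes.
results = {
    label: percentage(continuous_publishers, total)
    for label, total in denominators.items()
}

for label, pct in results.items():
    print(f"{label}: {pct:.2f}%")
```

With these made-up figures the same 1,500 people come out as 0.75%, 7.5% or 30% of the total, depending on the denominator chosen.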

So if you are someone who publishes regularly, and are feeling smug at your rarity value, you might want to rethink. In fact, given the heterogeneity of the group on whom the denominator is based, I'm not sure what conclusions to draw from this paper. Ioannidis et al noted that those who publish frequently also get cited more frequently, even after taking into account number of publications, and concluded that the stability and continuity of the publishing scientific workforce may have important implications for the efficiency of science. But what one should actually do with this information is unclear. The authors suggest that one option is to give more opportunities to younger scientists so that they can join the elite group who publish regularly. However, I suspect that's not how the study will be interpreted: instead, we'll have university administrators adding 'continuity of publishing record' to their set of metrics for recruiting new staff, leading to even more extreme pressure to publish quickly, rather than taking time to reflect on results. A dismal thought indeed.

The other two examples that I cited are worse still. It's not that they have a misleading denominator: as far as one can tell, they don't have a denominator at all.  In effect, they are quasi-statistics. Since the publication of the Lucy trailer, neuroscientists have stepped up to argue that of course we use much more than ten per cent of our brains, and to note that the origin of this mythical statistic is hard to locate (see, for instance here and here). I'd argue there's an even bigger problem – the statement can't be evaluated as accurate or inaccurate without defining what scale is being adopted to quantify 'brain use'. Does it refer to cells, neural networks, white matter, grey matter, or brain regions? Are we only 'using' these if there is measurable activity? And is that activity measured by neural oscillations, synaptic firing, a haemodynamic response or something else?

In a similar vein, in the absence of any supporting reference for the Grayling quote, it remains opaque to me how you'd measure 'mental computation' and then subdivide it into the conscious and the unconscious. Sure, he's right that our brains carry out many computations of which we have no explicit awareness. Language is a classic case – I assume most readers would have no difficulty turning a sentence like 'You wanted to eat the apples that she gave you' into a negative form ('You didn't want to eat the apples that she gave you') or a question ('Did you want to eat the apples that she gave you?') but unless you are a linguist, you will have difficulty explaining how you did this. I don't take issue with Grayling's main point, but I am surprised that an expert philosopher should introduce a precise number into the argument, when it can readily be shown to be meaningless.

The main point here is that we are readily impressed by numbers. A percentage seems to imply that there is a body of evidence on which a statement is based. But we need to treat percentages with suspicion; unless we can identify the numerator and denominator from which they are calculated, they are likely to just be fancy ways of trying to persuade us into giving more weight to an argument than it deserves.

1 comment:

  1. A major political issue is our society's tendency to undervalue factors that cannot readily be quantified. Those stupid percentages address it by invalid quantification. What better solution is there?