Friday 6 August 2010
How our current reward structures have distorted and damaged science
Two things almost everyone would agree with:
1. Good scientists do research and publish their results, which then have impact on other scientists and the broader community.
2. Science is a competitive business: there is not enough research funding for everyone, not enough academic jobs in science, and not enough space in scientific journals. We therefore need ways of ensuring that the limited resources go to the best people.
When I started in research in the 1970s, research evaluation focused on individuals. If you wrote a grant proposal, applied for a job, or submitted a paper to a journal, evaluation depended on peer review, a process that is known to be flawed and subject to whim and bias, but is nevertheless regarded by many as the best option we have.
What has changed in my lifetime is the increasing emphasis on evaluating institutions rather than individuals. The 1980s saw the introduction of the Research Assessment Exercise, used to evaluate Universities in terms of their research excellence in order to have a more objective and rational basis for allocating central funds (quality weighted research funding or QR) by the national funding council (HEFCE in England). The methods for evaluating institutions evolved over the next 20 years, and are still a matter of debate, but they have subtly influenced the whole process of evaluation of individual academics, because of the need to use standard metrics.
This is inevitable, because the panel evaluating a subject area can't be expected to read all the research produced by staff at an institution, but they would be criticised for operating an 'old boy network', or favouring their own speciality, if they relied just on personal knowledge of who is doing good work – which was what tended to happen before the RAE. Therefore they are forced into using metrics. The two obvious things that can be counted are research income and number of publications. But number of publications was early on recognised as problematic, as it would mean that someone with three parochial reports in a journal of national society would look better than someone with a major breakthrough published in a top journal. There has therefore been an attempt to move from quantity to quality, by taking into account the impact factor of the journals that papers are published in.
Evaluation systems always change the behaviour of those being evaluated, as people attempt to maximise rewards. Recognising that institutional income depends on getting a good RAE score, vice-chancellors and department heads in many institutions now set overt quotas for their staff in terms of expected grant income and number of publications in high impact journals. The jobs market is also affected, as it becomes clear that employability depends on how good one looks on the RAE metrics.
The problem with all of this is that it means that the tail starts to wag the dog. Consider first how the process of grant funding has changed. The motivation to get a grant ought to be that one has an interesting idea and needs money to investigate it. Instead, it has turned into a way of funding the home institution and enhancing employability. Furthermore, the bigger the grant, the more the kudos, and so the pressure is on to do large-scale expensive studies. If individuals were assessed, not in terms of grant income, but in terms of research output relative to grant income, many would change status radically, as cheap, efficient research projects would rise up the scale. In psychology, there has been a trend to bolt on expensive but often unmotivated brain imaging to psychological studies, ensuring that the cost of each experiment is multiplied at least 10-fold. Junior staff are under pressure to obtain a minimum level of research funding, and consequently spend a great deal of time writing grant proposals, and the funding agencies are overwhelmed with applications. In my experience, applications that are written because someone tells you to write one are typically of poor quality, and just waste the time of both applicants and reviewers. The scientist who is successful in meeting their quota is likely to be managing several grants. This may be a good thing if they are really talented, or have superb staff, but in my experience research is done best if the principal investigator puts serious thought and time into the day-to-day running of the project, and that becomes impossible with multiple grants.
Regarding publications, I am old enough to have been publishing before the RAE, and I'm in the fortunate but unusual position of having had full-time research funding for my whole career. In the current system I am relatively safe, and I look good on an RAE return. But most people aren't so fortunate: they are trying to juggle doing research with teaching and administration, raising children and other distractions, yet feel under intense pressure to publish. The worry about the current system is that it will encourage people to cut corners, to favour research that is quick and easy. Sometimes, one is lucky, and a simple study leads to an interesting result that can be published quickly. But the best work typically requires a large investment of time and thought. The studies I am proudest of are ones which have taken years rather than months to complete: in some cases, the time is just on data collection, but in others, the time has involved reading, thinking, and working out ways of analysing and interpreting data. But this kind of paper is getting increasingly rare. As a reviewer, I frequently see piecemeal publication, so if you suggest that a further analysis would strengthen the paper, you are told that it has been done, but is the subject of another paper. Scholarship and contact with prior literature has become extremely rare: prior research is cited without reading it – or not cited at all – and the notion of research building on prior work has been eroded to the point that I sometimes think we are all so busy writing papers that we have no time to read them. There are growing complaints about an 'avalanche' of low-quality publications.
As noted above, in response to this, there has been a move to focus on quality rather than quantity of publications, with researchers being told that their work will only count if it is published in a high-impact journal. Some departments will produce lists of acceptable journals and will discourage staff from publishing elsewhere. In effect, impact factor is being used as a proxy for likelihood that a paper will be cited in future, and I'm sure that is generally true. But just because a paper in a high impact journal is likely to be highly cited, it does not mean that all highly-cited papers appear in high impact journals. In general, my own most highly-cited papers appeared in middle-ranking journals in my field. Moreover, the highest impact journals have several limitations:
1. They only take very short papers. Yes, it is usually possible to put extra information in 'supplementary material', but what you can't do is to take up space putting the work in context or discussing alternative interpretations. When I started in the field, it was not uncommon to publish a short paper in Nature, followed up with a more detailed account in another, lowlier, journal. But that no longer happens .We only get the brief account.
2. Demand for page space outstrips supply. To handle a flood of submissions, these journals operate a triage system, where the editor determines whether the paper should go out for review. This can have the benefit that rejection is rapid, but it puts a lot of power in the hands of editors, who are unlikely to be specialists in the subject area of the paper, and in some cases will explicit in their preference for papers with a 'wow' factor. It also means that one gets no useful feedback from reviewers: viz my recent experience with the New England Journal of Medicine, where I submitted a paper that I thought had all the features they'd find attractive – originality, clinical relevance and a link between genes and behaviour. It was bounced without review, and I emailed, not to appeal, but just to ask if I could have a bit more information about the criteria on which they based their rejection. I was told that they could not give me any feedback as they had not sent it out for review.
3. If the paper does go out for review, the subsequent review process can be very slow. There's an account of the trials and tribulations of dealing with Nature and Science which makes for depressing reading. Slow reviewing is clearly not a problem restricted to high impact journals. My experience is that lower-impact journals can be even worse. But the impression from the comments on FemaleScienceProfessor's blog, is that reviewers can be unduly picky when the stakes are high.
So what can be done? I'd like to see us return to a time when the purpose of publishing was to communicate, and the purpose of research funding was to enable a scientist to pursue interesting ideas. The current methods of evaluation have encouraged an unstoppable tide of publications and grant proposals, many of which are poor quality. Many scientists are spending time on writing doomed proposals and papers when they would be better off engaging in research and scholarship in a less frenetic and more considered manner. But they won't do that so long as the pressures are on them to bring in grants and generate publications. I'll conclude with a few thoughts on how the system might be improved.
1. My first suggestions, regarding publications, are already adopted widely in the UK, but my impression is they may be less common elsewhere. Requiring applicants for jobs or fellowships to specify their five best publications rather than providing a full list rewards those who publish significant pieces of work, and punishes piecemeal publication. Use of the H-index as an evaluation metric rather than either number of publications or journal impact factor is another way to encourage people to focus on producing substantial papers rather than a flood of trivial pieces, as papers with low citations have no impact whatever on the H-index.There are downsides: we have the lag problem, which makes the H-index pretty useless for evaluating junior people, and in its current form the index does not take into account the contribution of authors, thereby encouraging multiple authorship, since anyone who can get their name on a highly-cited paper will boost their H-index, regardless of whether they are a main investigator or freeloader.
3. Instead of using metrics based on grant income, those doing evaluations should use those based on efficiency, i.e. an input/output function. Two problems here: the lag in output is considerable, and the best metric for measuring output is unclear. The lag means it would be necessary to rely on track record, which can be problematic for those starting out in the field. Nevertheless, a move in this direction would at least encourage applicants and funders to think more about value for money, rather than maximising the size of a grant – a trend that has been exacerbated by Full Economic Costing (don't get me started on that). And it might make grant-holders and their bosses see the value of putting less time and energy into writing multiple proposals and more into getting a project done well, so that it will generate good outputs on a reasonable time scale.
4. The most radical suggestion is that we abandon formal institutional rankings (i.e. the successor to the RAE, the REF). I've been asking colleagues who were around before the RAE, what they think it achieved. The general view was that the first ever RAE was a useful exercise that exposed weaknesses in insitutions and individuals and got everyone to sharpen up their act. But the costs of subsequent RAEs (especially in terms of time) have not been justified by any benefit. I remember a speech given by Prof Colin Blakemore at the British Association for the Advancement of Science some years ago where he made this point, arguing that rankings changed rather little after the first exercise, and certainly not enough to justify the mammoth bureaucratic task involved. When I talk to people who have not known life without an RAE, they find it hard to imagine such a thing, but nobody has put forward a good argument that has convinced me it should be retained. I'd be interested to see what others think.