Comments on BishopBlog: Metricophobia among academics

2015-05-29T17:23:23.146+01:00

This comment has been removed by the author.

Entering into this late but many excellent points ...

2014-12-12T16:55:43.204+00:00

Entering into this late but many excellent points have been made and this is instructive to all (scientists and, hopefully, the mandarins that seem think that science assessment exercises are infallible). I have only two things to add:

1. DC notes that actually reading the papers is the most direct means of assessing quality. This is routinely dismissed as a gargantuan task. However, it can be made far more manageable (and interesting for the scientific assessor) if the applicants are asked to limit submissions to what they consider their 5 most significant papers (could be less) and why. In Canada, CIHR used to do this as part of their CV. In their infinite wisdom, they just removed it. As a reviewer, I found the most significant publ. section far more valuable that the detailing of their full "productivity".

2. Someone should ask whether the RAE, etc. are negatively impacting the quality of the science they profess to improve. I can think of several reasons why this may be the case. Firstly, the time and effort consumed is significant. Secondly, it is changing behaviour (and encouraging different behaviour). Is this positive or negative? It is certainly infectious. Thirdly, we, as scientists, are at least partly to blame. We have blithely bargained for more resources to expand our enterprise on the promise of wonderful outcomes. We are being called on it. The return on investment in high quality science is inversely proportional to the amount of time to impact. We are discouraging longer term thinking and acting like professional football clubs, wheeling and dealing players to prop up gate receipts, but without the benefit of tallying balls in the net after 90 mins.

I am not for one second suggesting that the curren...

2014-12-01T21:55:57.159+00:00

I am not for one second suggesting that the current REF system is not flawed. But the problems with focusing on H-index to my mind clearly outweigh the issues with other strategies because, as I said in the tweet to which Dorothy refers in her post, it lends credence to the concept of the H-index itself.

I did a session today for the "Politics, Perception and Philosophy of Physics" Year 4 MSci module I teach on the subject of p-values (lots of mentions of David Colquhoun's blog posts on this topic!). There are very interesting parallels between the p-value problem and the use of H-indices in that both attempt to reduce a complex multi-faceted dataset to a single number.

The P-value concept has been exceptionally damaging to science [R. Nuzzo, Nature 506 150 (2014)]. Let's not start similarly lending the H-index credibility it doesn't deserve.

This has been an interesting discussion. Hirsch&...

2014-12-01T15:26:47.820+00:00

This has been an interesting discussion.

Hirsch's 2005 PNAS is entitled An index to quantify an individual's scientific research output. Scaling up from the individual is fraught with difficulty as well as with opportunity for manipulation to produce a desired result. H-indices for addresses, as Dorothy suggests, would undoubtedly produce a flurry of renaming of departments and even institutions. There seems to be no end to the lengths to which managers will go in adjusting the appearance, rather than the reality, of research achievement.

In contrast, a REF rating, whatever it means, is a property of an ensemble of individuals. In some REF dry runs, individuals have nevertheless been allocated personal REF ratings. The values were obtained by undisclosed means, and by persons who remained anonymous – at least to those who were being rated. The question was then asked, by one's local managers "Well, what are you going to do to increase your rating?".

If it were possible to answer that question, it would devalue the whole exercise, since REF ratings would then, at least in part, be a measure of the effectiveness of REF tactics rather than research quality.

I honestly recommend that REF is scrapped completely. Even if an alternative system of evaluation could be agreed, it would only report on past achievement.

Research is discovery, which is unpredictable. The more important the discovery, the more is it unpredictable.

I've listed what I consider to be six serious flaws in REF on Research Assessment and REF | John F. Allen's Blog.

I am sure there are many others. Why go to such lengths to devise and implement a system that purports to measure the unmeasurable?

While Philip Moriarty thinks an example such as Do...

2014-12-01T12:48:58.044+00:00

While Philip Moriarty thinks an example such as Don Eigler indicates that a metric such as the H index won't work, it would appear much less simple. An academic who produces outstanding papers every couple of years or could be a less than stellar contributor to the current system - they might not have the requisite 4 papers in the REF period. So the metric does no worse / no different than the current arrangements.
Since no-one is claiming they have the perfect assessment system, pointing out anomalies or injustices in one approach is not so helpful, unless it is clear that these in some tangible sense outweigh the problems in the other. Let alone that the current system can justify the resource costs it consumes given what it delivers, as pointed out by others.

I don't disagree with any of these points. Alt...

2014-11-30T19:31:35.502+00:00

I don't disagree with any of these points. Although having benefitted fmyself rom moving in the REF transfer window, I think that the fact that REF outputs are portable has been good for academics and our salaries. I know other people might not agree.

My concern about using number of returnees to allocate funds is not so much about the REF-ability of the Departmental cat (I agree we could probably nix that aspect of gaming the system). I just worry that the Department of Physics (or whatever) will suddenly become huge for the purposes of REF whereas smaller departments would be closed down. OTOH, at my previous University I saw a very small Dept get a good RAE score in an under-populated UoA. This then resulted in a huge new building and permission to double the number of academic staff. All on the basis of a (probably erroneous) RAE performance.

OK I take your point that you looked at only two s...

2014-11-30T14:36:12.124+00:00

OK I take your point that you looked at only two subject areas. If bibliometricians were doing their job it is the sort of thing that they should be doing. I hope the fact that they aren't doing it is not because your results so far suggest that H-index contributes very little and that threatens to put them out of business. It would be much cheaper if counting the size of departments could be substituted for employing/buying bibliometrics,

Perhaps this is a job the HEFCE should be doing, if bibliometricians won't (attention: James Wilsdon)

I like your take on this Dominik. I know you have ...

2014-11-30T10:46:09.860+00:00

I like your take on this Dominik. I know you have commented on my REF-related posts before, but I think this perspective is well worth airing.

A bigger aspect of the problem is that fundamental...

2014-11-30T10:24:24.735+00:00

A bigger aspect of the problem is that fundamental issue in all science of generalising from individuals to populations or in reverse, controlling populations (wholes) by controlling individuals (parts).

And in the sort of sciences that can point to engineering successes for their epistemic legitimacy (even if the engineering often came first), the daily experience is that of what we know about individuals fairly straightforwardly scales up to populations (allowing for explainable size and interaction effects). So, if I know how individual atoms release energy, I get nuclear fission when millions of atoms release energy in fractions of a second. If I know what happens when gasoline explodes once in a closed chamber, I get a combustion engine with hundreds of explosions per second. The same with a computer. If I know, how to change the state of a piece of silicon from 1 to 0, I know how to do anything at that I can translate into 1s and 0s even if I do that a million times a second.

Setting aside the fact, that even in the hard sciences this is mostly an illusion because things quickly break down at bigger or smaller scales, this relationship of individuals to populations is much more tenuous in the social and psychological realm.

Properties of individuals have beguiling similarities to properties of populations but are often completely structurally different. For example, Rational Choice Theory proved to be a very useful model for certain populational economic behaviours. Unfortunately, it did not actually describe how any one individual made choices. It was successful as long as we did not care about what every individual actually did. But when the behaviour of individuals became important, it broke down.

Readability metrics are another example, they could rely on completely mechanistic measures that didn't even require understanding the text to make predictions about populations (proportion of results of a large number of readers reading a large number of texts). But they are much less useful for judging the readability of any one text. And they are almost completely useless at judging how difficult any one text will be for any one individual. Learning styles is a similar story.

The REF is a great example. The task is to deal with populations. Distribute a discrete sum across a population. Therefore, a population type measure would be most successful. But the way the task is approached is through assessing the individuals as complete beings. Which means that the measures used cannot work because such in-depth peer review assessment will always produce fundamentally incommensurate results (kind of like Ofsted inspections).

The paradox is that what is fair to all individuals may turn out to be an unfair system which satisfies no one. All that REF running around will simply have to be reduced to a simple numerical formula. So it would be much smarter to deal with the allocation as a numerical task to start with using a primary proxy measure (size of department sounds like a brilliant solution) but not to then translate that measure into quality. The job is to improve the quality of the system as a whole and any one individual's quality is only distantly related to it. So the REF is more similar to voodoo than the causal investigation it is styled as.

PS: As I was typing it, I got a strange sense of doing this before. I think I may have made a similar comment on another post here. If so, sorry for belabouring the point.

Thanks all. Just 3 more responses. 1. Just to clea...

2014-11-30T08:39:42.274+00:00

Thanks all. Just 3 more responses.
1. Just to clear up one point: the departmental H-index is NOT the same as the average H-index per department. It is the H-index you get if you search for papers by departmental address and then compute H-index based on the resulting publications. This is rather different because people may come and go: what matters is the research published from your address. If you try to parachute in a research star, it won't do you much good unless you commit to them long enough for them to publish from your institution and accrue citations. Conversely, if you fire an active researcher for nonproductivity you'd need to be pretty confident that they aren't going to go off and publish their work from a different address.
2.Re David' point: you would need to demonstrate that the departmental cat was research-active and on the payroll.
3. Cases like Don Eigler: again I reiterate that I am NOT saying H-index is anything like a perfect indicator of quality of individual researchers or even departments: I am just saying we need a measure for comparing departments that is good enough to act as a proxy scale when allocating income. I don't think there *is* a gold standard, so we are bound to find any measure inadequate. We need to ensure that the amount of time and money we spend on evaluation is not defeating the purpose of the exercise and damaging the ability of researchers to research.
One final thought: I have suggested elsewhere that University league tables should take into account staff satisfaction. http://cdbu.org.uk/staff-satisfaction-is-as-important/ Perhaps if we found a way to include that in the funding metric it would address concerns about adverse effects on science from gaming?

Thanks for posting the link to the paper on the H-...

2014-11-29T20:44:01.002+00:00

Thanks for posting the link to the paper on the H-index as a combinatorial Fermi problem, Anon -- much obliged. It's a really intriguing analysis.

Philip

Thanks, David. The paper cited by Anonymous in the...

2014-11-29T20:35:58.216+00:00

Thanks, David. The paper cited by Anonymous in their comment below is worth reading. It's an intriguing analysis which is closer to the type of approximate mathematical approach physicists often use than the level of rigour usually associated with pure maths. I've only skim-read it as yet but it's a very interesting analysis.

Just seen Anonymous' comment below. It must ha...

2014-11-29T20:32:54.073+00:00

Just seen Anonymous' comment below. It must have been posted while I was in the middle of writing the missive above. Their point about H-index scaling with output number is v. important and supports the "Eigler-centric" argument I made.

"Recent analysis of h-indeces in mathematics and physical sciences suggests that they simply scale with the number of outputs (a combinatorial Fermi problem - details here http://www.ams.org/notices/201409/rnoti-p1040.pdf)"

Philip. You are doing exactly what I advocate. Ta...

2014-11-29T20:15:29.751+00:00

Philip. You are doing exactly what I advocate. Take a paper (or person) that everyone agrees is good, and see how they perform. That's not what bibliometricians do, perhaps because the results tend to show how useless metrics really are.

Exactly. The point of my first comment was, I thin...

2014-11-29T20:11:48.463+00:00

Exactly. The point of my first comment was, I think, that Dorothy's data means the opposite of what she suggested. It shows that the H-index has little prognostic value, in that it adds little to what you can predict by counting the number of returns.

Thanks for the comprehensive response, Dorothy. I&...

2014-11-29T19:59:21.678+00:00

Thanks for the comprehensive response, Dorothy. I'd like to address a few points.

"The thing about H-index pressure is that it isn’t the same as pressure to publish."

I disagree entirely -- it's most definitely pressure to publish! I am going to use my example of Don Eigler again (see http://en.wikipedia.org/wiki/Don_Eigler). Eigler and his group at IBM did science exactly the way science should be done -- carefully, rigorously, and addressing key challenges. Eigler's group would produce a ground-breaking paper which would be heavily cited, and then "go off the radar" for a couple of years until they then "re-appeared" with another inspiring advance.

But according to Web of Science, Eigler's h-index is 24. Yet if I were to select the most important scientist in the sub-field in which I work, Eigler would be right at the top of the list.

Why is his H-index so low? Well, it's very simple -- he didn't publish "enough" papers (from the perspective of upping his H-index). What he did publish was exceptionally well-cited, but quality alone is not enough for H-index -- quantity is important as well. Thus, there is no question that a focus on H-index will produce a pressure to publish.

>>"departmental H-index".

I don't see how focusing on a departmental H-index is going to relieve the pressure on individual academics to increase their individual H-index. If the metric is "funding is based on average H-index of department" then all staff will be pressured to increase their H-index. I am 100% certain that this would be the case in Nottingham and, from my reading of the Times Higher every week, I don't see strong evidence across the UK HE sector that Nottingham is an outlier with regard to chasing simplistic metrics in a simple-minded way! This is what I meant by not giving credence to the H-index.

>>"Subject-specific variations in Hindex are important if you are comparing across disciplines, but that is not the case here"

Again, I fundamentally disagree. Comparing across sub-disciplines (and sub-sub-disciplines) is a *major* issue. The citation behaviour in condensed matter physics and particle physics, as just one example, is very, very different indeed.

>>"But I am concerned with the departmental H-index in >>aggregate".

But the aggregate departmental H-index depends on the H-indices of individual members of staff. Sorry to bang on about this, but you know as well as I would what would happen if funding is based on aggregate H-index! Just as we now have for Student Evaluation of Teaching scores and World Rankings, university managers would blindly compare average H-index scores across departments and faculties (down to four or five (in)significant figures!) with no attention paid to variations in citation behaviour. (And even if they were to pay attention to those sub-discipline dependencies, just how would they credibly normalise them out? I shudder to think of the type of multi-parameter functionals that bibliometricians would dream up...)

>>"H-index may be more attractive to the bureacrats for that reason: If you divide department Hindex by dept size you have a measure of mean ‘quality’ that allows people to compile league tables."

This is *exactly* the problem. Why are we pandering to "bureaucrats"? H-index gives the illusion of tracking quality but for all of the reasons discussed above, that's all it is -- an illusion.

A chilling moment for me was when the ex-head of m...

2014-11-29T19:34:37.660+00:00

A chilling moment for me was when the ex-head of my ex-department told me that he had reanalysed the 2008RAE results and found that an identical outcome was found by just taking the *impact factor* of the journal that outputs were published in and assigning the scores only on this basis.
Since the JIF is the most spurious of research metrics, I really liked your original post suggesting that the departmental H-index could be used instead. I don't really like the idea of metrics used in assessment either, but since they are (sub-consciously or consciously by the panel in the case of JIF) and since the alternative of actually reading and assessing papers is not plausible, then this seemed like a good idea and would save a lot of time and hassle. My own department hired several FTEs to handle our REF submission: madness! There is one problem however...
Recent analysis of h-indeces in mathematics and physical sciences suggests that they simply scale with the number of outputs (a combinatorial Fermi problem - details here http://www.ams.org/notices/201409/rnoti-p1040.pdf). If this is true outside of Maths and Physics, then the largest departments or the ones that return the most academics in a REF assessment will always come out top.
Dorothy's last post suggested that using staff number as a proxy could be a shortcut to research assessment and save us all time, and DC has responded that the departmental cat would be returned if this were the case. My point is that the departmental H-index idea and the number of returnees idea may well be one and the same.

Lets just use IQ tests in REF. https://acadenema.w...

2014-11-29T10:31:33.268+00:00

Lets just use IQ tests in REF. https://acadenema.wordpress.com/2012/12/11/lets-use-iq-tests-in-ref/

Many thanks for your responses. Philip & ferni...

2014-11-29T08:59:04.935+00:00

Many thanks for your responses. Philip & ferniglab: I’m sorry your initial comments fell victim to Blogger’s primitive automated system for spam detection. I found Philip’s comment in spam and have reinstated it.

The thing about H-index pressure is that it isn’t the same as pressure to publish. It is pressure to publish work that will be highly cited – or indeed to be involved in fostering such work (given that we are talking about departmental level). It should discourage people publishing loads of papers. Things like self-citation can be easily discounted. I’m not saying there’d be no pressure, but I am not sure it would be worse that what we have already. I would dispute that the departmental H-index is ‘easily gamed’. Eg you can game your individual H-index by spurious authorship: this would do the departmental H-index no good if all authors were from your institution.

Subject-specific variations in Hindex are important if you are comparing across disciplines, but that is not the case here. I accept we could have problems in disciplines with a mix of types: in psychology, neuroscience tends to get higher citations than other types, and I’d worry about everyone moving to neuroscience. Of course, that pressure is already there because that is where the big grant money is. But neuroscience is also more expensive to do than other kinds of psychology, so you could argue that it is right that departments with lots of neuroscience get more core funding. Not sure if it works that way in other disciplines – ie more expensive also tends to be more cited. Would be interesting to have evidence on that point.

Several people have argued that H-index is flawed by drawing attention to instances where bad work gets a high H-index. But I am concerned with the departmental H-index in aggregate. You can put up with a bit of slop in the system if it averages out when you have big numbers. Whether or not that is so is an empirical question which is addressed by studies such as Mryglod et al.

I’d share concerns about H-index being used to rate individuals, except that would not be sensible if we are talking at the departmental level, where a highly influential paper is likely to be the result of several people collaborating – some of whom will be juniors whose personal H-index is small.

Also an empirical question is whether H-index adds anything over departmental size (shorthand for N research active). David says it doesn’t on the basis of very slender evidence: I looked at 2 subjects pretty informally and showed that H-index explained a small amount of extra variance over dept size in psychology and none in physics. He may be right, that overall it’s negligible, but we need more data.

Like Kieron, I am all for simplification, but I think sole reliance on REF would be terrible! It would put far too much power in the hands of a small group of people.
I’d also worry about just giving money to research councils. Then the pressure would be on everyone to get expensive grants – and I see that as even more pernicious and damaging to science than pressure to publish.

As I’ve argued before, I’d be happy with funds allocation in relation to N research-active staff – measured over the whole period to avoid people being parachuted in at the last moment. I think that is unlikely to be acceptable because people want to measure that elusive thing ‘quality’. H-index may be more attractive to the bureacrats for that reason: If you divide department Hindex by dept size you have a measure of mean ‘quality’ that allows people to compile league tables. Since I don’t like league tables, I don’t regard that as a good thing, but it is a consideration. So I agree that there is more to ‘quality’ than an H-index. However, whatever it is, it’s not clear to me that the REF does a better job of detecting it. Both are proxy indicators, but one is far more efficient than the other, so if they give essentially the same outcome, it is the least bad option.

The first line in the abstract of your paper assum...

2014-11-29T00:34:15.378+00:00

The first line in the abstract of your paper assumes that quality and citations go hand in hand. How do you justify that assumption?

See, for example, final paragraph of this: http://blogs.lse.ac.uk/impactofsocialsciences/2014/11/19/peer-review-metrics-ref-rank-hypocrisies-sayer/

"The conclusion I would rather draw, however, is that peer review vs. metrics is in many ways not the issue. Neither is capable of measuring research quality as such—whatever that may be. Peer review measures conformity to disciplinary expectations and bibliometrics measure how much a given output has registered on other academics’ horizons, either of which might be an indicator of quality but neither of which has to be."

One of my comments also got lost in the ether, Dav...

2014-11-29T00:08:53.709+00:00

One of my comments also got lost in the ether, Dave. I assume it's in the moderation queue (too many URLs, maybe).

My first comment kicked off as follows:

"First, and like David Colquhoun has suggested previously, Kieron's suggestion of doing away with the dual support system has a heck of a lot to recommend it. But (and despite my criticisms of the research councils (RCs) over the years!) I would not abolish the RCs. Rather I would abolish the REF and transfer the funding to the RCs. The RCUK peer review system should be changed, however, so that impact is judged at the *end* of a grant (and in subsequent years) rather than nonsensically being appraised before the grant starts."

Previous effort seems to have got lost in the ethe...

2014-11-28T23:50:32.389+00:00

Previous effort seems to have got lost in the ether...

There is a correlation, at least for departments over a certain size between RAE/REF ranking and ranking by h-index.

Metrics are flawed.

REF evaluation of papers is flawed because papers cannot be read, there are far too many of them.

Given the above then we could follow Kieron Flanagan's advice, to which I would add that, for high impact science, the UK has had the model: the LMB. Simple rule, no more than 5 people per PI (technicians, PhD students and postdocs). So dish out the cash per person in each PI's group to a ceiling of 5.

Alternatively, since both processes are flawed, then we have a choice: throw away our time on process #1 or live with process #2 and have considerable more time to devote to teaching and research. As someone involved in REF enjoying a post REF renaissance, I would take ANY process that didn't consume several years of my career, even if it meant losing resource. Time is the most precious of resources,yet we seem happy to chuck it in the bin.

I think economists also like metrics - mainly beca...

2014-11-28T23:02:05.426+00:00

I think economists also like metrics - mainly because we think peer reviewing already peer reviewed publications is a waste of resources, but also because we are aware like psychologists that social science measurement is going to be imperfect. I just published a paper in PLOS ONE, advocating using metrics (http://dx.doi.org/10.1371/journal.pone.0112520) and we are working on another one which I think will be more convincing.

#2 of 2. [Apologies for having to split my respon...

2014-11-28T20:24:00.043+00:00

#2 of 2.

[Apologies for having to split my response in two. I hit the 4096 character limit for comments!]

-- 'Odd' idea #2. I don't see how it's 'odd' to suggest that a H-index based system is highly susceptible to gaming. And I don't see at all how a H-index-based system is somehow likely to be less susceptible than the current REF system (which, of course, is far from ideal). See, for example, this: http://scholarlykitchen.sspnet.org/2012/04/10/emergence-of-a-citation-cartel/

and this: http://blogs.nature.com/news/2012/06/record-number-of-journals-banned-for-boosting-impact-factor-with-self-citations.html

and this: http://www.timeshighereducation.co.uk/news/journal-citation-cartels-on-the-rise/2005009.article

-- 'Odd' idea #3. I'm also an empirical scientist. But I also know that we shouldn't attempt to quantify the unquantifiable, and we should be very careful that a measurement isn't (i) so invasive that it distorts the system we're measuring, and (ii) is actually a good representation of the quantity we're trying to determine.

You said: "My approach to the REF is the same as my approach to the rest of my work: try to work with measures that are detailed and complex enough to be valid for their intended purpose, but no more so."

The H-index is a single number which is easily gamed; effectively impossible to normalise across narrow sub-fields (let alone entire disciplines); often a questionable indicator of research quality; and a quantity which disadvantages early career researchers. If you think that this is "detailed and complex" enough to "be valid for its intended purpose" then I guess we'll just have to agree to disagree.

>>"To work out whether a measure fits that bill, we need to do empirical studies comparing different approaches – not just rely on our gut reaction."

I think it's rather unfair to argue that those who are critical of using a simplistic metric like the H-index to assess staff are "arguing from the gut". (And make no mistake, if the H-index was adopted as a mechanism for allocating QR funding, every academic in the country will be under pressure to increase their H-index).

I will stress again that my H-index is higher than that of *the* leader in my research field -- a scientist who has been responsible for some of the most elegant, inspiring, and, errmmm, heavily cited research in nanoscience. [See https://twitter.com/Moriarty2112/status/537894547020607488 ].

This observation alone is enough, in my view, to discredit the entire H-index concept!

#1 of 2. First, and like David Colquhoun has sugg...

2014-11-28T20:21:28.178+00:00

#1 of 2.

First, and like David Colquhoun has suggested previously, Kieron's suggestion of doing away with the dual support system has a heck of a lot to recommend it. But (and despite my criticisms of the research councils (RCs) over the years!) I would not abolish the RCs. Rather I would abolish the REF and transfer the funding to the RCs. The RCUK peer review system should be changed, however, so that impact is judged at the *end* of a grant (and in subsequent years) rather than nonsensically being appraised before the grant starts.

This seems to be the 'lesser of two evils' to me but I realise that there's a broad spectrum of opinions on this issue!

David's already handled the key points in his pithy comment above but I'd like to address your three "odd ideas".

-- 'Odd' idea #1. The THE article trumpets that H-indices can potentially be used to predict REF results. In what sense is that article *not* giving credence to the idea that the H-index is a metric that university managers/PVCs should monitor? *You* may appreciate the subtleties. I can guarantee that many managers and PVCs will certainly not see beyond the "headline" figure. (Look at how universities exploit highly suspect world rankings at the moment). I already know of cases where lectureship applications have been sifted by H-index. The article in the THE is certainly not going to discourage this type of behaviour.

I simply don't see how the suggestion that the THE article is bolstering the concept of "H-index-for-quality-assessment" is an 'odd' idea. You are helping to strengthen the perception of the H-index as a 'reliable' indicator of quality. Would you really want the H-index to be used for staff appraisal? You laudably criticised KCL severely for its use of a simple-minded metric (grant income) as a mechanism for staff assessment. The H-index is an equally simplistic and flawed metric.

There's a very good comment from David Riley below the line of the article at the THE website. Here's his first point:

"The evidence of the link between citations and quality as far as I am aware largely comes from comparing RAE/REF outcomes to citations. To what extent did the panel members use citations to help them decide on rankings? If they used them (whether officially or not) then this puts a question mark over the findings. A correlation would be inevitable regardless of the validity. "

This is a very important point. I would broaden the point still further and say that the key difficulty with the H-index is that it assumes that there is always a direct and positive relationship between citations and research quality. Citations do not necessarily measure research quality and I can point to many examples where this is not the case. Here's just three from my research field (nanoscience):

- http://physicsfocus.org/philip-moriarty-not-everything-that-counts-can-be-counted/

- http://physicsfocus.org/philip-moriarty-peer-review-cyber-bullies/

- and, most recently, a paper was published in Science claiming that hydrogen-bonds are directly observed in scanning probe microscope images. A paper subsequently published in Physical Review B convincingly and compellingly has shown that these features most likely arise from artefacts due to the probe itself. Guess which paper will pick up more citations...?

More generally, citations are a measure of popularity of a paper and this is a function of many variables, which need not include the level of scientific rigour. The potential for headline generation can often trump rigour in those "top tier" journals, as Randy Schekman has highlighted.