Social & Policy Sciences

Blogging about research, activities and impact in Social & Policy Sciences

A Modest Defense of Citation Metrics


Impact, Opinion

Graham K. Brown
Associate Dean (Research)
Faculty of Humanities & Social Sciences
University of Bath

The ongoing review and consultation by HEFCE of the use of metrics in the REF process has reinvigorated a wider debate over research metrics more generally. A thorough critique authored by Meera Sabaratnam and Paul Kirby has set the academic blogosphere alight. They put forward three main arguments:

  • Metrics do not measure what they purport to and artificially conflate research ‘quality’ with ‘impact’;
  • Metrics systematically bias against certain groups, notably women and ethnic minorities; and,
  • Metrics are easily ‘gamed’.

These are important points, but I would suggest that they largely constitute a critique of academia as a whole, not of metrics per se. We can all, I’m sure, name controversial and even bad scholars who gained career success and influence on the basis of the controversial rather than the high-quality nature of their research, long before research metrics were on the scene. Sabaratnam and Kirby point to Huntington’s notorious Clash of Civilizations as an example of a much-referenced, often ridiculed work, but this conflation of quality with impact long precedes the use of metrics. The gender bias in academic appointments, promotions and salary is well established. And ‘playing the game’ in academia – getting on the right committees to get promotion, building patron-client networks with former PhD students, and just plain sucking up – is, again, hardly uncommon behaviour.

[Figure: Average salary, UK universities]


It may be that metrics will exacerbate these problems. But, conversely, I would suggest that research metrics (including citation metrics) can be useful in exposing such problems and forcing us as a community to confront them. In this sense, there is an irony in the fact that Sabaratnam and Kirby use a quantitative study based on citation metrics to substantiate their claim that women will be disadvantaged by metrics: without these metrics, that very point would be harder to make, but the bias would be no less real. Likewise, the ‘Matthew Effect’ that sees more productive scientists (in terms of quantity) receive proportionately more credit for their discoveries than less productive scientists was first theorized by Robert Merton in 1968, but it was only in the past twenty years or so that data on citations were robust enough to demonstrate it empirically. Larivière and Gingras have a useful discussion of this literature as well as their own demonstration of the impact of journal ranking on subsequent citations.

This is not to say that our existing metrics are without problems, but just as their use exposes problems and challenges within academia, so this in turn can help us modify and improve indices to provide better metrics of high-quality research.

Beyond Sabaratnam and Kirby’s critique, other criticisms of metrics are also well known. For instance, journal metrics may be more problematic for some disciplines within the Humanities and Social Sciences than they are for others, largely because of the lack of consensus over what constitutes an appropriate qualitative, rather than quantitative, ranking of journal quality. The social sciences are characterized by a much wider degree of methodological and epistemological pluralism than the STEMM subjects, and journal rankings based on citation counts tend to privilege dominant approaches.

To give one concrete example, in political science, journal impact factors tend to rank quantitative journals highest because political science in the US is overwhelmingly quantitative. Even the best qualitative research in the world may thus appear to be of lower quality than merely mediocre quantitative research. The risk of naïve metrics is that, both for individuals and for institutions (in terms of appointments), they create incentives to conform ever more closely to mainstream approaches rather than to engage with alternative or critical methods and epistemologies. But this, I would suggest, is an argument for improving our metrics rather than abandoning them.

As above, my point is that the trend toward methodological and theoretical uniformity in some disciplines is a long-existing phenomenon, not one created by metrics. Lee Smolin’s popular account of string theory in The Trouble with Physics contains an insightful sociological analysis of how this (in his view misguided) theory came to dominate theoretical physics in US research institutions long before the rise of metrics; the hegemony of rational choice approaches in economics is even older. While it may be too late for economics to recover its methodological pluralism, disaggregated metrics such as those available in SciVal may, for instance, help both expose the extent of the drift towards similar methodological uniformity in political science and enable qualitative researchers and institutions to demonstrate their impact compared with other qualitative researchers.
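To make the idea of disaggregated comparison more concrete, here is a minimal sketch of one simple form of normalisation: each paper’s citation count is divided by the mean count for its own methodological group, so that a score of 1.0 means ‘average for papers of this kind’. Everything here is hypothetical: the split into ‘quantitative’ and ‘qualitative’ groups, the paper IDs and all the citation counts are invented for illustration. Nor is this SciVal’s method; indicators such as SciVal’s Field-Weighted Citation Impact normalise by subject field, publication year and document type rather than by methodology.

```python
from collections import defaultdict

# Hypothetical data, invented for illustration: (paper id, group, citations).
# The methodological grouping itself is an assumption of this sketch.
PAPERS = [
    ("Q1", "quantitative", 50),
    ("Q2", "quantitative", 20),
    ("Q3", "quantitative", 6),
    ("Q4", "quantitative", 4),
    ("K1", "qualitative", 12),
    ("K2", "qualitative", 6),
    ("K3", "qualitative", 3),
    ("K4", "qualitative", 3),
]


def group_baselines(papers):
    """Mean citations per group: the 'expected' count for a paper in that group."""
    counts = defaultdict(list)
    for _, group, cites in papers:
        counts[group].append(cites)
    return {group: sum(c) / len(c) for group, c in counts.items()}


def normalised_scores(papers):
    """Each paper's citations divided by its group baseline (1.0 = group average)."""
    baselines = group_baselines(papers)
    scores = {}
    for paper_id, group, cites in papers:
        baseline = baselines[group]
        # If every paper in a group is uncited there is no meaningful baseline;
        # this sketch assumes a score of 0.0 rather than dividing by zero.
        scores[paper_id] = cites / baseline if baseline else 0.0
    return scores


if __name__ == "__main__":
    raw = {pid: cites for pid, _, cites in PAPERS}
    norm = normalised_scores(PAPERS)
    for pid in ("Q2", "K1"):
        print(pid, raw[pid], round(norm[pid], 2))
```

Run as written, it prints `Q2 20 1.0` and then `K1 12 2.0`: on raw counts the average quantitative paper (20 citations) outranks the best qualitative paper (12), but once each is compared with its own group the ordering reverses. Whether such groups can be drawn fairly is, of course, itself contestable.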

So how, then, would I propose research metrics be used?

We all use metrics all the time in our professional and personal lives. The following story is probably not unfamiliar to many. I have just accepted a position at the University of Western Australia. When I was first approached about the job, I did not know the institution at all, but a quick check of the QS Rankings convinced me to look at them seriously. At around the same time, I was approached by another institution; their ranking – somewhere in the doldrums where QS doesn’t even give a specific rank – prompted a quick and polite refusal. In my letter of application to UWA, I made use of my Google Scholar h-index and was able to compare it favourably with the LSE study of h-indices across the social sciences. After being offered the position, one major concern I had was about schooling for my son. Again, a quick consultation of The Australian’s school league tables convinced me that there were plenty of good schools in which to indoctrinate him into the metrics of the future.
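Because the h-index figures in this story (and again in the comment below), a minimal sketch of its conventional definition may be useful: a researcher has an h-index of h if h of their papers have each been cited at least h times. The function name `h_index` and the citation counts are hypothetical, invented for illustration. Real values also depend on which database supplies the counts, since Google Scholar, Scopus and Web of Science index different sets of citing documents.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        # The paper at position `rank` must have at least `rank` citations.
        if cites >= rank:
            h = rank
        else:
            break  # counts only fall from here, so no later paper can qualify
    return h


if __name__ == "__main__":
    # Invented citation records, for illustration only.
    print(h_index([10, 8, 5, 4, 3]))   # 4
    print(h_index([40, 1, 1, 1]))      # 1  (43 citations in total)
    print(h_index([11, 11, 11, 10]))   # 4  (also 43 citations in total)
```

The last two records have the same total of 43 citations, yet one yields an h-index of 1 and the other 4: the index rewards a sustained body of cited work rather than a single highly cited paper.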

Of course, I didn’t base my assessment of UWA solely on their QS ranking, nor will I pick a school for my son purely on the basis of the metrics. Likewise, I doubt very much that my wanton waving of an h-index influenced the selection committee over and above my detailed academic resumé, job presentation and interview. But they were useful bits of information that facilitated (but didn’t constrain) comparison and aided decision-making.

This is the approach that I think we should take towards metrics in the REF and in academia more generally. Metrics allow us to display our strengths, but as part of an overall narrative of research excellence, not as a single indicator of ‘impact’. Metrics can and should be a useful part of the REF process, but they should be seen as a range of evidence bases that can be used to substantiate our overall case for research excellence, rather than a prescribed evaluative framework. As institutions and as individuals, we should be able to choose which metrics to use and emphasize as part of that narrative.

‘My’ version of the REF does include publicly available databases of research metrics, but just as The Australian warns that its schools data is a ‘guide only and not intended for ranking schools’, so I think that research metrics should be publicly available but, in the REF and elsewhere, used as a source of data to aid in promoting research excellence, not to narrowly and programmatically constrain it.

While there are certainly potential problems and perverse incentives in the application of metrics, the debate has focused too much on these and not on the ways in which metrics can help expose existing problems and perverse incentives in the academy. If they are used carefully and reflexively, in tandem with a qualitative account of research excellence, I see no reason why they could not play a more positive role that outweighs their drawbacks.

2 Responses to “A Modest Defense of Citation Metrics”

  1. Paul Kirby on

    Thanks for the link to our piece, and happy to see our analysis mainly supported! Two brief points.

    First, it is not so ironic that we rely on a quantitative study, since our argument is not against quantitative methods, nor against metrics per se. We indeed stress that they might be useful as a tool for answering specific questions. But the HEFCE consultation is primarily about using metrics as a proxy for "quality", and our focus was consequently on that link. It may well be that for certain positions, an institution like UWA is looking for a candidate with a certain extra-academic profile, or one who crosses academic boundaries with their work. Metrics might well be a useful measure of that. This is quite different from saying that HEFCE should expand metrics as a generalised measure of quality across disciplines and departments when it comes to the distribution of research funding.

    Second, you suggest that metrics can be useful if not used "naively". We would be minded to agree, but what are the prospects of an institutionalised system of evaluation-through-metrics being used responsibly? Many have pointed out, in peer-reviewed journals as well as blog posts, that the Impact Factor of a journal is a bad proxy for the quality of individual papers within it. And yet hiring committees and others continue to think of the IF as a good guide to work quality. Why would we expect any more from readings of the h-index? In this sense, it is not metric advocates who are the problem, so much as the everyday academic practices that will develop around any new measures. So, for example, Cameron Neylon and PLOS's submission to the consultation is clear, subtle and convincing in its call for nuanced metric use, but it is hard to see how this could be formalised into a REF process (nor does Cameron suggest it should be).

    None of this is to say that either peer review or REF panels are a cost-effective or accurate system for quality assessment (assuming we could even agree on a singular standard of "quality"). But at the very least the subjective nature of those assessments is obvious to all involved, and can be defended as inevitable given the plurality of knowledge positions, etc. The danger is of a system that appears to offer calculability, comparability and mathematical truth about quality and impact, whilst doing very little of that in practice.
