An introduction to responsible metrics (Open Access Week)

This post is part of a series of blogs in celebration of Open Access Week 2021. Keep an eye out for more posts throughout the week or follow our Twitter account, @OpenResPlym, to keep up to date with OA Week events.

Responsible metrics in a nutshell

Research metrics are used to ‘measure’ the influence or impact of researchers and their publications. Authors use journal metrics to decide where they want to publish; article metrics are used to assess the ‘quality’ of a research output, or group of outputs; institutions use author metrics to inform the recruitment, probation, or promotion of researchers.

Most research metrics come in the form of quantitative measurements. Many well-known metrics are citation-based, including citation counts and percentiles, Field-Weighted Citation Impact (FWCI), the h-index, and the Journal Impact Factor (JIF); beyond these, metrics might include anything from views, downloads, mentions, or sales, to collaboration metrics or research grant income.

Problems with the use of metrics arise when these quantitative metrics are used as a proxy for measuring something more complex and qualitative, such as the overall calibre of a researcher or a research output. Quantitative metrics can also be biased or manipulated, particularly if too much emphasis is placed on specific metrics as evaluation criteria.

Responsible metrics is a movement which advocates for the ethical, appropriate use of numerical metrics when evaluating research. The idea is not to do away with quantitative metrics, but rather to ensure that they are used in appropriate situations, applied alongside qualitative information wherever possible, and that they are not used as inadequate proxies or arbitrary measures.

 

What’s so bad about quantitative metrics?

Quantitative metrics are not inherently ‘bad’. The problems arise when these metrics are used badly.

Citation metrics, for example, measure exactly what they set out to measure (within the parameters of the available data). They indicate how many citations were received by an output or group of outputs within a certain period of time; this measurement may be weighted, expressed as a percentile, or calculated in relation to another metric. These data certainly tell a part of a research output’s story, and they can be useful. They do not, however, tell the whole story: a high volume of citations does not guarantee that a piece of research is of high quality, nor vice versa. It is therefore not the metrics that are the problem, but rather the assumptions which are made about what these metrics can sufficiently ‘measure’.

Some of the ways in which research metrics can be insufficient, biased, or misconstrued include:

Lack of context

There are many reasons publications get cited, and not all of them are good. Generally, citation metrics make no distinction between ‘good’ citations and neutral or negative ones. Similar problems can arise when using social media attention as an indicator of research quality.

False proxies

The Journal Impact Factor (JIF) measures the average number of citations received in a given year by the documents a journal published over the previous two years. It is unreasonable to use the JIF of the journal an article was published in as a surrogate for its quality as an individual publication – or even for its citation impact: papers published in high-JIF journals are not guaranteed to be more highly cited.
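
By way of illustration, the two-year JIF calculation can be written out as follows (a simplified sketch; exactly which items count as ‘citable’ is decided by the database provider):

\[
\mathrm{JIF}_{2021} = \frac{\text{citations received in 2021 by items the journal published in 2019 and 2020}}{\text{number of citable items the journal published in 2019 and 2020}}
\]

A journal that published 200 citable items across 2019 and 2020, which together received 600 citations during 2021, would therefore have a 2021 JIF of 3.0: a journal-level average that reveals nothing about how those citations are distributed across individual articles.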

Bias and gaming

There are many ways in which research metrics can be biased. Some biases reflect broader, systemic problems (for example, some studies have shown that female authors are less likely to be cited than their male colleagues),[1] while others arise when a metric is applied in the wrong situation (for example, the h-index disfavours early-career researchers and takes no account of author order). Metrics can be deliberately manipulated, too: publishers can game their impact factor by publishing only in particular areas, avoiding certain output types, or even participating in citation coercion and ‘citation cartels’.[2]
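
To make the h-index point concrete, here is a minimal sketch in Python (with made-up citation counts) of how the metric is calculated. Because the h-index can never exceed an author's total number of papers, an early-career researcher with a few highly cited outputs is automatically capped at a low value:

    # A minimal sketch of the h-index calculation (illustrative only).
    # The h-index is the largest h such that an author has h papers
    # with at least h citations each.
    def h_index(citation_counts):
        ranked = sorted(citation_counts, reverse=True)
        h = 0
        for rank, citations in enumerate(ranked, start=1):
            if citations >= rank:
                h = rank
            else:
                break
        return h

    # Hypothetical early-career author: three papers, all heavily cited.
    print(h_index([120, 85, 40]))                     # h = 3, capped by paper count
    # Hypothetical established author: ten papers with modest citation counts.
    print(h_index([12, 10, 9, 7, 6, 5, 4, 3, 2, 1]))  # h = 5

The calculation also treats every author of a paper identically, which is why author order (and therefore individual contribution) is invisible to it.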

Skewed incentives

Too much emphasis on one metric encourages goal displacement and distortion of behaviour. Researchers may feel the need to prioritise what they are being measured by (e.g. impact factor) over anything else (e.g. more suitable publishing venues and/or open access opportunities).

Suitability

Some metrics may be used as appropriate indicators in certain situations but are entirely inappropriate in others. The FWCI, for example, becomes less stable the smaller the sample size, so it is not a suitable metric for smaller groups of outputs.[3] It also takes time to stabilise (since citations accrue over time), so it is less reliable when applied to newer publications.
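
For reference, the FWCI of a single output is broadly a ratio of actual to expected citations, where the expected figure is the world average for outputs of the same subject field, publication year, and document type; the FWCI of a group of outputs is an average of those ratios, which is why one or two outliers can dominate the figure when the group is small. A simplified sketch of the definition:

\[
\mathrm{FWCI} = \frac{\text{citations actually received by the output}}{\text{average citations received by similar outputs (same field, year, and document type)}}
\]

An FWCI of 1.00 therefore means an output has been cited exactly as often as the world average for comparable outputs, while 1.50 means 50% more than expected.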

Other common problems with metrics can include reliance on a particular database (since different databases will produce varying results) and failure to account for differences in practice between disciplines.
 

So, what’s the solution?

It is not practical to do away with the use of research metrics entirely, nor does the responsible metrics movement advocate for this. It is possible, however, to avoid certain practices that are more likely to discriminate, and to introduce measures which can help to offset some of the problems outlined above.

Some general rules for good practice in research assessment include:

  • Not judging research solely on the journal it was published in
  • Avoiding arbitrary measures and ‘false precision’ – uncertainty and error margins must be taken into account
  • Avoiding reliance on any one metric, and using qualitative measures in conjunction with quantitative metrics whenever possible

Some questions one might ask before applying research metrics could be:

  • What are the risks associated with the application of metrics in this situation? Am I using metrics to make an impactful decision (such as hiring or promotion), or for an activity less likely to have an impact on the entities under examination (such as studying publication patterns at a national or institutional level)?
  • Am I using this metric as a proxy for something else? What am I really trying to measure, and what can these metrics actually tell me?
  • Are the metrics I am using appropriate in this particular situation? Do I understand what it is they are measuring, and their limitations?
  • Am I using an appropriate range of metrics or other methods of analysis? How can I best ensure this assessment is well-rounded?

 

Responsible metrics manifestos and statements

There are four key documents associated with the responsible metrics movement. Each has its own set of principles, but all outline some of the ways in which researchers or institutions can work to use metrics more responsibly.

The documents are:

  1. DORA – the San Francisco Declaration on Research Assessment (2012)
  2. The Leiden Manifesto (2015)
  3. The Metric Tide Report (2015)
  4. The Hong Kong Principles (2019)

Many thousands of individuals, research institutions, scientific organisations, and funders have signed DORA or aligned themselves with the principles of the Leiden Manifesto, thereby committing to adopt the responsible practices outlined in these documents.
 

Gaining momentum

Support from initiatives such as Plan S has recently given the responsible metrics movement additional momentum. The UKRI Research Councils are all signatories to DORA, and the Wellcome Trust now expects Wellcome-funded organisations to publicly commit to responsible research evaluation as a condition of their grants.

The University of Plymouth is currently working towards an official policy on responsible metrics.
 

Useful links

 

[1] See for example: Paula Chatterjee and Rachel M. Werner, ‘Gender Disparity in Citations in High-Impact Journal Articles’, JAMA Network Open (2021), <https://doi.org/10.1001/jamanetworkopen.2021.14509> [accessed 25/10/2021]; Neven Caplar, Sandro Tacchella and Simon Birrer, ‘Quantitative evaluation of gender bias in astronomical publications from citation counts’, Nature Astronomy (2017), <https://doi.org/10.1038/s41550-017-0141> [accessed 25/10/2021]

[2] Allen W. Wilhite and Eric A. Fong, ‘Coercive Citation in Academic Publishing’, Science, 335 (2012), <https://doi.org/10.1126/science.1212540> [accessed 25/10/2021]

[3] Ian Rowlands, ‘SciVal’s Field weighted citation impact: Sample size matters!’, The Bibliomagician (2017), <https://thebibliomagician.wordpress.com/2017/05/11/scivals-field-weighted-citation-impact-sample-size-matters-2/> [accessed 25/10/2021]