Scientists should be good at judging the importance of the scientific work of others; peer review is built into the culture of science. But a new paper says that scientists are unreliable judges of the importance of fellow researchers' published papers.

They're better at it than you. But still pretty bad at it, according to the authors.

Prof. Adam Eyre-Walker and Dr. Nina Stoletzki analyzed three methods of assessing published studies, using two sets of peer-reviewed articles. The three assessment methods the researchers looked at were:

  • Peer review: subjective post-publication peer review where other scientists give their opinion of a published work;
  • Number of citations: the number of times a paper is referenced as a recognized source of information in another publication; 
  • Impact factor: a measure of a journal's importance, determined by the average number of times papers in a journal are cited by other scientific papers.
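The impact factor described above reduces to a simple ratio. A minimal sketch, using hypothetical citation counts (the official Journal Citation Reports calculation uses a two-year window and specific rules about which items are "citable"):

```python
def impact_factor(citations_in_year: int, citable_items: int) -> float:
    """Average citations per paper: citations received in year Y to the
    journal's papers from the two preceding years, divided by the number
    of citable items it published in those years."""
    if citable_items == 0:
        raise ValueError("journal published no citable items")
    return citations_in_year / citable_items

# Hypothetical example: 1,200 citations in 2013 to the 400 papers a
# journal published in 2011-2012 gives an impact factor of 3.0.
print(impact_factor(1200, 400))
```

Note that the metric describes the journal as a whole, not any individual paper in it; that gap is central to the authors' criticism.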

Their findings were that scientists are unreliable judges of the importance of a scientific publication: they rarely agree on the importance of a particular paper and are strongly influenced by where the paper is published, overrating science published in high-profile scientific journals.

They also found that the number of times a paper is subsequently referred to by other scientists bears little relation to the underlying merit of the science. 

[Figure: Correlations within journals with 100 or more papers in the F1000 dataset.]

Eyre-Walker, from the University of Sussex, said, "The three measures of scientific merit considered here are poor; in particular subjective assessments are an error-prone, biased and expensive method by which to assess merit. While the impact factor may be the most satisfactory of the methods considered, since it is a form of prepublication review, it is likely to be a poor measure of merit, since it depends on subjective assessment."

The authors argue that their findings could have major implications for any future assessment of scientific output, such as that currently being carried out for the UK Government's forthcoming Research Excellence Framework (REF). Eyre-Walker adds, "The quality of the assessments generated during the REF is likely to be very poor, and calls into question whether the REF in its current format is a suitable method to assess scientific output."

Prof. Jonathan Eisen of the University of California, Davis, a former editor of PLoS Biology, and Drs. Catriona MacCallum and Cameron Neylon of the Advocacy department at PLoS, the open-access publisher of PLoS Biology, wrote in an accompanying editorial that the study is "among the first to provide a quantitative assessment of the reliability of evaluating research."

They also support the call for more openness in research assessment processes but caution that assessment of merit is intrinsically a complex and subjective process, with "merit" itself meaning different things to different people, and point out that Eyre-Walker and Stoletzki's study "purposely avoids defining what merit is."

They suggest that while the impact factor is the "least bad" form of assessment, it should be replaced by multiple metrics that appraise the article rather than the journal ("a suite of article level metrics"). Such metrics might include "number of views, researcher bookmarking, social media discussions, mentions in the popular press, or the actual outcomes of the work (e.g. for practice and policy)."

Citation: Eyre-Walker A, Stoletzki N (2013) The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations. PLoS Biol 11(10): e1001675. doi:10.1371/journal.pbio.1001675