Library Journal (图书馆杂志)

Library Journal, 2019, Vol. 38, Issue (1): 83-90.

• Republican-Era Documents Research •

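Research on the Recognition of Highly Cited Papers Based on “Precision-recall” Analysis

Li Xin, Cheng Qikai

  • Publication date: 2019-01-15  Release date: 2019-01-24
  • About the authors: Li Xin, doctoral student, School of Information Management, Wuhan University. Research interests: information retrieval, semantic metrics, text mining, etc. E-mail: lucian@whu.edu.cn. Wuhan, Hubei 430072. Cheng Qikai, lecturer, PhD, School of Information Management, Wuhan University. Research interests: information retrieval, text mining. Wuhan, Hubei 430072.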

Abstract: This article first reviews the current state of, and open problems in, identifying highly cited papers, and on that basis hypothesizes that the number of downloads can serve as an indicator for identifying them. To test the hypothesis, we manually collected 448 749 research articles published between 2004 and 2016 in 90 Chinese core journals from five fields: geophysics, computers and automation, mechanics, library and information science, and pharmacy. We characterized the density of downloads and citations of these articles statistically. We then recast the identification of highly cited papers as an information retrieval problem, using the Download Score (DS) and the Journal Citation Score (JCS) to score and rank the papers, and introduced the precision-recall curve as a new perspective for analyzing and visualizing identification performance. The results show that the precision-recall curve gives an intuitive picture of how well an indicator identifies highly cited papers. Both DS and JCS work as identification indicators, and DS identifies highly cited papers more effectively than JCS.

Key words: Highly cited papers, Precision-recall curve, Download score, Journal citation score, Supplementary indicator
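
The retrieval framing described in the abstract can be illustrated with a minimal sketch: rank papers by an indicator score, treat the highly cited papers as the "relevant" set, and compute precision and recall at each rank cutoff. This is not the authors' implementation. The function names, the toy data, and the rule that the top `top_n` papers by citations count as highly cited are all assumptions made for illustration; the abstract does not state how the highly cited set was defined.

```python
from typing import List, Tuple


def label_highly_cited(citations: List[int], top_n: int) -> List[bool]:
    """Mark papers whose citation count reaches the top_n-th largest count.

    Assumption: "highly cited" means the top_n papers by citations.
    Ties at the threshold are all marked as highly cited.
    """
    threshold = sorted(citations, reverse=True)[top_n - 1]
    return [c >= threshold for c in citations]


def precision_recall_points(
    scores: List[float], relevant: List[bool]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) at every cutoff k of the score ranking."""
    total_relevant = sum(relevant)
    if total_relevant == 0:
        raise ValueError("at least one paper must be labeled highly cited")
    # Rank papers by indicator score, highest first.
    ranked = sorted(zip(scores, relevant), key=lambda pair: pair[0], reverse=True)
    points = []
    hits = 0
    for k, (_, is_relevant) in enumerate(ranked, start=1):
        hits += int(is_relevant)
        points.append((hits / total_relevant, hits / k))
    return points


# Hypothetical toy data: six papers with download and citation counts.
downloads = [950, 720, 610, 300, 150, 80]
citations = [40, 5, 35, 22, 3, 1]

labels = label_highly_cited(citations, top_n=2)
curve = precision_recall_points(downloads, labels)
for recall, precision in curve:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```

Any other indicator, such as a journal-level score assigned to each paper, could be passed as `scores` in place of `downloads`; plotting the resulting points for two indicators on the same axes gives the kind of visual comparison the abstract refers to.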