Library Journal (图书馆杂志)

Library Journal (图书馆杂志), 2018, Vol. 37, Issue 12: 56-63.

• Work Research •

A Name Disambiguation Method for Known Authors

Fan Wuyou (范午攸)

  • Publication date: 2018-12-15 Release date: 2018-12-24
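  • About the author: Fan Wuyou, assistant librarian, Shanghai Jiao Tong University Library, Shanghai 200240. Research interests: data analysis, subject services. E-mail: wyfan@lib.sjtu.edu.cn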

Method to Remove Ambiguity of Names of Known Authors

Fan Wuyou   

  • Online:2018-12-15 Published:2018-12-24

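Abstract: In foreign-language journal databases, a single abbreviated name very often stands for several different authors, which seriously degrades the precision of author searches. This study combines rules with algorithms: rules are used to label the training data for a classification algorithm, so that a supervised algorithm can be applied under unsupervised conditions and authors can be retrieved precisely. The algorithm suits name disambiguation problems in which the author's identity is already known, such as verifying an author's publications. Compared with general-purpose disambiguation methods, it combines the advantage of unsupervised algorithms, which need no manual labeling, with the advantages of supervised algorithms, which are efficient and map easily onto entities. Practical results indicate that the method achieves relatively high accuracy.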

Keywords: author name disambiguation, data annotation, classification algorithm, naive Bayes

Abstract: In foreign periodical databases, it is common for the same abbreviated name to stand for several authors, which seriously affects the precision of author searches. This paper combines rules and algorithms to enable accurate search by author name: it annotates training data for a classification algorithm based on rules, so that a supervised algorithm can be used under unsupervised conditions. The algorithm is suitable for name disambiguation where the author's identity is known, such as publication verification. Compared with general disambiguation methods, this method combines the advantage of unsupervised algorithms (no manual annotation required) with the advantages of supervised algorithms (high efficiency and easy correspondence to entities). Results in practice show that the method achieves relatively high accuracy.

Key words: Author name disambiguation, Data annotation, Classification algorithm, Naive Bayes
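
The idea summarized in the abstract (rules label the training data, then a classifier handles the remaining records) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the record fields, the two labeling rules in `rule_label`, the sample data, and the small multinomial naive Bayes implementation with add-one smoothing.

```python
import math
from collections import Counter

def rule_label(record, known):
    """Hypothetical rules: 1 = the known author, 0 = a namesake, None = undecided."""
    affil = record["affiliation"].lower()
    if known["institution"] in affil:
        return 1
    if affil and not (set(record["coauthors"]) & known["coauthors"]):
        return 0
    return None

def features(record):
    # Bag of tokens: coauthor names, keywords, and the journal title.
    return record["coauthors"] + record["keywords"] + [record["journal"]]

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict(self, doc):
        def score(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            return self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / total)
                for t in doc if t in self.vocab)  # unseen tokens are ignored
        return max(self.classes, key=score)

def disambiguate(records, known):
    # Step 1: rules label the records they can decide.
    labels = [rule_label(r, known) for r in records]
    train = [(features(r), y) for r, y in zip(records, labels) if y is not None]
    # Step 2: a classifier trained on those labels decides the rest.
    model = NaiveBayes().fit([d for d, _ in train], [y for _, y in train])
    return [y if y is not None else model.predict(features(r))
            for r, y in zip(records, labels)]

# Made-up example data for a hypothetical known author.
KNOWN = {"institution": "shanghai jiao tong", "coauthors": {"Li M", "Zhang H"}}
RECORDS = [
    {"affiliation": "Shanghai Jiao Tong Univ", "coauthors": ["Li M"],
     "keywords": ["bibliometrics", "library"], "journal": "J Informetr"},
    {"affiliation": "Shanghai Jiao Tong Univ Lib", "coauthors": ["Zhang H"],
     "keywords": ["citation", "library"], "journal": "Scientometrics"},
    {"affiliation": "Univ Toronto", "coauthors": ["Smith J"],
     "keywords": ["protein", "folding"], "journal": "J Mol Biol"},
    {"affiliation": "Univ Toronto", "coauthors": ["Smith J", "Brown K"],
     "keywords": ["protein", "enzyme"], "journal": "Biochemistry"},
    {"affiliation": "", "coauthors": ["Li M"],
     "keywords": ["library", "citation"], "journal": "Scientometrics"},
    {"affiliation": "", "coauthors": ["Brown K"],
     "keywords": ["enzyme"], "journal": "J Mol Biol"},
]

print(disambiguate(RECORDS, KNOWN))  # -> [1, 1, 0, 0, 1, 0]
```

In this sketch the first four records are labeled by the rules alone, and the last two, which lack an affiliation, are assigned by the classifier. No manual labeling is involved, which is the property the abstract emphasizes.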