Library Journal (图书馆杂志)

Library Journal, 2022, Vol. 41, Issue (3): 126-134.

• Information Management •

Research of Instrument Named Entity Recognition (NER) in Research Paper Based on the Full Text
(基于全文的科研文献中仪器命名实体识别(NER)研究与实践)

Fan Wuyou (范午攸), Shanghai Jiao Tong University Library

  • Online: 2022-03-15 Published: 2022-03-21
  • About the author: Fan Wuyou, librarian at Shanghai Jiao Tong University Library, Shanghai 200240. Research interests: data analysis, subject services, and institutional repositories. E-mail: fanwuyou@sjtu.edu.cn

Abstract: The full text of research papers contains instrument information that is not recorded in abstracts or bibliographic records. Extracting this information effectively from the full text can provide a basis for quantitative research such as instrument performance evaluation. Taking chemistry papers and large-scale analytical instruments as its object, this paper implements three things: discovery of unknown instrument names in the literature through semantic similarity and word-formation rules; fuzzy retrieval of instrument names that accommodates PDF typesetting; and discrimination of instruments actually used from instruments not used and from entities with the same name, based on document type, end-of-main-text markers, usage indicator words, and the correspondence between full names and abbreviations. The accuracy of the results was verified by comparison with manual annotation.
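The abstract does not say how the fuzzy retrieval for PDF typesetting is implemented. As a rough illustration only, the sketch below shows one way such matching could work: a regular expression built from a known instrument name that tolerates end-of-line hyphenation and irregular whitespace in text extracted from a PDF. The function names `build_fuzzy_pattern` and `find_instruments`, the sample sentence, and the instrument list are all hypothetical and are not taken from the paper.

```python
import re


def build_fuzzy_pattern(name: str) -> re.Pattern:
    """Compile a regex that matches `name` despite PDF line-wrap artifacts.

    Assumption (not from the paper): the artifacts of interest are
    end-of-line hyphenation inside a word and arbitrary whitespace or
    hyphens between the words of a name.
    """
    # Split the name into words on whitespace and hyphens.
    tokens = [tok for tok in re.split(r"[\s\-]+", name.strip()) if tok]
    # Inside a word, allow an optional "hyphen + line break" between letters.
    soft_break = r"(?:-\s*\n\s*)?"
    token_patterns = [
        soft_break.join(re.escape(ch) for ch in tok) for tok in tokens
    ]
    # Between words, require at least one whitespace or hyphen character.
    body = r"[\s\-]+".join(token_patterns)
    # Lookarounds keep the match from starting or ending inside a longer word.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)


def find_instruments(text: str, names: list[str]) -> list[str]:
    """Return the names (in input order) that occur in `text`."""
    return [n for n in names if build_fuzzy_pattern(n).search(text)]


# Hypothetical extracted text with a hyphenated line break and uneven spacing.
sample = (
    "Samples were characterized on an X-ray diffrac-\n"
    "tometer and a scanning  electron\nmicroscope."
)
candidates = [
    "X-ray diffractometer",
    "scanning electron microscope",
    "mass spectrometer",
]
found = find_instruments(sample, candidates)
```

Here `found` contains the first two candidates and omits "mass spectrometer". A pattern like this only handles layout noise. Deciding whether a matched instrument was actually used, as opposed to merely mentioned, would need the additional cues the abstract lists (document type, end-of-main-text markers, usage indicator words, and full-name/abbreviation correspondence).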