图书馆杂志

图书馆杂志 (Library Journal), 2020, Vol. 39, Issue (8): 75-81.

• Practice Research (工作研究) •

民国抗战史主题词表自动构建研究

杜慧平  薛春香   

  • Publication date: 2020-08-25  Release date: 2020-08-25
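  • About the authors: Du Huiping (杜慧平), female, associate professor, Department of Information Management, School of Humanities, Shanghai Normal University, Shanghai 200234. Research interest: information organization. Contribution: drafting the paper. E-mail: dhp0420@163.com. Xue Chunxiang (薛春香), female, associate professor, Fudan University Library, Shanghai 200433. Research interest: knowledge organization. Contribution: revising the paper.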

Automatic Construction of Thesaurus of the Anti-Japanese War History of the Republic of China Period

Du Huiping   Xue Chunxiang   

  • Online:2020-08-25 Published:2020-08-25

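Abstract (translated from the Chinese original): To meet the practical need to develop and use
Republican-era documents, this paper takes a thesaurus of the history of the War of Resistance in the
Republic of China period as an example and proposes a scheme for automatically constructing a
special-topic thesaurus, in order to organize Republican-era materials and to explore techniques for
building topical thesauri. With Shenbao (《申报》) as the main corpus, it presents, through examples,
solutions to the key technical problems of constructing the thesaurus: collecting domain vocabulary
through multiple channels, and using statistical natural language processing methods such as
word-frequency statistics and co-occurrence analysis to help the compiling experts determine the scope
of term inclusion and identify conceptual relationships between terms. It also discusses the macro-structure
of the thesaurus, the scope and methods of term collection, and its storage, publication, and use.
Building the thesaurus with automated methods supplemented by manual judgment saves compilation time,
lightens the compilers' workload, reduces cost, and eases maintenance, thereby promoting the
application and adoption of thesauri.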


Abstract: To address problems in retrieving information on the Republic of China period, this
paper takes the history of the Anti-Japanese War as an example and proposes a scheme for the automatic
construction of a thesaurus that can improve search efficiency. With Shenbao as the main corpus, the
paper presents solutions to the key technical problems of thesaurus generation and illustrates them with
examples: collecting vocabulary through various channels, assisting experts in determining the candidate
terms to include in the thesaurus, and identifying hyponym-hypernym, synonymy, and associative
relationships between terms using natural language processing techniques such as frequency
statistics and co-occurrence analysis. Finally, the macro-structure of the thesaurus, the scope and
methods of term collection, and its storage and publication are discussed. Automated thesaurus
construction supplemented by manual judgment saves time, effort, and cost, makes the resulting
thesaurus easier to maintain and expand, and promotes the application of thesauri.
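
The abstract names frequency statistics and co-occurrence analysis as the techniques that help experts choose candidate terms and spot related ones. The following is a minimal sketch of those two counts, not the authors' implementation. It assumes the corpus has already been segmented into token lists (real Shenbao text would first need Chinese word segmentation), and the function names, the `min_count` threshold, and the toy documents are all hypothetical and invented for illustration.

```python
from collections import Counter
from itertools import combinations


def term_frequencies(docs):
    """Count how many times each term occurs across all documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def candidate_terms(freqs, min_count):
    """Keep terms whose total frequency reaches the (assumed) threshold."""
    return {term for term, count in freqs.items() if count >= min_count}


def cooccurrence_counts(docs, vocabulary):
    """For each pair of vocabulary terms, count the documents containing both."""
    pairs = Counter()
    for tokens in docs:
        # Sort so that each unordered pair always has the same key.
        present = sorted(set(tokens) & vocabulary)
        pairs.update(combinations(present, 2))
    return pairs


# Toy, invented data: each inner list stands for one segmented article.
docs = [
    ["war", "shanghai", "refugee"],
    ["war", "shanghai", "concession"],
    ["war", "refugee", "relief"],
]

freqs = term_frequencies(docs)
vocab = candidate_terms(freqs, min_count=2)
pairs = cooccurrence_counts(docs, vocab)

# Pairs that co-occur most often are shown first for expert review.
for (a, b), n in pairs.most_common():
    print(a, b, n)
```

High-frequency terms would be passed to the compiling experts as candidates, and frequently co-occurring pairs as possible related terms; deciding whether a pair is hierarchical, synonymous, or associative remains a manual judgment, as the abstract describes.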