Library Journal (图书馆杂志), 2020, Vol. 39, Issue 11: 97-105.

• Digital Humanities Research •

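The Application of Keyword Extraction to Pre-Qin Ancient Chinese in the Digital Humanities: Taking the Spring and Autumn Annals (《春秋经传》) as an Example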

Qin Heran, Wang Dongbo

  • Published: 2020-11-25  Online: 2020-11-25
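  • About the authors: Qin Heran (female), assistant librarian, Library of the Modern Technology Education Center, Lianyungang Higher Vocational and Technical School of Traditional Chinese Medicine, Lianyungang, Jiangsu 222007; e-mail: 1789900687@qq.com. Research interests: natural language processing and text mining. Contribution: computed the experimental data and wrote the paper.
  • Wang Dongbo, professor, College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095. Research interests: natural language processing and text mining. Contribution: proposed the research idea and revised the paper.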


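Abstract: Digital humanities is an interdisciplinary field that emphasizes integrating computing technology with the humanities. Classical Chinese texts are an important object of humanities research. Against this background, we apply computational techniques to extract keywords from the digitized Spring and Autumn Annals with its commentaries (《春秋经传》) in order to analyze how its keywords are distributed. Three keyword extraction algorithms are used: the unsupervised TextRank algorithm, the classic TF-IDF algorithm, and the LDA topic model. A pooling-based evaluation shows that TextRank extracts the best keywords, with an accuracy of 84%, compared with 62% for TF-IDF and 74% for the LDA topic model. The extracted keywords also show that the records of the Annals center on diplomatic missions (聘问), covenant meetings (会盟), military campaigns (征伐), marriages and funerals (婚丧), and usurpations and regicides (篡弑) among the feudal states.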
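The abstract names TextRank, TF-IDF and a pooling-based evaluation but gives none of their settings. What follows is a minimal sketch under stated assumptions, not the authors' implementation. The input is assumed to be already word-segmented, since classical Chinese has no word delimiters. The co-occurrence window, damping factor 0.85, smoothed IDF formula, stop-word list, and the helper names `textrank_keywords`, `tfidf_keywords` and `pooled_precision` are all hypothetical choices. LDA is left out because it needs a topic-modelling library. Pooling is modelled as follows: annotators judge the union of all methods' outputs, and each method's score is the fraction of its keywords judged relevant.

```python
import math
from collections import Counter, defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iters=100, tol=1e-6,
                      top_k=5, stopwords=frozenset()):
    """TextRank: PageRank over an undirected word co-occurrence graph."""
    words = [t for t in tokens if t not in stopwords]
    neighbors = defaultdict(set)
    # Link distinct words that occur within `window` positions of each other
    # (window=2 links only adjacent words).
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    nodes = sorted(neighbors)
    if not nodes:
        return []
    score = {w: 1.0 for w in nodes}
    for _ in range(iters):
        # Each node receives an equal share of every neighbour's current score.
        new = {w: (1 - damping) + damping * sum(score[v] / len(neighbors[v])
                                                for v in neighbors[w])
               for w in nodes}
        delta = max(abs(new[w] - score[w]) for w in nodes)
        score = new
        if delta < tol:
            break
    # Ties are broken by the word itself so results are reproducible.
    return sorted(nodes, key=lambda w: (-score[w], w))[:top_k]


def tfidf_keywords(docs, doc_index, top_k=5, stopwords=frozenset()):
    """TF-IDF with an assumed smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    tf = Counter(t for t in docs[doc_index] if t not in stopwords)
    total = sum(tf.values())
    if total == 0:
        return []
    scores = {w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1)
              for w, c in tf.items()}
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def pooled_precision(results, judged_relevant):
    """Pool every method's keywords; precision = share of a method's output judged relevant."""
    pool = set().union(*results.values())
    relevant = set(judged_relevant) & pool  # annotators only judge pooled words
    return {m: (sum(w in relevant for w in kws) / len(kws) if kws else 0.0)
            for m, kws in results.items()}


if __name__ == "__main__":
    # Hand-segmented opening of the Chunqiu (Duke Yin, year 1), used only as toy input.
    docs = [
        ["元年", "春", "王", "正月"],
        ["三月", "公", "及", "邾仪父", "盟", "于", "蔑"],
        ["夏", "五月", "郑伯", "克", "段", "于", "鄢"],
    ]
    stop = {"及", "于"}  # hypothetical stop-word list of function words
    tokens = [t for doc in docs for t in doc]
    print("TextRank:", textrank_keywords(tokens, top_k=3, stopwords=stop))
    print("TF-IDF:  ", tfidf_keywords(docs, 1, top_k=3, stopwords=stop))
```

TextRank scores a word by how well connected it is in the co-occurrence graph of a single text. TF-IDF rewards words that are frequent in one document but rare across the collection. This difference is one plausible reason the two methods can rank the same chronicle text differently.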

Keywords: Digital Humanities, TextRank,