Library Journal (图书馆杂志)

Library Journal, 2025, Vol. 44, Issue (412): 67-80.

• Digital Humanities •

Research on a Semi-Supervised Learning Model for Event Topic Recognition in Historical Ancient Texts

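Wu Zhaodi 1,2; Wang Hao 1,2; Qiu Jingwen 1,2 (1. School of Information Management, Nanjing University; 2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service)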

  • Online: 2025-08-15 Published: 2025-09-02
  • About the authors:

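    Wu Zhaodi: master's student, School of Information Management, Nanjing University. Research interests: digital humanities, natural language processing. Author contribution: writing and revising the paper. E-mail: 502023140031@smail.nju.edu.cn. Nanjing, Jiangsu 210000.

    Wang Hao: PhD, professor and doctoral supervisor, School of Information Management, Nanjing University. Research interests: sentiment analysis and data mining, natural language processing. Author contribution: providing the research ideas, revision suggestions, and finalizing the manuscript. Nanjing, Jiangsu 210000.

    Qiu Jingwen: doctoral student, School of Information Management, Nanjing University. Research interests: sentiment analysis and data mining. Author contribution: collecting the data and jointly processing it. Nanjing, Jiangsu 210000.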

Research on Event Subject Recognition Model of Chinese Classic Texts Based on Semi-Supervised Learning

Wu Zhaodi 1,2; Wang Hao 1,2; Qiu Jingwen 1,2 (1. School of Information Management, Nanjing University; 2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service)

  • Online: 2025-08-15 Published: 2025-09-02
  • Key words: Event clustering, Digital humanities, Ancient classics, Semi-supervised learning, Topic recognition

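Abstract (Chinese original, in translation):

How to extract and generalize events from large-scale texts has become a key problem in current research on events in ancient texts. Addressing the characteristics of ancient texts and Classical Chinese, this paper builds a semi-supervised event clustering model, USKm. The model uses USIF to represent historical events written in Classical Chinese and, based on constrained-distance integration, brings points from the neighboring region into a secondary decision on each data point's cluster, clustering the events and thereby recognizing their topics. Taking the Hou Han Shu (《后汉书》, Book of the Later Han) as the object of study, the authors compare USKm with traditional clustering models and find that USKm performs better. The authors visualize the temporal distribution over the lifetime of the Eastern Han regime, draw a relationship graph of historical events and figures, and analyze the underlying historical phenomena to explore the regime's patterns of development. Through semi-supervised training, the USKm model improves the accuracy of event feature recognition and the quality of clustering. The paper also processes, organizes, and visualizes the clustering results, offering humanities researchers new research ideas and angles from a digital humanities perspective.

Keywords: event clustering, digital humanities, ancient classics, semi-supervised learning, topic recognition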

Abstract:

Extracting and generalizing events from large-scale texts has emerged as a crucial issue in contemporary studies of ancient literature. Based on the distinctive characteristics of ancient texts and the ancient Chinese language, this study introduces a semi-supervised event clustering model named USKm, which employs USIF to represent historical events found in ancient texts. By incorporating neighboring data points into the secondary decision-making process, the model groups events and thereby recognizes their topics. Using the Hou Han Shu (Book of the Later Han) as the primary subject of investigation, the authors compare the performance of USKm with traditional clustering models and find that USKm performs better. The study further visualizes the temporal distribution across the Eastern Han Dynasty, constructs a graph of the relationships between historical figures and events, and examines the underlying historical phenomena to explore the developmental trends of the period. Through semi-supervised training, the USKm model improves the accuracy of event feature recognition and the quality of clustering; by processing and visualizing the clustering results, the study offers new research ideas and perspectives for scholars in digital humanities.
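The abstract names two ingredients, semi-supervised clustering and a secondary decision that consults neighboring points, without giving the details. The following is a minimal sketch of one way such a pipeline could look. It is not the authors' USKm implementation. It assumes that the event vectors (for example, USIF sentence embeddings) are already computed, that a few events carry known topic labels (seeds), and that the secondary decision is a simple majority vote among nearest neighbors. The function name `seeded_kmeans_with_neighbor_vote` and all of its parameters are hypothetical.

```python
import numpy as np


def seeded_kmeans_with_neighbor_vote(X, seed_labels, k, n_neighbors=3, n_iter=50):
    """Illustrative seeded k-means followed by a nearest-neighbor vote.

    X           : (n, d) array of event vectors (assumed precomputed).
    seed_labels : (n,) int array, cluster id for labeled events, -1 otherwise.
    k           : number of clusters; every cluster needs at least one seed.
    """
    X = np.asarray(X, dtype=float)
    seed_labels = np.asarray(seed_labels)
    labeled = seed_labels >= 0
    for c in range(k):
        if not np.any(seed_labels == c):
            raise ValueError(f"cluster {c} has no labeled seed")

    # Semi-supervised step: initialise each centroid from its labeled seeds.
    centroids = np.stack([X[seed_labels == c].mean(axis=0) for c in range(k)])

    for _ in range(n_iter):
        # Assign every point to its nearest centroid ...
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # ... but keep labeled points fixed to their known cluster.
        labels[labeled] = seed_labels[labeled]
        new_centroids = np.stack([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Secondary decision (an assumption of this sketch): each unlabeled point
    # is moved to the cluster most common among its nearest neighbors, but
    # only if that cluster strictly outvotes the point's current cluster.
    pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(pairwise, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(pairwise, axis=1)[:, :n_neighbors]
    final = labels.copy()
    for i in np.flatnonzero(~labeled):
        votes = np.bincount(labels[neighbors[i]], minlength=k)
        if votes.max() > votes[labels[i]]:
            final[i] = votes.argmax()
    return final, centroids
```

Calling the function with an `(n, d)` array of event vectors and a label array in which `-1` marks unlabeled events returns the final cluster assignment and the centroids. The pairwise distance matrix makes the vote quadratic in the number of events, which is acceptable for a sketch but would need an index structure for a large corpus.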