图书馆杂志

图书馆杂志, 2026, Vol. 45, Issue 4: 98-107.

• 数字人文 •

历史文献事件信息的空间聚合研究:以《汉书》人物传记为例

林立涛,沈雪莹,欧石燕   

  • 出版日期:2026-04-15 发布日期:2026-04-29
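  • About the authors: Lin Litao, doctoral student, School of Information Management, Nanjing University. Research interests: computational humanities, knowledge organization, natural language processing. Author contributions: research design, manuscript writing, data processing and analysis. E-mail: litaolin@smail.nju.edu.cn. Nanjing, Jiangsu 210023.
    Shen Xueying, doctoral student, School of Information Management, Nanjing University. Research interests: digital humanities, Semantic Web, text mining. Author contributions: data acquisition, manuscript writing. Nanjing, Jiangsu 210023.
    Ou Shiyan, professor and doctoral supervisor, School of Information Management, Nanjing University. Research interests: digital humanities, Semantic Web, and text mining. Author contributions: research design, manuscript review and revision. Nanjing, Jiangsu 210023.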

Spatial Aggregation of Event Information in Historical Documents: A Case Study of Biographies in the Book of Han

Lin Litao, Shen Xueying, Ou Shiyan   

  • Online: 2026-04-15 Published: 2026-04-29
  • About author: Lin Litao, Shen Xueying, Ou Shiyan

摘要: 历史地理数据为人物事件提供了时空背景,对其有效组织与利用能够深化典籍文本的语义关联。本研究以《汉书》为对象,探索历史地理数据如何服务历史文献的组织和分析。利用中国历史地理信息系统(CHGIS),获取历史地名与政区沿革数据,构造跨时空地名知识库;在事件信息抽取范式下,对比传统预训练语言模型和大型语言模型在古文事件信息检测任务上的适用性,并采用GujiBERT-BiLSTM-CRF模型抽取《汉书》人物传记中的事件句并标注事件类型;采用古汉语词性标注和规则匹配等方法进一步抽取事件句中的地名,并利用神经网络和地名知识库对地名进行实体链接,聚合异名同地的相关事件句;结合CHGIS可视化呈现不同类型事件的空间分布状况。从《汉书》人物传记中抽取事件句18525条,经过地名实体链接,实现对1253条事件句的空间定位。以地名为事件信息检索点的召回率提升至原先的1.3倍;基于数字地图的可视化结果增强了历史文献的可读性。本研究验证了历史地理数据在古籍事件信息组织与分析中的应用潜力,为历史文献的时空关联挖掘提供了方法参考。

关键词: 地名沿革, 信息聚合, 事件检测, 中国历史地理信息系统, 计算人文

Abstract: Historical geographic data provide the spatio-temporal context for events involving historical figures, and organizing and using such data effectively can deepen the semantic associations within classical texts. Taking the Book of Han (Han Shu) as its object, this study explores how historical geographic data can support the organization and analysis of historical documents. Historical place names and administrative division evolution data are obtained from the China Historical Geographic Information System (CHGIS) to construct a place-name knowledge base that spans time and space. Under the event information extraction paradigm, the study compares the applicability of traditional pretrained language models and large language models to event information detection in classical Chinese texts, and employs the GujiBERT-BiLSTM-CRF model to extract event sentences from the biographies in the Book of Han and annotate their event types. Classical Chinese part-of-speech tagging and rule matching are then used to extract place names from the event sentences; a neural network and the place-name knowledge base link these place names to entities, so that event sentences referring to the same place under different names are aggregated. CHGIS is combined with these results to visualize the spatial distribution of different types of events. In total, 18,525 event sentences were extracted from the biographies in the Book of Han, of which 1,253 were spatially located through place-name entity linking. With place names as retrieval points for event information, recall rose to 1.3 times its original level, and the digital-map visualization improved the readability of the historical documents. The study validates the potential of historical geographic data for organizing and analyzing event information in ancient texts and provides a methodological reference for mining spatio-temporal associations in historical documents.
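
The aggregation step described in the abstract, grouping event sentences that name the same place under different historical names, can be illustrated with a minimal sketch. This is not the authors' implementation: the study links place names with a neural network and a CHGIS-derived knowledge base, whereas the sketch below substitutes a plain dictionary lookup. The alias table, place identifiers, event types, and sentences are all invented placeholders, and the names `EventSentence`, `link_place`, `aggregate_by_place`, `retrieve_by_surface`, and `retrieve_by_place` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventSentence:
    sent_id: int
    event_type: str      # placeholder label, not the paper's event schema
    place_mention: str   # surface form of the place name in the sentence


# Toy alias table (invented): surface name -> canonical place identifier.
# In the study this role is played by a knowledge base built from CHGIS.
ALIAS_TO_PLACE = {
    "Name_A1": "place_A",
    "Name_A2": "place_A",  # a different historical name for the same location
    "Name_B1": "place_B",
}


def link_place(mention):
    """Return the canonical place id for a mention, or None if unknown.

    Assumption: exact dictionary lookup stands in for the neural linker.
    """
    return ALIAS_TO_PLACE.get(mention)


def aggregate_by_place(sentences):
    """Group event sentences by canonical place id, skipping unlinked ones."""
    groups = defaultdict(list)
    for sentence in sentences:
        place_id = link_place(sentence.place_mention)
        if place_id is not None:
            groups[place_id].append(sentence)
    return dict(groups)


def retrieve_by_surface(sentences, query):
    """Baseline retrieval: match the literal place name only."""
    return [s for s in sentences if s.place_mention == query]


def retrieve_by_place(sentences, query):
    """Retrieval after linking: return every sentence about the same place."""
    place_id = link_place(query)
    if place_id is None:
        return []
    return aggregate_by_place(sentences).get(place_id, [])


# Invented sample data for illustration only.
SENTENCES = [
    EventSentence(1, "type_x", "Name_A1"),
    EventSentence(2, "type_y", "Name_A2"),
    EventSentence(3, "type_x", "Name_A1"),
    EventSentence(4, "type_z", "Name_B1"),
    EventSentence(5, "type_z", "Name_C1"),  # not in the alias table
]

if __name__ == "__main__":
    baseline = retrieve_by_surface(SENTENCES, "Name_A1")
    linked = retrieve_by_place(SENTENCES, "Name_A1")
    print(len(baseline), len(linked))
```

In this toy data, querying the literal name `Name_A1` finds two of the three sentences about `place_A`, while querying through the linked identifier finds all three. That is the mechanism by which place-name linking can raise recall; the 1.3-fold figure reported in the abstract comes from the study's own data, not from this sketch.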

Key words: Place name evolution, Information aggregation, Event detection, China Historical Geographic Information System (CHGIS), Digital humanities