图书馆杂志

图书馆杂志, 2024, Vol. 43, Issue (395): 101-108.

• Digital Humanities •

基于条件随机场挖掘文本史料中事件信息的方法与实证研究——以《拉贝日记》数字人文研究为例

赵小萱1, 陈刚1, 2, 3, 黄紫荆1 (1 南京大学地理与海洋科学学院; 2 江苏省地理信息技术重点实验室; 3 自然资源部国土卫星遥感应用重点实验室)

  • 出版日期:2024-03-15 发布日期:2024-04-01
  • 作者简介:赵小萱 南京大学地理与海洋科学学院,硕士研究生。作者贡献:数据处理、确定实验方法、撰写及修改论文。E-mail:zxx_0413@foxmail.com 江苏南京 210023 陈 刚 南京大学地理与海洋科学学院、江苏省地理信息技术重点实验室、自然资源部国土卫星遥感应用重点实验室,博士,副教授。研究方向:历史地理信息化、数字人文等。作者贡献:修改论文框架、提供资料。 江苏南京 210023 黄紫荆 南京大学地理与海洋科学学院,硕士研究生。作者贡献:论文修改。 江苏南京 210023

A Methodological and Empirical Study of Extracting Event Information in Textual Historical Materials Based on Conditional Random Fields: Taking the Digital Humanities Study of the Rabe’s Diary as an Example

Zhao Xiaoxuan1, Chen Gang1, 2, 3, Huang Zijing1 (1 School of Geography and Ocean Science, Nanjing University; 2 Jiangsu Provincial Key Laboratory of Geographic Information Science and Technology; 3 Key Laboratory for Land Satellite Remote Sensing Applications of Ministry of Natural Resources)   

  • Online: 2024-03-15 Published: 2024-04-01
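  • About author: Zhao Xiaoxuan, master's student, School of Geography and Ocean Science, Nanjing University; author contributions: data processing, determining the experimental method, writing and revising the paper; E-mail: zxx_0413@foxmail.com; Nanjing, Jiangsu 210023. Chen Gang, PhD, associate professor, School of Geography and Ocean Science, Nanjing University, Jiangsu Provincial Key Laboratory of Geographic Information Science and Technology, and Key Laboratory for Land Satellite Remote Sensing Applications of Ministry of Natural Resources; research interests: informatization of historical geography, digital humanities, etc.; author contributions: revising the paper framework, providing materials; Nanjing, Jiangsu 210023. Huang Zijing, master's student, School of Geography and Ocean Science, Nanjing University; author contributions: revising the paper; Nanjing, Jiangsu 210023.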

摘要:

文本史料被广泛数字化,如何从文本中提取地理命名实体及相关信息,有效开展地理信息挖掘成为重要研究课题。本文针对历史档案文档的特点,提出一种以地理命名实体为核心,使语义信息与地理位置关联,将文本描述的事件信息转化为各个地理命名实体的属性数据的事件抽取理念,提取出有关时间、地点、人物、事物、事件、现象等与地理命名实体相关的事件要素。研究以《拉贝日记》中收录的《日本士兵在南京安全区的暴行》档案为实证案例,采用条件随机场方法,抽取事件信息,结合历史地图等相关资料,将地理信息最终映射到地图上。本文方法有助于拓展文本资料在数字信息时代的开发利用方式,开辟文本挖掘分析与知识发现的新思路。

Abstract:

Textual historical materials have been widely digitized, and extracting geographic named entities and related information from these texts for effective geographic information mining has become an important research topic. Addressing the characteristics of historical archival documents, this paper proposes an event extraction approach that takes geographic named entities as its core, links semantic information to geographic locations, and converts the event information described in the text into attribute data of each geographic named entity. The approach extracts event elements related to geographic named entities, including time, place, persons, things, events, and phenomena. As an empirical case, the study uses the document Japanese Soldiers’ Atrocities in the Nanking Safety Zone, included in Rabe’s Diary, and applies the conditional random field method to extract event information. Combined with historical maps and other related materials, the extracted geographic information is finally mapped onto a map. The proposed method helps expand the ways textual materials can be developed and used in the digital information era and opens up new ideas for text mining analysis and knowledge discovery.
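The idea of turning event information into attribute data of each geographic named entity can be illustrated with a minimal sketch. It assumes that a trained conditional random field has already assigned one BIO tag per token; the tag set (TIME, PERSON, EVENT, PLACE), the function names, and the toy sentence are all hypothetical and are not taken from the paper or from the diary. The sketch also assumes the simplest possible linking rule, attaching every non-place element in a sentence to every place in that sentence, which may differ from the rule the authors use.

```python
from collections import defaultdict


def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans.

    An "I-" tag that does not continue the open span is treated like "O".
    """
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def spans_to_place_records(spans):
    """Attach every non-place span to each PLACE span (assumed linking rule)."""
    records = {}
    for place_label, place in spans:
        if place_label != "PLACE":
            continue
        attributes = defaultdict(list)
        for label, text in spans:
            if label != "PLACE":
                attributes[label.lower()].append(text)
        records[place] = dict(attributes)
    return records


# Invented toy sentence with hand-written tags standing in for CRF output.
tokens = ["On", "December", "14", ",", "two", "soldiers",
          "entered", "the", "North", "Gate", "School"]
tags = ["O", "B-TIME", "I-TIME", "O", "B-PERSON", "I-PERSON",
        "B-EVENT", "O", "B-PLACE", "I-PLACE", "I-PLACE"]

spans = bio_to_spans(tokens, tags)
records = spans_to_place_records(spans)
print(records)
```

Each key of `records` is a place name, so a later step could join the records to coordinates taken from a historical map or gazetteer before plotting them.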