图书馆杂志

图书馆杂志, 2023, Vol. 42, Issue 390: 87-94.

• Digital Humanities •

基于DA-BERT-CRF 模型的古诗词地名自动识别研究——以金陵古诗词为例

余馨玲1 常 娥1,2
(1 东南大学经济管理学院 2 东南大学图书馆)
  

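  • Publication date: 2023-10-15 Release date: 2023-11-01
  • About the authors: Yu Xinling (余馨玲), School of Economics and Management, Southeast University, master's student. Research interest: knowledge organization. Contribution: manuscript writing, program implementation. E-mail: 2419738474@qq.com. Nanjing, Jiangsu 211189. Chang E (常娥), Southeast University Library, master's supervisor. Research interest: knowledge organization and management. Contribution: topic selection, manuscript revision. Nanjing, Jiangsu 211189.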

Automatic Recognition of Place Names in Ancient Poetry Based on DA-BERT-CRF Models: Taking the Ancient Poetries of Nanjing as an Example

Yu Xinling1, Chang E1,2 (1 School of Economics and Management, Southeast University; 2 Southeast University Library)

  • Online: 2023-10-15 Published: 2023-11-01

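Abstract (translated from the Chinese abstract):

Recognizing place-name entities in ancient poetry not only helps to mine the links between ancient poetry texts in depth, but also helps to map the geographical distribution of Chinese poetry and advances research on classical Chinese literature along the spatial dimension. Centering on the city of Nanjing, the paper systematically collects related ancient poetry data and annotates place-name entities with the BIOES scheme. To address the scarcity of training data in the ancient poetry domain and the use of single characters in place of words, it proposes a place-name recognition model for ancient poetry that applies data augmentation and combines a pre-trained model with a conditional random field, abbreviated as the DA-BERT-CRF model. The training data are first augmented by cross-swapping entities; the pre-trained model BERT then supplies contextual semantic information for the place names; finally, a conditional random field (CRF) enforces constraints on the place-name labels and produces the globally optimal place-name label sequence. In ten-fold cross-validation, the proposed DA-BERT-CRF model achieves an average precision, average recall, and average F value of 86.49%, 90.44%, and 88.35%, respectively.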
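The BIOES annotation and the entity cross-swapping step can be illustrated with a small sketch. The abstract does not spell out the authors' exact procedure, so everything here is an assumption made for illustration: the label name `LOC`, the helper names `bioes_tags`, `replace_spans`, and `cross_swap`, the character-offset span format, and the pairing rule (the i-th place name of one line is exchanged with the i-th place name of the other). The two sample lines are well-known verses mentioning Nanjing place names and are not taken from the paper's dataset.

```python
def bioes_tags(text, spans, label="LOC"):
    """Tag every character of `text` with a BIOES label.

    `spans` holds (start, end) character offsets of place names, end exclusive.
    """
    tags = ["O"] * len(text)
    for start, end in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"          # single-character entity
        else:
            tags[start] = f"B-{label}"          # begin
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"          # inside
            tags[end - 1] = f"E-{label}"        # end
    return tags


def replace_spans(text, spans, replacements):
    """Replace spans[i] with replacements[i]; return (new_text, new_spans).

    Assumes `spans` is sorted and non-overlapping.
    """
    parts, new_spans, cursor, length = [], [], 0, 0
    for (start, end), repl in zip(spans, replacements):
        parts.append(text[cursor:start])
        length += start - cursor
        parts.append(repl)
        new_spans.append((length, length + len(repl)))
        length += len(repl)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), new_spans


def cross_swap(sample_a, sample_b):
    """Exchange place names between two annotated lines (hypothetical pairing rule)."""
    (text_a, spans_a), (text_b, spans_b) = sample_a, sample_b
    names_a = [text_a[s:e] for s, e in spans_a]
    names_b = [text_b[s:e] for s, e in spans_b]
    n = min(len(names_a), len(names_b))        # unpaired entities stay unchanged
    new_a = replace_spans(text_a, spans_a, names_b[:n] + names_a[n:])
    new_b = replace_spans(text_b, spans_b, names_a[:n] + names_b[n:])
    return new_a, new_b


line_a = ("金陵城上西楼", [(0, 2)])     # place name: 金陵
line_b = ("夜泊秦淮近酒家", [(2, 4)])   # place name: 秦淮
(aug_a_text, aug_a_spans), (aug_b_text, aug_b_spans) = cross_swap(line_a, line_b)
print(aug_a_text, bioes_tags(aug_a_text, aug_a_spans))
print(aug_b_text, bioes_tags(aug_b_text, aug_b_spans))
```

Each swap yields two new labelled lines from two existing ones, which is how augmentation of this kind enlarges a small training set. Swapping names of different lengths would change the line length, so a real pipeline might restrict swaps to names with the same number of characters; the abstract does not say whether the authors did so.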

Abstract:

Recognizing place-name entities in ancient poetry not only helps to explore the relationships between ancient poems in depth, but also helps to map the distribution of Chinese poetry and to promote the study of Chinese classical literature in the spatial dimension. The paper collected ancient poetry about Nanjing and annotated the place names with the BIOES scheme. To address the lack of training data in the field of ancient poetry, the paper proposed a place-name recognition model for ancient poetry, called the DA-BERT-CRF model, which uses a data augmentation method and combines a pre-trained model with a CRF model. The training data were augmented by the entity cross-exchange method. The contextual semantic information of the place names was then obtained with the BERT model. Finally, the CRF model was used to constrain the place-name labels and to generate the globally optimal place-name sequence. In ten-fold cross-validation, the average precision, average recall, and average F value of the DA-BERT-CRF model were 86.49%, 90.44%, and 88.35%, respectively.
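As a rough illustration of how figures of this kind are typically computed, the sketch below scores predicted BIOES sequences against gold sequences at the entity level and then averages the per-fold scores. It assumes exact-span matching and a simple mean over folds; the abstract does not state which matching criterion or averaging scheme the authors used, and the function names `extract_spans`, `precision_recall_f`, and `mean_over_folds` are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end) entity spans encoded by a BIOES sequence."""
    spans, start = set(), None
    for i, tag in enumerate(tags):
        prefix = tag[0]
        if prefix == "S":                      # single-character entity
            spans.add((i, i + 1))
            start = None
        elif prefix == "B":                    # open a new entity
            start = i
        elif prefix == "E":                    # close the open entity, if any
            if start is not None:
                spans.add((start, i + 1))
            start = None
        elif prefix == "O":                    # outside: drop any unfinished entity
            start = None
        # prefix "I" simply continues the currently open entity
    return spans


def precision_recall_f(gold_seqs, pred_seqs):
    """Entity-level precision, recall, and F value under exact span matching."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f


def mean_over_folds(fold_scores):
    """Average a list of (precision, recall, f) tuples, one tuple per fold."""
    k = len(fold_scores)
    return tuple(sum(s[j] for s in fold_scores) / k for j in range(3))


gold = [["B-LOC", "E-LOC", "O", "O", "O", "O"],
        ["O", "O", "B-LOC", "E-LOC", "O", "O", "O"]]
pred = [["B-LOC", "E-LOC", "O", "O", "S-LOC", "O"],
        ["O", "O", "O", "O", "O", "O", "O"]]
print(precision_recall_f(gold, pred))
```

The mean of per-fold F values generally differs from the F value computed from the mean precision and mean recall. The harmonic mean of 86.49% and 90.44% is about 88.42%, so the reported 88.35% is presumably an average of per-fold F values.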