Library Journal (图书馆杂志)

Library Journal ›› 2025, Vol. 44 ›› Issue (415): 75-87.

• Digital Humanities •

双通道多粒度特征融合的古诗词命名实体识别——以唐宋时期为例

赵京胜1, 2  杨心怡1  曲维龙1  郑嘉上1  朱巧明2(1 青岛理工大学信息与控制工程学院 2 苏州大学计算机科学与技术学院)   

  • Publication date: 2025-11-15 Release date: 2025-11-26
  • About the authors:

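    Zhao Jingsheng: School of Information and Control Engineering, Qingdao University of Technology; associate professor and master's supervisor. Research interests: natural language processing, Chinese information processing. Contribution: guided the research direction, revised and finalized the paper. E-mail: zhao5199@163.com. Qingdao, Shandong 266520.
    Yang Xinyi: School of Information and Control Engineering, Qingdao University of Technology; master's student. Research interests: natural language processing, digital humanities. Contribution: data survey, model and experiment design, paper writing. Qingdao, Shandong 266520.
    Qu Weilong: School of Information and Control Engineering, Qingdao University of Technology; master's student. Research interest: sentiment triplet extraction. Contribution: paper revision. Qingdao, Shandong 266520.
    Zheng Jiashang: School of Information and Control Engineering, Qingdao University of Technology; master's student. Research interest: text generation. Contribution: paper revision. Qingdao, Shandong 266520.
    Zhu Qiaoming: School of Computer Science and Technology, Soochow University; professor. Research interests: natural language processing, intelligent information processing. Contribution: paper supervision. Suzhou, Jiangsu 215006.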

Dual-Channel Multi-Granularity Feature Fusion for Named Entity Recognition in Ancient Poetry: A Case Study of the Tang and Song Dynasties

Zhao Jingsheng1,2, Yang Xinyi1, Qu Weilong1, Zheng Jiashang1, Zhu Qiaoming2 (1 School of Information and Control Engineering, Qingdao University of Technology; 2 School of Computer Science and Technology, Soochow University)

  • Online:2025-11-15 Published:2025-11-26

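Abstract:

To address the scarcity of training data for ancient Chinese poetry, this study constructs POEM-NER, a dataset built mainly from scenery-describing poems and annotated with the BIOES scheme. Traditional named entity recognition (NER) methods cannot fully learn the complex sentence structure of ancient poetry and extract relatively uniform features, so we propose an ancient poetry NER method based on dual-channel multi-granularity feature fusion. First, the SikuBERT pre-trained model produces character embeddings for the poetry corpus, Word2Vec supplies multi-granularity feature vectors, and FLAT fuses the embeddings; the resulting vectors are fed into two parallel channels, BiLSTM and IDCNN, for feature extraction. Next, the extracted contextual and local features are fused dynamically through an attention mechanism. Finally, CRF decoding yields the predicted label sequence. Comparative and ablation experiments on the C-CLUE and POEM-NER datasets show that the proposed DMFF-APNER model uses multi-granularity features effectively to strengthen semantic representation, that the dual-channel feature extraction layer makes the two kinds of features complementary, and that NER performance improves markedly, with F1 scores of 82.66% and 86.02%, respectively.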
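The BIOES scheme mentioned in the abstract can be illustrated with a minimal sketch. The entity type `SCENE`, the sample line, and the helper `spans_to_bioes` are hypothetical and chosen only for illustration; the abstract does not list the POEM-NER label set or its annotation tooling.

```python
def spans_to_bioes(text, spans):
    """Label each character of `text` with a BIOES tag.

    `spans` holds (start, end, entity_type) triples, with `end` exclusive.
    B = begin, I = inside, E = end, S = single-character entity, O = outside.
    """
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"
        else:
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"
            tags[end - 1] = f"E-{etype}"
    return tags


# Hypothetical annotation of one line: "孤帆" and "碧空" marked as SCENE entities.
line = "孤帆远影碧空尽"
spans = [(0, 2, "SCENE"), (4, 6, "SCENE")]
for ch, tag in zip(line, spans_to_bioes(line, spans)):
    print(ch, tag)
```

Compared with plain BIO tagging, the extra E and S tags mark entity boundaries explicitly, which is one reason the scheme is commonly paired with a CRF decoder.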

Keywords: named entity recognition, multi-granularity feature fusion, dual-channel, ancient poetry, digital humanities

Abstract:

To address the scarcity of training data for ancient Chinese poetry, we construct the POEM-NER dataset, which focuses on descriptive poems about natural scenery and uses the BIOES scheme for entity annotation. To overcome the limitations of traditional named entity recognition (NER) methods, which handle the complex sentence structures of ancient poetry poorly and extract relatively uniform features, we propose DMFF-APNER, an ancient poetry NER method based on dual-channel multi-granularity feature fusion. First, we use the pre-trained SikuBERT model to obtain character embeddings of the poetry corpus, obtain multi-granularity feature vectors using Word2Vec, and use FLAT for embedding fusion. Next, the resulting vectors are fed into the BiLSTM and IDCNN channels for parallel feature extraction. The extracted contextual and local features are then dynamically fused through an attention mechanism. Finally, predicted sequence labels are obtained through CRF decoding. Comparative and ablation experiments on the open-source C-CLUE dataset and the POEM-NER dataset demonstrate that the DMFF-APNER model effectively uses multi-granularity features to enhance semantic representation. Furthermore, the dual-channel design in the feature extraction layer makes the two types of features complementary, significantly improving entity recognition performance. The F1 scores on C-CLUE and POEM-NER reach 82.66% and 86.02%, respectively.
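The attention-based fusion of the two channels can be sketched roughly as follows. This is only one plausible form, assumed for illustration: the abstract does not give the attention formula, so the scoring vector `query`, the function `fuse_channels`, and the hand-written feature vectors standing in for BiLSTM and IDCNN outputs are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fuse_channels(context_feats, local_feats, query):
    """Fuse two per-character feature sequences position by position.

    At each position, both channel vectors are scored against `query`
    (an assumed learned vector), the two scores are normalised with
    softmax, and the vectors are combined as a weighted sum.
    """
    fused = []
    for h_ctx, h_loc in zip(context_feats, local_feats):
        w_ctx, w_loc = softmax([dot(query, h_ctx), dot(query, h_loc)])
        fused.append([w_ctx * a + w_loc * b for a, b in zip(h_ctx, h_loc)])
    return fused


# Toy stand-ins for BiLSTM (contextual) and IDCNN (local) outputs, two characters.
context_feats = [[1.0, 0.0], [0.5, 0.5]]
local_feats = [[0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]
print(fuse_channels(context_feats, local_feats, query))
```

Because the weights are recomputed at every position, the mix between contextual and local features can differ from character to character, which is the sense in which such a fusion is dynamic.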

Key words:

Named entity recognition, Multi-granularity feature fusion, Dual-channel, Ancient poetry, Digital humanities