Library Journal

Library Journal ›› 2025, Vol. 44 ›› Issue (405): 108-119.

• Digital Humanities •

A Comparative Study of Named Entity Recognition Based on Deep Learning Models: A Case Study of Film Journals in the Republic of China Period

Cui Jinying, Yan Jia (Shanghai Library)

  • Online: 2025-01-15  Published: 2025-01-24
  • About the authors: Cui Jinying, Shanghai Library (Shanghai Institute of Scientific and Technical Information), Engineer. Research interests: digital humanities, large language models. Contribution: designed the paper's framework, conducted the experimental analysis, and wrote and revised the main sections. E-mail: jycui@libnet.sh.cn. Shanghai 200031. Yan Jia, Shanghai Library (Shanghai Institute of Scientific and Technical Information), Associate Research Librarian. Research interests: digital humanities. Contribution: guided the research approach and wrote and revised parts of the paper. Shanghai 200031.

Abstract:

This paper constructs a deep learning model based on NEZHA-RTransformer-CRF. A total of 660 text samples were randomly selected from 101 journals as experimental data, and a corpus was built through manual annotation. The text is fed into the NEZHA model to extract feature representations, the RTransformer model then extracts local contextual information, and the final output is passed to a CRF layer for entity recognition. Comparative experiments were conducted against four other models: BERT-BiLSTM-CRF, BERT-BiGRU-CRF, SVM, and ChatGLM-Ptuning. The NEZHA-RTransformer-CRF model achieved an accuracy of 89.79% and a markedly improved F1 score of 79.44%, validating the effectiveness and reliability of the proposed model and confirming the feasibility of applying deep learning techniques to Republican-era film journal corpora. The results provide valuable data support for further mining of journal data from this period.
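The CRF layer in the pipeline above selects a globally optimal tag sequence rather than tagging each token independently. A minimal pure-Python sketch of the underlying Viterbi decoding may clarify the idea; the tag set, emission scores, and transition scores below are invented for illustration (a trained model learns these values), not taken from the paper.

```python
# Viterbi decoding as used by a CRF output layer: given per-token emission
# scores and tag-to-tag transition scores, recover the highest-scoring
# tag sequence by dynamic programming.

def viterbi_decode(emissions, transitions, tags):
    """emissions: list of {tag: score} per token;
    transitions: {(prev_tag, cur_tag): score}; tags: list of tag names."""
    # Initialize with the first token's emission scores.
    score = [{t: emissions[0][t] for t in tags}]
    back = []  # backpointers: one {cur_tag: best_prev_tag} dict per step
    for i in range(1, len(emissions)):
        row, ptr = {}, {}
        for cur in tags:
            # Best previous tag under score-so-far plus transition score.
            best = max(tags, key=lambda p: score[-1][p] + transitions[(p, cur)])
            row[cur] = score[-1][best] + transitions[(best, cur)] + emissions[i][cur]
            ptr[cur] = best
        score.append(row)
        back.append(ptr)
    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: score[-1][t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy example: a transition score of -10 effectively forbids O -> I.
tags = ["B", "I", "O"]
emissions = [{"B": 2.0, "I": 0.0, "O": 1.0},
             {"B": 0.0, "I": 1.5, "O": 1.0},
             {"B": 0.0, "I": 0.0, "O": 2.0}]
transitions = {(p, c): 0.0 for p in tags for c in tags}
transitions[("O", "I")] = -10.0
print(viterbi_decode(emissions, transitions, tags))  # → ['B', 'I', 'O']
```

In a real CRF layer the same recurrence runs over tensors of learned scores; the benefit over per-token argmax is exactly the transition term, which rules out inconsistent sequences such as an I tag following O.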

Key words: Deep learning, Named entity recognition, Film journals, Digital humanities
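NER accuracy and F1 figures such as those reported above are conventionally computed at the entity level: a predicted entity counts as correct only when both its type and its exact span match the gold annotation. A minimal sketch of that evaluation follows; the function names and the BIO-tagged sequences are illustrative, not from the paper.

```python
# Entity-level precision/recall/F1 over BIO tag sequences.

def extract_spans(tags):
    """Collect (entity_type, start, end) spans from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes last span
        # A span ends at a new B- tag, an O tag, or a mismatched I- tag.
        if tag.startswith("B-") or tag == "O" or (etype is not None and tag != "I-" + etype):
            if etype is not None:
                spans.append((etype, start, i))
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
            else:
                start, etype = None, None
    return set(spans)

def entity_prf(gold_tags, pred_tags):
    """Return (precision, recall, f1) with exact span-and-type matching."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Toy example: the model finds the person but misses the location.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
print(entity_prf(gold, pred))  # → (1.0, 0.5, 0.6666666666666666)
```

Because partial overlaps score zero, entity-level F1 is stricter than token-level accuracy, which is one reason the two figures reported for a model can differ substantially.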