Library Journal

Library Journal ›› 2024, Vol. 43 ›› Issue (393): 96-108.


Research on the Construction Method and Distribution Law of Object-Image Database for Ancient Poetry

Liu Maolin1, 2, Zhao Meng1, 2, Wang Hao1, 2 (1 School of Information Management, Nanjing University; 2 Jiangsu Key Laboratory of Data Engineering and Knowledge Service)

  • Online: 2024-01-15 Published: 2024-01-31

Abstract:

From the perspective of digital humanities, ancient poetry resources are of great value but difficult to analyze at scale. Research on methods for automatically constructing a knowledge base of ancient poetry supports macro-level analysis of ancient poetry and the mining of its value. Firstly, based on the concept of the “object image”, the key information in ancient poems is extracted, reducing the complexity of analysis so that an automated process can be built. Secondly, a RoBERTa-BiLSTM-CRF model is constructed using deep learning methods, and object images are extracted from the ancient poetry corpus. Then, The Whole Tang Dynasty Poems and some Song Dynasty poetry resources are used to verify the feasibility and universality of the model. Finally, the object image database of The Whole Tang Dynasty Poems is successfully constructed, and the distribution law of the object images is preliminarily analyzed. After the model was trained on an automatically tagged corpus, the F1 scores for common nouns, time nouns and place names reached 89.6%, 93.3% and 93.6% respectively. When the model was transferred to a Song Dynasty poetry corpus that was not used for training, the extraction density was 4.5 objects per poem, and the model showed the ability to discover unknown words, indicating good universality and extensibility.
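
The two evaluation measures quoted in the abstract, per-category F1 and extraction density, can be illustrated with a minimal sketch. It assumes the model's output is a BIO tag sequence per poem and that F1 is computed at the entity level with exact span matching; the abstract does not state the tagging scheme or the matching rule, and the label names `NOUN`, `TIME` and `PLACE` are hypothetical stand-ins for common nouns, time nouns and place names.

```python
from collections import defaultdict


def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, label) spans."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((start, i, label))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def f1_by_label(gold_seqs, pred_seqs):
    """Entity-level F1 per label, counting only exact span matches."""
    tp, n_gold, n_pred = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        for span in g:
            n_gold[span[2]] += 1
        for span in p:
            n_pred[span[2]] += 1
        for span in g & p:
            tp[span[2]] += 1
    scores = {}
    for label in set(n_gold) | set(n_pred):
        precision = tp[label] / n_pred[label] if n_pred[label] else 0.0
        recall = tp[label] / n_gold[label] if n_gold[label] else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def extraction_density(pred_seqs):
    """Average number of extracted spans per sequence (one sequence = one poem, by assumption)."""
    return sum(len(bio_to_spans(seq)) for seq in pred_seqs) / len(pred_seqs)
```

Under these assumptions, `f1_by_label` would be applied to a labeled test set, while `extraction_density` needs only predictions, which is why it can be reported for a corpus that has no gold annotations.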