Library Journal

Library Journal, 2026, Vol. 45, Issue (3): 53-65.


Constructing the Dataset and Implementing the Retrieval System for the Synopsis Bibliography

Yan Xinjie, Xiao Zhuo, Lu Ziyan, Xu Jian   

  • Online: 2026-03-15 Published: 2026-03-24
  • About author: Yan Xinjie, Xiao Zhuo, Lu Ziyan, Xu Jian

Abstract: Synopsis bibliography is a crucial embodiment of the principles of ancient Chinese bibliographic studies, characterized by “distinguishing academic disciplines and tracing the origin and development of knowledge”, and as such holds immense scholarly value. However, existing synopsis bibliography resources are fragmented and uneven in quality, which impedes synopsis-based research applications. In this study, following the principles of comprehensiveness, authority, copyright protection, and careful version selection, 47 synopsis bibliographies were digitized and processed to extract 24 data fields (including title, volume count, category, edition, and contributor), resulting in a dataset of 59,624 records. Using a fine-tuned GujiBERT model, 1,669,117 entity records were automatically extracted from the texts. In addition, an online retrieval platform was developed to support visualized full-text search and dataset downloads. This digital processing method improves the efficiency of managing and using synopsis bibliographies, while the constructed dataset provides new support for in-depth analyses of character profiles, geographical relationships, and evaluative features in ancient texts, thereby promoting the deep mining and sharing of information from ancient literature.

Key words: Synopsis bibliography, Methodology, Data integration, Retrieval system
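
To make the record structure and the full-text search described in the abstract more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the `SynopsisRecord` class, the `synopsis` text field, the `build_index` and `search` helpers, and all sample values are hypothetical placeholders. Only the five field names (title, volume count, category, edition, contributor) come from the abstract. Whitespace tokenization is assumed for simplicity; classical Chinese text would likely need character-level or n-gram tokenization instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SynopsisRecord:
    """Hypothetical record with five of the 24 fields named in the abstract."""
    title: str
    volume_count: int
    category: str
    edition: str
    contributor: str
    synopsis: str  # assumed field holding the synopsis text itself


def build_index(records):
    """Map each lowercase token to the set of record positions containing it."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        for token in f"{rec.title} {rec.synopsis}".lower().split():
            index[token].add(pos)
    return index


def search(index, records, query):
    """Return the records that contain every whitespace-separated query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [records[i] for i in sorted(hits)]


# Placeholder data for illustration only; not taken from the dataset.
RECORDS = [
    SynopsisRecord("Placeholder Classic A", 10, "Classics", "edition-x",
                   "Contributor One", "commentary on rites and music"),
    SynopsisRecord("Placeholder History B", 20, "History", "edition-y",
                   "Contributor Two", "annals of rites and court records"),
]
INDEX = build_index(RECORDS)
```

Calling `search(INDEX, RECORDS, "rites music")` returns only the first placeholder record, because every query token must appear in a record for it to match. An inverted index is used here because it keeps lookups fast as the number of records grows; a production platform would more plausibly rely on a dedicated search engine.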