Library Journal (图书馆杂志)

Library Journal, 2026, Vol. 45, Issue (4): 71-81.

• Work Research •

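Exploration and Research on the Innovation of Literature Database Service Based on LLM: Using Quan Guo Bao Kan Suo Yin (CNBKSY) Intelligent Search Service as a Case Study

Dai Qingyi, Han Chunlei, Gao Zhichen

  • Online: 2026-04-15 Published: 2026-04-29
  • About the authors: Dai Qingyi, engineer, Digital Resources Center, Shanghai Library (Institute of Scientific and Technical Information of Shanghai). Research interests: digital resource platform construction, digital humanities, LLM applications. Author contribution: collection and organization of materials, writing and revision of the paper. E-mail: qydai@libnet.sh.cn. Shanghai 200031
    Han Chunlei, director of the Digital Resources Center and research librarian, Shanghai Library (Institute of Scientific and Technical Information of Shanghai). Research interests: digitization of literature resources, digital humanities, digital resource platform construction. Author contribution: topic selection, revision of the paper. Shanghai 200031
    Gao Zhichen, head of R&D, Shanghai Shuangdi Information Systems Co., Ltd. (上海双地信息系统有限公司). Research interests: applications of large language models in vertical domains, algorithm optimization for RAG. Author contribution: application development, implementation of algorithm logic, interaction design. Shanghai 200092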

Abstract: Based on large language model (LLM) technology, this paper addresses the demand for intelligent upgrading of the Quan Guo Bao Kan Suo Yin (CNBKSY) platform and proposes an intelligent retrieval system that integrates natural language processing (NLP), semantic retrieval, and generative question answering. The system aims to overcome the limitations of traditional retrieval, namely low efficiency and insufficient recall and precision. It embodies three key innovations. First, it constructs a unified knowledge representation model by fusing and integrating heterogeneous data from multiple sources, which overcomes format differences among literature resources and moves retrieval from keyword matching to semantic understanding. Second, it uses vectorization models such as BERT and BGE, combined with a multi-strategy recall mechanism that includes BM25 and Solr retrieval, to achieve precise and efficient document retrieval. Third, it incorporates an intelligent Q&A module that supports multi-round natural language search and high-precision question answering. The test results show that the system achieves significant improvements over traditional retrieval methods in efficiency, recall, and precision, providing a viable technical path for the intelligent development of the CNBKSY platform.

Key words: Natural language processing, Semantic retrieval, Intelligent Q&A, Multi-source heterogeneous data fusion, Vectorization models, Vector database, Generative large language models (LLM), Multi-route retrieval
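
The multi-route retrieval named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the BM25/Solr and vector routes are merged, so reciprocal rank fusion (RRF) is assumed here as one common choice, and the hand-written three-dimensional vectors stand in for embeddings that a model such as BGE would produce. All function names (`bm25_scores`, `rrf`, `hybrid_search`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer; a real system would use a proper analyzer."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Fuse several rankings with reciprocal rank fusion (assumed here)."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=2):
    """Run a lexical route and a vector route, then fuse their rankings."""
    lexical = rank(bm25_scores(query, docs))
    semantic = rank([cosine(query_vec, v) for v in doc_vecs])
    return rrf([lexical, semantic])[:top_k]


# Toy corpus and made-up embeddings, for illustration only.
docs = [
    "index of republican era newspapers and periodicals",
    "deep learning methods for image recognition",
    "old shanghai newspapers archive",
]
doc_vecs = [[0.9, 0.1, 0.3], [0.0, 1.0, 0.0], [0.8, 0.0, 0.6]]
print(hybrid_search("shanghai newspapers", [0.7, 0.0, 0.7], docs, doc_vecs))
```

RRF is used in this sketch because it needs only rank positions, so BM25 scores and cosine similarities never have to be put on a common scale; a weighted score sum or a learned re-ranker would be equally plausible alternatives.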