Library Journal (图书馆杂志)

Library Journal ›› 2023, Vol. 42 ›› Issue (392): 22-35.

• Special Feature •

Application Patterns and Data Governance of Large Models in Library and Information Science

Liu Qianqian1, Liu Shengying2, Liu Wei1 (1 Shanghai Library; 2 East China Normal University Library)

  • Online: 2023-12-15 Published: 2023-12-29
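  • About the authors: Liu Qianqian, Shanghai Library (Institute of Scientific and Technical Information of Shanghai), master's degree, librarian. Research interests: digital humanities, data science. Author contribution: design of the paper's framework, collection of materials, writing of the first draft. E-mail: qqliu@libnet.sh.cn. Shanghai 200031. Liu Shengying, East China Normal University Library, master's degree, librarian. Research interests: digital humanities, data visualization. Author contribution: writing of the first draft. Shanghai 200062. Liu Wei, Shanghai Library (Institute of Scientific and Technical Information of Shanghai), PhD, research fellow. Research interests: smart libraries, digital humanities. Author contribution: conception of the topic, revision and finalization of the manuscript. Shanghai 200031.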

Data Governance and Application Development of Large Language Models in Library and Information Services

Liu Qianqian1, Liu Shengying2, Liu Wei1 (1 Shanghai Library; 2 East China Normal University Library)

  • Online: 2023-12-15 Published: 2023-12-29
  • About author: Liu Qianqian1, Liu Shengying2, Liu Wei1 (1 Shanghai Library; 2 East China Normal University Library)

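Abstract (Chinese version):

This article examines the application development and data governance requirements of large language models in library and information science. Large language models are built from massive amounts of text data through unsupervised pre-training followed by fine-tuning on supervised, labeled data. Domain-specific large models are obtained by fine-tuning general-purpose large models on domain data, which gives them the ability to solve domain problems and meet the needs of domain applications. The article first reviews the breakthroughs of generative artificial intelligence, introduces the basic principles and current applications of large models, and analyzes the data factors and data requirements behind their multi-task capabilities. It then discusses, from a data governance perspective, the application potential of domain-specific large models and the methods and workflow for building them. The main contribution of the article is its analysis of the application patterns and data governance of large models in library and information science, which provides a theoretical basis and practical guidance for applying generative artificial intelligence in the library sector. The article also discusses the issues and limitations that need attention when applying and evaluating industry-specific large models.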

Abstract:

This article discusses the data governance requirements and development patterns of large language models in the field of library and information science. Large language models rely on massive amounts of text data for unsupervised pre-training and supervised fine-tuning. Domain-specific large models, in turn, are models that have been fine-tuned on domain-specific data so that they possess domain knowledge and can solve domain-specific problems to meet the needs of domain applications. The article aims to explore how to better apply generative artificial intelligence to libraries and related industries in order to promote the development of smart libraries and provide a driving force for high-quality services. It first reviews the breakthrough progress of generative artificial intelligence, then introduces the basic principles and current applications of large models, and analyzes the data factors and data requirements behind their various task capabilities. Finally, it discusses the application potential and development patterns of domain-specific large models. The main contribution of the article is its analysis of the application patterns and data governance of large models in the field of library and information services, which provides a theoretical basis and practical guidance for the application of generative artificial intelligence technology in the library industry. It also discusses the issues and limitations that need to be considered when applying and evaluating industry-specific large models.