图书馆杂志

图书馆杂志, 2022, Vol. 41, Issue 10: 25-34.

• Work Research •

Research on Building Vocabulary in the Field of Public Culture Based on Multi-Source Data Fusion
(基于多源数据融合的公共文化领域词表构建研究)

Wang Xiaoxue (王晓雪)1, Hua Bolin (化柏林)2, 3
(1 School of Software and Microelectronics, Peking University; 2 Department of Information Management, Peking University; 3 Key Laboratory of Big Data Application for Public Cultural Services, Ministry of Culture and Tourism)

  • Publication date: 2022-10-15  Release date: 2022-10-19

Abstract:

Public culture cloud platforms are developing rapidly, and new smart models of public culture keep emerging. Monitoring the overall state of public culture in real time and mining it in depth requires a domain thesaurus, which improves both the accuracy of the analysis and the readability of its results. Generating an up-to-date and comprehensive vocabulary for the field from policies, regulations, activity reports and other texts is therefore an important part of building public culture big data. We collected policy and legal documents, government announcements, data on local cultural activities, academic papers, and newspapers and periodicals in the field of public culture; obtained terms from these texts through automatic extraction and manual annotation; and classified the terms with several methods, including rule-based methods, K-means and KNN, to form a term dictionary. The dictionary currently contains 19 major categories and about 28,000 entries related to public culture, and can be expanded further.
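The abstract names rule-based methods and KNN among the techniques used to classify terms, but gives no implementation details. The following is a minimal sketch of how such a two-stage classifier might look, not the authors' method. The suffix rules, the category names (`venue`, `activity`), the seed terms, and the choice of character-bigram Jaccard similarity are all assumptions made for illustration. K-means, which the abstract also mentions, is not covered here.

```python
from collections import Counter

# Hypothetical suffix rules; the paper's actual rules are not given in the abstract.
SUFFIX_RULES = {
    "library": "venue",
    "museum": "venue",
    "festival": "activity",
    "exhibition": "activity",
}

# Hypothetical hand-labelled seed terms used as the KNN reference set.
LABELLED_TERMS = [
    ("public library", "venue"),
    ("art museum", "venue"),
    ("cultural centre", "venue"),
    ("lantern festival", "activity"),
    ("reading festival", "activity"),
    ("calligraphy exhibition", "activity"),
]


def char_bigrams(term):
    """Return the set of character bigrams of a lower-cased term."""
    text = term.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_by_rule(term):
    """Return a category if the term ends with a known suffix, else None."""
    for suffix, category in SUFFIX_RULES.items():
        if term.lower().endswith(suffix):
            return category
    return None


def classify_by_knn(term, k=3):
    """Majority vote among the k labelled terms most similar to `term`."""
    grams = char_bigrams(term)
    ranked = sorted(
        LABELLED_TERMS,
        key=lambda item: jaccard(grams, char_bigrams(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def classify(term):
    """Apply the rules first and fall back to KNN when no rule fires."""
    return classify_by_rule(term) or classify_by_knn(term)
```

Under these assumptions, a term such as "children's library" is settled by the suffix rule, while "art museums" matches no rule and is assigned by its nearest labelled neighbours. Running cheap, high-precision rules first and reserving KNN for the remainder is one plausible way to combine the two; the paper may combine them differently.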