Library Journal (图书馆杂志)

Library Journal ›› 2025, Vol. 44 ›› Issue (415): 28-39.

• Work Research •

Improving the Accuracy of Scientific Literature Retrieval through Term Weighting Algorithms Based on Semantic Information

Zhang Min1, 4  Li Wei2  Fan Qing3 [1 Wuhan Library, Chinese Academy of Sciences; 2 Wuhan Vocational College of Software and Engineering (Wuhan Open University); 3 National Cultural Industry Research Center of Central China Normal University; 4 Hubei Key Laboratory of Big Data in Science and Technology]

  • Online: 2025-11-15 Published: 2025-11-26
  • About the authors:

    Zhang Min: Wuhan Library, Chinese Academy of Sciences; engineer, PhD. Research interests: information extraction, semantic indexing, and deep learning, among others. Author contribution: literature survey, experiment design, writing and revising the paper. E-mail: zhangmin2012@mail.whlib.ac.cn. Wuhan, Hubei 430071.
    Li Wei: Wuhan Vocational College of Software and Engineering (Wuhan Open University); professor. Research interests: deep learning, data mining and analysis. Author contribution: designed the research plan, wrote part of the paper, revised and refined the paper. Wuhan, Hubei 430205.
    Fan Qing: National Cultural Industry Research Center of Central China Normal University; associate professor. Research interests: knowledge engineering, data mining and analysis. Author contribution: took part in the experiments and their evaluation. Wuhan, Hubei 430079.

Abstract:

Traditional term weighting algorithms often overlook the semantic information of terms in scientific literature. As a result, they fail to assess the importance of terms accurately, which lowers the accuracy of scientific literature retrieval. To make full use of the semantic information of terms, this paper proposes a term weighting algorithm based on semantic information, aimed at improving the accuracy of scientific literature retrieval. The proposed algorithm uses a semantic information weight to measure the importance of a term's semantic information. It also calculates a keyword weight for each term based on the TF-IDF algorithm. A weighted combination of these two weights yields a comprehensive term weight that gauges the importance of the term. In the experiments, the proposed term weighting algorithm performed better than the traditional TF-IDF and BM25 algorithms and effectively improved the accuracy of scientific literature retrieval.

Key words:

Term weighting, Semantic information weight, Search accuracy, Scientific literature retrieval, TF-IDF
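
The abstract describes combining a semantic information weight with a TF-IDF keyword weight into one term weight, but it does not give the formulas. The sketch below illustrates one plausible reading, a linear interpolation of the two weights. The mixing parameter `alpha`, the function names, the smoothed IDF variant, and the toy semantic weights are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Keyword weight: relative term frequency times a smoothed IDF (assumed variant)."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def combined_weight(keyword_w, semantic_w, alpha=0.5):
    """Weighted combination of the two weights; alpha is a hypothetical parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * semantic_w + (1.0 - alpha) * keyword_w


def score(query_terms, doc_tokens, corpus, semantic, alpha=0.5):
    """Score one document as the sum of combined weights of matching query terms."""
    return sum(
        combined_weight(tf_idf(t, doc_tokens, corpus), semantic.get(t, 0.0), alpha)
        for t in query_terms
        if t in doc_tokens
    )


# Toy corpus of tokenized documents (illustrative only).
corpus = [
    ["semantic", "indexing", "of", "scientific", "literature"],
    ["library", "opening", "hours", "and", "literature", "loans"],
]

# Made-up semantic weights in [0, 1]; a real system would have to derive these
# from some semantic resource or model, which the abstract does not specify.
semantic = {"semantic": 0.9, "indexing": 0.8, "literature": 0.3}

query = ["semantic", "literature"]
ranked = sorted(
    range(len(corpus)),
    key=lambda i: score(query, corpus[i], corpus, semantic),
    reverse=True,
)
print(ranked)
```

With `alpha=0.0` the score reduces to plain TF-IDF, which gives a simple baseline for checking whether the semantic component changes the ranking at all.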