Library Journal (图书馆杂志)

Library Journal ›› 2025, Vol. 44 ›› Issue (416): 81-92.

• Information Management •

Research on Identification of Health Misinformation on Social Media Based on RoBERTa-MHA-BiGRU

Chen Minghong (陈明红), He Jianing (何嘉宁) (School of Information Management, Sun Yat-sen University)

  • Online: 2026-01-15 Published: 2026-01-08
  • About the authors:

    Chen Minghong: Associate professor and master's supervisor, School of Information Management, Sun Yat-sen University. Research interests: health informatics, social media information behavior. Author contribution: overall supervision of the paper, topic conception, outline design, and final revision. E-mail: chenmh23@mail.sysu.edu.cn. Guangzhou, Guangdong 510006.

    He Jianing: Master's student, School of Information Management, Sun Yat-sen University. Research interest: health informatics. Author contribution: data collection, data processing and analysis, first-draft writing, and paper revision. Guangzhou, Guangdong 510006.

Abstract: Health misinformation on social media is varied and complex, spreads quickly, and poses a serious threat to public health, so identifying it rapidly and effectively is of great importance. In this study, we first gathered health information from multiple social media platforms to build Chinese and English health datasets. We then developed a RoBERTa-MHA-BiGRU model to identify health misinformation on social media. In this model, RoBERTa, a pre-trained language model, is used to produce vector representations of the health data; a multi-head attention (MHA) mechanism is combined with a bidirectional gated recurrent unit (BiGRU) to extract semantic features from the text; and a fully connected layer with the Softmax function is used to identify health misinformation. To validate the effectiveness of the RoBERTa-MHA-BiGRU model, three sets of experiments were conducted separately on the Chinese and English datasets. Experiment 1 showed that deep learning models outperformed machine learning models and that RoBERTa's text representations were superior to BERT's. Experiment 2 showed that incorporating an attention mechanism enhanced the model's learning capability, and that the RoBERTa-MHA-BiGRU model with multi-head attention outperformed its single-head attention counterpart. Experiment 3 showed that data augmentation further improved the model's performance. Theoretically, this paper extends the depth and breadth of research on health misinformation. Practically, it provides technical guidance for identifying health misinformation on social media, helping social media users avoid such misinformation promptly and improving the efficiency and effectiveness of health misinformation governance.

Key words: Health misinformation, Social media, Multi-head attention mechanism, BiGRU, Data augmentation
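
The pipeline described in the abstract (RoBERTa vectors, multi-head attention combined with a BiGRU, then a fully connected layer with Softmax) can be illustrated with a minimal, untrained forward pass in pure Python. This is a sketch under stated assumptions, not the authors' implementation: the random 8-dimensional `tokens` stand in for real RoBERTa output, the attention layer is assumed to feed the BiGRU, the final forward and backward states are assumed to be the classifier input, and all weights, sizes, and function names (`classify`, `gru_pass`, and so on) are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    """Random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multi_head_attention(x, heads, rng):
    """Scaled dot-product self-attention, one projection set per head."""
    d_model = len(x[0])
    d_head = d_model // heads
    head_outputs = []
    for _ in range(heads):
        wq, wk, wv = (rand_matrix(d_model, d_head, rng) for _ in range(3))
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        scores = matmul(q, [list(col) for col in zip(*k)])  # q times k transposed
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
    # Concatenate the heads for each token, then apply the output projection.
    concat = [sum((head[t] for head in head_outputs), []) for t in range(len(x))]
    return matmul(concat, rand_matrix(heads * d_head, d_model, rng))

def gru_params(d_in, hidden, rng):
    """Weights in the order wz, uz, wr, ur, wh, uh (biases omitted for brevity)."""
    return tuple(rand_matrix(d_in if i % 2 == 0 else hidden, hidden, rng)
                 for i in range(6))

def gru_pass(x, params, hidden):
    """Run one GRU direction over the sequence and return every hidden state."""
    wz, uz, wr, ur, wh, uh = params
    h = [0.0] * hidden
    states = []
    for x_t in x:
        z = [sigmoid(a + b) for a, b in zip(matmul([x_t], wz)[0], matmul([h], uz)[0])]
        r = [sigmoid(a + b) for a, b in zip(matmul([x_t], wr)[0], matmul([h], ur)[0])]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matmul([x_t], wh)[0], matmul([gated], uh)[0])]
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
        states.append(h)
    return states

def classify(embeddings, heads=2, hidden=4, seed=0):
    """Return [p(class 0), p(class 1)] for one text; class meaning is arbitrary here."""
    rng = random.Random(seed)
    d_model = len(embeddings[0])
    attended = multi_head_attention(embeddings, heads, rng)
    fwd = gru_pass(attended, gru_params(d_model, hidden, rng), hidden)
    bwd = gru_pass(attended[::-1], gru_params(d_model, hidden, rng), hidden)
    features = fwd[-1] + bwd[-1]  # assumption: final state of each direction
    logits = matmul([features], rand_matrix(2 * hidden, 2, rng))[0]
    return softmax(logits)

# Stand-in for RoBERTa output: 5 tokens with 8 dimensions each (assumption).
token_rng = random.Random(42)
tokens = [[token_rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
probs = classify(tokens)
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how the data flows between the components named in the abstract. Setting `heads=1` gives the single-head configuration that Experiment 2 compares against.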