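A Study on Health Misinformation Detection from a Multidimensional Feature Fusion Perspective: A Case Study of Dietary Health Information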

Abstract

Abstract: This study mines features from health-related texts published by WeChat public accounts to explore how to improve the accuracy of false health information detection, raise the quality of health information on online platforms, and support users' health-related decisions. Working from four dimensions (content, emotional, publisher, and domain features), the paper proposes a health misinformation detection method based on multidimensional feature fusion and a multi-layer perceptron (MDFF-MLP). First, the article content is analyzed to extract key features along these dimensions. Second, feature ablation experiments identify the feature combinations that are semantically rich and best separate true from false information, the sentiment lexicon for the dietary health domain is optimized, and a multi-layer perceptron model is built on the fused “content-emotion-publisher-domain” features. Finally, a deep belief network is used to learn the multidimensional features and classify false dietary health information. The results show that, within the MDFF-MLP model, content features and publisher features taken alone reach a true/false discrimination of 0.90 and 0.84, respectively. After multidimensional feature fusion, the model reaches an accuracy of 0.96 and an F1 score of 0.95, a considerable improvement over baseline models such as logistic regression (F1=0.82) and LightGBM (F1=0.89), which indicates strong practical applicability.

Key words: Feature fusion, Health misinformation, Information governance, Multi-layer perceptron, Deep learning
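
The pipeline outlined in the abstract (fuse several feature groups, run ablation over their combinations, then score with accuracy and F1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the group names, the helper functions `fuse`, `ablation_subsets`, and `accuracy_f1`, and all toy values are hypothetical, and the classifier itself (for example a multi-layer perceptron) is omitted.

```python
from itertools import combinations

# Assumed names for the four feature dimensions mentioned in the abstract.
GROUPS = ("content", "emotion", "publisher", "domain")


def fuse(features, groups=GROUPS):
    """Concatenate the selected feature groups into one flat vector."""
    fused = []
    for name in groups:
        fused.extend(features[name])
    return fused


def ablation_subsets(groups=GROUPS):
    """Yield every non-empty combination of feature groups to evaluate."""
    for size in range(1, len(groups) + 1):
        yield from combinations(groups, size)


def accuracy_f1(y_true, y_pred):
    """Return (accuracy, F1), treating label 1 as 'false information'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy article with made-up feature values (not taken from the paper's data).
article = {
    "content": [0.3, 1.0],
    "emotion": [0.7],
    "publisher": [0.0, 1.0],
    "domain": [0.5],
}
fused_vector = fuse(article)
subset_count = len(list(ablation_subsets()))
acc, f1 = accuracy_f1([1, 0, 1, 1], [1, 0, 0, 1])
```

In an actual experiment, each subset produced by `ablation_subsets` would be fused, fed to the same classifier, and compared on the resulting accuracy and F1 to decide which feature combination to keep.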

范静, 朱云琴, 余意. 多维特征融合视角下虚假健康信息识别研究——以饮食健康信息为例[J]. 图书馆杂志, 2026, 45(4): 108-121.

Fan Jing, Zhu Yunqin, Yu Yi. A Study on Health Misinformation Detection from a Multidimensional Feature Fusion Perspective: A Case Study of Dietary Health Information[J]. Library Journal, 2026, 45(4): 108-121.

