Library Journal (图书馆杂志), 2026, Vol. 45, Issue 5: 37-47.

• Work Research •

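A Comparative Study of AI-Generated and Scholar-Written Academic Literature from Multiple Models and Perspectives

Zhang Qiang, Gao Ying, Xin Zhulin, Ren Doudou, Zhou Hong

  • Online: 2026-05-15 Published: 2026-05-27
  • About the authors: Zhang Qiang, lecturer, School of Literature, Huaiyin Normal University; postdoctoral researcher, College of Humanities and Social Development, Nanjing Agricultural University. Research interests: science and technology information, digital humanities. Contribution: proposed the research idea and wrote the paper. E-mail: zhangqiang_dh@163.com. Huai'an, Jiangsu 223300
    Gao Ying, doctoral student, College of Humanities and Social Development, Nanjing Agricultural University. Research interests: science and technology information, digital humanities. Contribution: wrote and revised the paper. Nanjing, Jiangsu 210095
    Xin Zhulin, doctoral student, Wuhan Documentation and Information Center, University of Chinese Academy of Sciences. Research interests: science and technology information. Contribution: wrote and revised the paper. Wuhan, Hubei 430071
    Ren Doudou, master's student, School of Computer Science and Technology, Xinjiang University. Research interests: deep learning, natural language processing. Contribution: conducted the experiments. Urumqi, Xinjiang 830017
    Zhou Hong, associate research fellow, Wuhan Documentation and Information Center, University of Chinese Academy of Sciences. Research interests: science and technology information. Contribution: designed the research plan and revised the paper. Wuhan, Hubei 430071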

Abstract: This study compares AI-generated and scholar-written content in archival science journal articles to examine the potential of AI in academic writing and its strengths and limitations relative to human authorship. From core journals in archival science, 100 highly cited papers published in the past three years were selected. Their abstracts, introductions, and conclusions were extracted, and six large language models were used to generate corresponding abstracts. The generated and scholar-written abstracts were then analyzed systematically along five dimensions: semantic similarity, topic modeling, classification detection, ROUGE evaluation, and academic publication detection. The results indicate that the abstracts generated by all six models are highly similar to those written by scholars. Tongyi Qianwen stands out in topic refinement, producing content that comes closest to the professional depth of scholars. In classification detection, the Random Forest (RF) and XGBoost models performed strongly. The ROUGE evaluation shows that the quality of abstracts generated by large language models has reached or even surpassed that of traditional algorithms, with Wenxin Yiyan 4.0 performing especially well on this measure. In academic plagiarism detection tests, both GPT4.0 and Tongyi Qianwen met the standards, and Tongyi Qianwen in particular showed an extremely low proportion of suspected AI-generated content in the CNKI AIGC detection. Based on these findings, the study recommends that academic publishing platforms adapt their misconduct detection to new text generation technologies, refine AIGC detection standards, strengthen cross-platform collaboration and data sharing, and pay special attention to the detection challenges posed by the Tongyi Qianwen model.

Key words: Large language model, AIGC detection, Academic writing, Text evaluation
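As an illustration of the ROUGE evaluation named in the abstract, the following is a minimal sketch of ROUGE-N computed from clipped n-gram overlap between a generated abstract and a scholar-written reference. It is not the authors' evaluation pipeline: the function names `ngram_counts` and `rouge_n`, the example sentences, and the assumption that texts arrive already tokenized (for Chinese text, for instance, by a word segmenter or character by character) are all illustrative choices.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate_tokens, reference_tokens, n=1):
    """Return ROUGE-N precision, recall, and F1 for one candidate/reference pair.

    Assumes both inputs are pre-tokenized lists of strings.
    """
    cand = ngram_counts(candidate_tokens, n)
    ref = ngram_counts(reference_tokens, n)
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up sentences for illustration only; not data from the study.
    generated = "the model generates a short abstract".split()
    reference = "the scholar writes a short abstract".split()
    for n in (1, 2):
        print(n, rouge_n(generated, reference, n))
```

Recall measures how much of the scholar-written reference is recovered by the generated text, while precision penalizes generated content that has no counterpart in the reference; reporting both, or their F1, gives a more balanced picture than either alone.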