Library Journal

Library Journal, 2025, Vol. 44, Issue (408): 110-122.


Discipline Classification of Humanities and Social Sciences Academic Papers Based on Large Language Models

Hu Die (1, 2), Lin Litao (3), Liu Liu (1, 2), Shen Si (4), Wang Dongbo (1, 2)
(1) College of Information Management, Nanjing Agricultural University; (2) Research Center for Humanities and Social Computing, Nanjing Agricultural University; (3) School of Information Management, Nanjing University; (4) School of Economics and Management, Nanjing University of Science and Technology

  • Online: 2025-04-15
  • Published: 2025-04-24

Abstract:

The rapid growth of academic papers and the increasing specialization of disciplinary fields place higher demands on automatic classification methods. This paper investigates the applicability of large language models (LLMs) to the discipline classification of academic papers in the humanities and social sciences. Through discipline classification experiments, it compares traditional machine learning models with LLMs, including Qwen-7B, Llama2-7B, Llama2-7B-hsse, and GPT-4. It further examines how LLM performance varies with the amount of labeled training data. The results show that the domain-specific model Llama2-7B-hsse has a significant advantage, reaching an overall F1-score of 89.22% across 21 categories while requiring only one-fifth of the training data needed by the benchmark model SsciBERT. These findings highlight the effectiveness of domain-incremental training and fine-tuning of LLMs for automatic classification, especially in resource-limited scenarios, and offer new ideas for knowledge organization and interdisciplinary research.
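The abstract reports an "overall" F1-score across 21 categories but, in this excerpt, does not say which averaging scheme was used. The following Python sketch shows how per-category F1 and the three common summaries (macro-, support-weighted and micro-averaged F1) are computed for single-label classification. The function name f1_scores and the category labels are hypothetical and chosen only for illustration. Which of these averages the authors report is an assumption left open here.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Hypothetical helper: per-class F1 plus macro-, weighted- and
    micro-averaged F1 for single-label, multi-class predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Score every label seen in either the gold or the predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the paper belongs elsewhere
            fn[t] += 1  # the true class t was missed
    per_class = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    support = Counter(y_true)
    # Macro: unweighted mean, so small disciplines count as much as large ones.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: mean weighted by each class's number of gold examples.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    # Micro: pooled counts; with one label per paper this equals accuracy.
    micro = sum(tp.values()) / len(y_true)
    return per_class, macro, weighted, micro


if __name__ == "__main__":
    # Illustrative labels only; the paper's 21 categories are not listed here.
    gold = ["Law", "Law", "Economics", "History"]
    pred = ["Law", "Economics", "Economics", "History"]
    per_class, macro, weighted, micro = f1_scores(gold, pred)
    print(per_class, round(macro, 4), round(weighted, 4), round(micro, 4))
```

On this toy input, macro-F1 is 7/9 (about 0.7778), while weighted and micro F1 are both 0.75. This illustrates why the averaging scheme matters when comparing an "overall" F1 across models on imbalanced disciplines.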