Library Journal

Title Information Classification Based on Hownet Semantics Feature Extension

Li Xiangdong, Liu Kang, Ding Cong, Liao Xiangpeng

  • Online: 2017-02-21 Published: 2017-02-25

Abstract:

This paper exploits the internal semantic relevance of the text to obtain a core semantic word set for the training texts from high-frequency words and latent topics. It then uses Hownet as an external resource to calculate the similarity between the core semantic word set and each testing text. Keywords from the training texts whose similarity exceeds a given threshold are added to the testing text, and the extended texts are classified with SVM. The results show that when both the training set and the test set consist only of titles, and each training category contains 200 items, classification performance improves by up to 3.1%; performance declines, however, as the number of training texts grows beyond 200. When the training sets consist of titles and abstracts while the testing sets consist of titles only, the proposed classification algorithm improves Macro_F1 by 1.5% and 3.1% on the Fudan corpus and the self-built journal corpus, respectively, and Micro_F1 by 2.3% and 5.3%. The paper aims to extend the features of journal titles, whose features are sparse, in the hope of improving title classification.
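The extension step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity table below is a hypothetical stand-in for a Hownet-based word similarity, and the names `toy_similarity`, `extend_title`, and the threshold value 0.7 are assumptions made for illustration only. The sketch adds a core word to a title when its best similarity to any title word exceeds the threshold; the SVM classification step is not shown.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical similarity scores standing in for a Hownet-based measure.
TOY_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"library", "archive"}): 0.8,
    frozenset({"retrieval", "search"}): 0.9,
    frozenset({"library", "search"}): 0.2,
}


def toy_similarity(a: str, b: str) -> float:
    """Return a symmetric similarity in [0, 1]; unknown pairs score 0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def extend_title(
    title_words: List[str],
    core_words: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.7,  # assumed value; the abstract only says "a certain level"
) -> List[str]:
    """Append core words whose best similarity to the title exceeds the threshold."""
    extended = list(title_words)
    for core in core_words:
        if core in extended:
            continue  # already present, nothing to add
        best = max((similarity(core, w) for w in title_words), default=0.0)
        if best > threshold:
            extended.append(core)
    return extended


title = ["library", "retrieval"]
core = ["archive", "search", "chemistry"]
extended_title = extend_title(title, core, toy_similarity)
print(extended_title)
```

In this toy run, "archive" and "search" are added because each is close to one of the title words, while "chemistry" is left out. The extended word list would then be turned into a feature vector for the classifier.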

Key words: Journal title information, Short-text classification, Hownet, LDA