Library Journal

Library Journal, 2022, Vol. 41, Issue (2): 93-102.


A Bootstrapping-based Information Extraction Method for Genealogy Text

Bao Chenyang, Ren Ming (School of Information Resource Management, Renmin University of China)   

  • Online: 2022-02-15; Published: 2022-02-23

Abstract: Automatic information extraction from genealogical text is key to exploiting genealogy resources efficiently. Deep learning has recently achieved remarkable success in extracting information from genealogy text, but it has been limited by the lack of labeled data in this field. This paper develops a bootstrapping-based method for extracting information from the biographies of family members when only a small amount of labeled genealogy text is available. Specifically, the method starts from a small labeled dataset and uses a BiLSTM-CRF model to predict label sequences for unlabeled samples; the samples with the highest confidence scores are selected and added to the labeled data. In this way, the labeled data is expanded incrementally, and the trained model predicts label sequences for the given genealogy text, from which entities and relationships are derived. Experiments on a real-world dataset show that the proposed method can extract information from digital genealogy text using only a small amount of labeled data, making deep learning methods more effective and practical for information extraction from genealogy records. With only 250 labeled samples, the proposed method achieves performance similar to that of a BiLSTM-CRF model trained on 1,800 labeled samples.
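The bootstrapping loop summarized in the abstract can be sketched as a generic self-training routine. This is a minimal sketch under stated assumptions, not the authors' implementation: `bootstrap`, `train_lookup_tagger`, `top_k`, and `rounds` are hypothetical names; the BiLSTM-CRF is replaced by a toy lookup tagger so the example runs without a deep learning library; and the abstract does not say how the confidence score is computed (for a CRF, one common choice is the probability of the best label path).

```python
from typing import Callable, List, Tuple

Tokens = List[str]
Labels = List[str]
Sample = Tuple[Tokens, Labels]
# A trained tagger maps a token sequence to (predicted labels, confidence score).
Tagger = Callable[[Tokens], Tuple[Labels, float]]


def bootstrap(
    labeled: List[Sample],
    unlabeled: List[Tokens],
    train: Callable[[List[Sample]], Tagger],
    top_k: int = 1,
    rounds: int = 5,
) -> Tuple[Tagger, List[Sample]]:
    """Self-training loop: pseudo-label the most confident samples and retrain."""
    labeled = list(labeled)  # copy so the caller's seed set is not mutated
    pool = list(unlabeled)
    tagger = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Score every remaining unlabeled sample with the current model.
        scored = []
        for i, tokens in enumerate(pool):
            labels, confidence = tagger(tokens)
            scored.append((confidence, i, labels))
        # Keep the top_k highest-confidence predictions as pseudo-labels.
        scored.sort(key=lambda s: s[0], reverse=True)
        chosen = scored[:top_k]
        chosen_idx = {i for _, i, _ in chosen}
        labeled.extend((pool[i], labels) for _, i, labels in chosen)
        pool = [t for i, t in enumerate(pool) if i not in chosen_idx]
        # Retrain on the expanded labeled set.
        tagger = train(labeled)
    return tagger, labeled


def train_lookup_tagger(data: List[Sample]) -> Tagger:
    """Hypothetical stand-in for a BiLSTM-CRF: memorizes token -> label pairs."""
    vocab = {}
    for tokens, labels in data:
        for tok, lab in zip(tokens, labels):
            vocab[tok] = lab

    def tagger(tokens: Tokens) -> Tuple[Labels, float]:
        labels = [vocab.get(t, "O") for t in tokens]
        # Toy confidence: fraction of tokens seen during training.
        confidence = sum(t in vocab for t in tokens) / max(len(tokens), 1)
        return labels, confidence

    return tagger


seed = [(["Zhang", "San", "son", "of", "Zhang", "Wu"],
         ["B-PER", "I-PER", "O", "O", "B-PER", "I-PER"])]
unlabeled = [
    ["Li", "Si", "son", "of", "Li", "Liu"],
    ["Zhang", "Wu", "son", "of", "Zhang", "Qi"],
    ["born", "in", "Suzhou"],
]
tagger, expanded = bootstrap(seed, unlabeled, train_lookup_tagger, top_k=1, rounds=2)
print(len(expanded))  # 3
```

In this toy run, the second sentence (highest confidence) is added first, and the "Li Si" sentence is later pseudo-labeled entirely as `O`. This illustrates a general risk of self-training: mistakes made by the current model can be copied into the training set, which is why only the most confident predictions are added in each round.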
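The final step, deriving entities from a predicted label sequence, can be illustrated with a BIO decoder. The abstract does not state the tagging scheme or label set, so the BIO format, the hypothetical labels `NAME` and `ZI` (courtesy name), the function name `extract_entities`, and character-level tokenization of Chinese genealogy text are all assumptions. How relationships are derived is not specified in the abstract and is not shown here.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], labels: List[str], sep: str = "") -> List[Tuple[str, str]]:
    """Group BIO tags into (entity_type, text) spans (hypothetical helper)."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-") or (lab.startswith("I-") and lab[2:] != current_type):
            # A new entity begins; a stray I- tag is treated like a B- tag.
            if current_type is not None:
                entities.append((current_type, sep.join(current_tokens)))
            current_type, current_tokens = lab[2:], [tok]
        elif lab.startswith("I-"):
            # Continue the entity that is currently open.
            current_tokens.append(tok)
        else:
            # An "O" tag closes any open entity.
            if current_type is not None:
                entities.append((current_type, sep.join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        entities.append((current_type, sep.join(current_tokens)))
    return entities


print(extract_entities(list("王安石字介甫"),
                       ["B-NAME", "I-NAME", "I-NAME", "O", "B-ZI", "I-ZI"]))
# [('NAME', '王安石'), ('ZI', '介甫')]
```

The default `sep=""` joins characters directly, which suits character-tokenized Chinese text. Pass `sep=" "` for word-tokenized input.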