Semi-Supervised Discriminative Transfer Learning in Cross-Language Text Classification

Document Type

Conference Proceeding

Publication Date

12-16-2019

Publication Title

2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)

Publisher

Institute of Electrical and Electronics Engineers

Publisher Location

Boca Raton, FL

First page number

1031

Last page number

1038

Abstract

Cross-Language Text Classification (CLTC) has attracted increasing attention due to the exponential growth of multilingual data. CLTC aims to classify text documents in a label-scarce language by leveraging classification information from a label-rich language. We propose a novel semi-supervised Discriminative Transfer Learning (DTL) method for the CLTC problem. A small number of paired labeled bilingual documents is used to construct a discriminative transfer model that maximizes the correlation between documents in the two languages, while a large number of unlabeled documents is used for accurate data reconstruction. The discriminative transfer model minimizes the discrepancy between the bilingual subspaces while prioritizing discriminative features, improving text classification performance without the automatic machine translation that most state-of-the-art methods require. The performance of DTL is assessed empirically and statistically through extensive experiments on the publicly available Reuters RCV1/RCV2 collections. The experimental results demonstrate that DTL outperforms several representative state-of-the-art CLTC methods in terms of both accuracy and efficiency.
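
The sketch below illustrates, in broad strokes, how the two ingredients named in the abstract could fit together: a reconstruction step driven by plentiful unlabeled documents in each language, and a correlation-maximizing alignment learned from a small set of paired labeled documents. It is a minimal illustration under assumed choices (PCA for reconstruction, a cross-covariance SVD for alignment, tf-idf inputs, and hypothetical variable names), not the authors' actual DTL formulation.

```python
# Illustrative sketch only: one way a semi-supervised bilingual-subspace
# pipeline of this general kind could look. The PCA reconstruction step and
# the cross-covariance alignment are assumptions, not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pca_basis(X, k):
    """Top-k principal directions of X: the 'reconstruction' step that
    exploits plentiful unlabeled documents in one language."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                      # shape (d, k)

def align_subspaces(Zs, Zt, k):
    """Correlation-style alignment of the two per-language subspaces using a
    small set of paired labeled documents (row i of Zs and Zt form a pair)."""
    Zs = Zs - Zs.mean(axis=0)
    Zt = Zt - Zt.mean(axis=0)
    U, _, Vt = np.linalg.svd(Zs.T @ Zt, full_matrices=False)
    return U[:, :k], Vt[:k].T            # maps both sides into a shared space

# Hypothetical usage with tf-idf matrices:
#   X_src_unlab, X_tgt_unlab  -- large unlabeled sets (reconstruction)
#   X_src_pair,  X_tgt_pair   -- small paired labeled set
#   X_src_lab,   y_src        -- labeled source-language training documents
#   X_tgt_test                -- target-language documents to classify
#
# B_src = pca_basis(X_src_unlab, k=200)
# B_tgt = pca_basis(X_tgt_unlab, k=200)
# A_src, A_tgt = align_subspaces(X_src_pair @ B_src, X_tgt_pair @ B_tgt, k=50)
# clf = LogisticRegression(max_iter=1000).fit(X_src_lab @ B_src @ A_src, y_src)
# y_pred = clf.predict(X_tgt_test @ B_tgt @ A_tgt)  # no machine translation used
```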

Keywords

Cross-language text classification; Semi-supervised learning; Subspace learning

Disciplines

Computer Sciences

Language

English
