Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis
Date
2024-04-01
Authors
Dhananjaya, V.; Ranathunga, S.; Jayasena, S.
Journal Title
CAAI Transactions on Intelligence Technology
Publisher
John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Chongqing University of Technology.
Rights
(c) 2024 The Author(s)
CC BY 4.0
Abstract
Pre-trained multilingual language models (PMLMs) such as mBERT and XLM-R have shown good cross-lingual transferability. However, they are not specifically trained to capture cross-lingual signals concerning sentiment words. This puts low-resource languages (LRLs), which are under-represented in these models, at a disadvantage. To better fine-tune these models for sentiment classification in LRLs, a novel intermediate-task fine-tuning (ITFT) technique based on a sentiment lexicon of a high-resource language (HRL) is introduced. The authors experiment with the LRLs Sinhala, Tamil, and Bengali on a 3-class sentiment classification task and show that this method outperforms vanilla fine-tuning of the PMLM. It also outperforms, or is on par with, basic ITFT that relies on an HRL sentiment classification dataset.
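The abstract gives no implementation details, but the two-stage recipe it describes can be illustrated with a minimal sketch: first fine-tune a PMLM on a 3-class polarity-tagging task built from an HRL sentiment lexicon (the intermediate task), then fine-tune the same model on LRL sentence-level sentiment data (the target task). Everything below is an assumption for illustration, not the authors' code: the HuggingFace model name, the toy lexicon entries, the placeholder LRL data, and all hyperparameters.

```python
# Hypothetical sketch of lexicon-based intermediate-task fine-tuning (ITFT).
# Assumes an HRL lexicon of (word, polarity) pairs and an LRL sentence-level
# dataset; labels here are 0 = negative, 1 = neutral, 2 = positive.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # any PMLM (e.g. mBERT) would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

class TextDataset(Dataset):
    """Wraps (text, label) pairs: lexicon words in stage 1, LRL sentences in stage 2."""
    def __init__(self, pairs):
        self.pairs = pairs
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, idx):
        text, label = self.pairs[idx]
        enc = tokenizer(text, truncation=True, padding="max_length",
                        max_length=64, return_tensors="pt")
        return {k: v.squeeze(0) for k, v in enc.items()}, torch.tensor(label)

def finetune(model, pairs, epochs=3, lr=2e-5):
    """One fine-tuning stage: standard cross-entropy training on (text, label) pairs."""
    loader = DataLoader(TextDataset(pairs), batch_size=16, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for enc, labels in loader:
            out = model(**enc, labels=labels)  # loss computed internally
            out.loss.backward()
            optim.step()
            optim.zero_grad()
    return model

# Stage 1 (intermediate task): polarity tagging of HRL lexicon entries (toy data).
hrl_lexicon = [("excellent", 2), ("terrible", 0), ("table", 1)]
model = finetune(model, hrl_lexicon)

# Stage 2 (target task): sentence-level sentiment classification in the LRL.
lrl_sentences = [("<Sinhala/Tamil/Bengali sentence>", 2)]  # placeholder data
model = finetune(model, lrl_sentences)
```

The key design point, per the abstract, is that stage 1 uses only a word-level lexicon from a high-resource language rather than a full HRL sentiment classification corpus, which the paper reports is competitive with, or better than, corpus-based ITFT.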
Keywords
deep learning, natural languages, natural language processing
Citation
Dhananjaya V, Ranathunga S, Jayasena S. (2024). Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis. CAAI Transactions on Intelligence Technology. Early View.