Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis
Date
2024-04-01
Authors
Dhananjaya, V.; Ranathunga, S.; Jayasena, S.
Journal Title
CAAI Transactions on Intelligence Technology
Publisher
John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Chongqing University of Technology.
Rights
(c) 2024 The Author(s)
CC BY 4.0
Abstract
Pre-trained multilingual language models (PMLMs) such as mBERT and XLM-R have shown good cross-lingual transferability. However, they are not specifically trained to capture cross-lingual signals concerning sentiment words. This puts low-resource languages (LRLs), which are under-represented in these models, at a disadvantage. To better fine-tune these models for sentiment classification in LRLs, a novel intermediate-task fine-tuning (ITFT) technique based on a sentiment lexicon of a high-resource language (HRL) is introduced. The authors experiment with the LRLs Sinhala, Tamil, and Bengali on a 3-class sentiment classification task and show that this method outperforms vanilla fine-tuning of the PMLM. It also outperforms, or is on par with, basic ITFT that relies on an HRL sentiment classification dataset.
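The abstract gives no implementation details, but the two-stage recipe it describes can be illustrated with a minimal sketch: first fine-tune a PMLM on a 3-class polarity-tagging task built from an HRL sentiment lexicon (the intermediate task), then fine-tune the same model on LRL sentence-level sentiment data (the target task). Everything below is an assumption for illustration, not the authors' code: the HuggingFace model name, the toy lexicon entries, the placeholder LRL data, and all hyperparameters.

```python
# Hypothetical sketch of lexicon-based intermediate-task fine-tuning (ITFT).
# Assumes an HRL lexicon of (word, polarity) pairs and an LRL sentence-level
# dataset; labels here are 0 = negative, 1 = neutral, 2 = positive.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # any PMLM (e.g. mBERT) would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

class TextDataset(Dataset):
    """Wraps (text, label) pairs: lexicon words in stage 1, LRL sentences in stage 2."""
    def __init__(self, pairs):
        self.pairs = pairs
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, idx):
        text, label = self.pairs[idx]
        enc = tokenizer(text, truncation=True, padding="max_length",
                        max_length=64, return_tensors="pt")
        return {k: v.squeeze(0) for k, v in enc.items()}, torch.tensor(label)

def finetune(model, pairs, epochs=3, lr=2e-5):
    """One fine-tuning stage: standard cross-entropy training on (text, label) pairs."""
    loader = DataLoader(TextDataset(pairs), batch_size=16, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for enc, labels in loader:
            out = model(**enc, labels=labels)  # loss computed internally
            out.loss.backward()
            optim.step()
            optim.zero_grad()
    return model

# Stage 1 (intermediate task): polarity tagging of HRL lexicon entries (toy data).
hrl_lexicon = [("excellent", 2), ("terrible", 0), ("table", 1)]
model = finetune(model, hrl_lexicon)

# Stage 2 (target task): sentence-level sentiment classification in the LRL.
lrl_sentences = [("<Sinhala/Tamil/Bengali sentence>", 2)]  # placeholder data
model = finetune(model, lrl_sentences)
```

The key design point, per the abstract, is that stage 1 uses only a word-level lexicon from a high-resource language rather than a full HRL sentiment classification corpus, which the paper reports is competitive with, or better than, corpus-based ITFT.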
Keywords
deep learning, natural languages, natural language processing
Citation
Dhananjaya V, Ranathunga S, Jayasena S. (2024). Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis. CAAI Transactions on Intelligence Technology. Early View.