Title: Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis
Authors: Dhananjaya V; Ranathunga S; Jayasena S
Date issued: 2024-04-01
Date available: 2024-10-23
Citation: Dhananjaya V, Ranathunga S, Jayasena S. (2024). Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis. CAAI Transactions on Intelligence Technology. Early View.
DOI: 10.1049/cit2.12333
ISSN: 2468-6557; 2468-2322
Handle: https://mro.massey.ac.nz/handle/10179/71828
Type: Journal article
Rights: © 2024 The Author(s)
Licence: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Keywords: deep learning; natural languages; natural language processing

Abstract: Pre-trained multilingual language models (PMLMs) such as mBERT and XLM-R have shown good cross-lingual transferability. However, they are not specifically trained to capture cross-lingual signals concerning sentiment words. This puts low-resource languages (LRLs), which are under-represented in these models, at a disadvantage. To better fine-tune these models for sentiment classification in LRLs, the authors introduce a novel intermediate-task fine-tuning (ITFT) technique based on a sentiment lexicon of a high-resource language (HRL). They experiment with the LRLs Sinhala, Tamil, and Bengali on a 3-class sentiment classification task and show that this method outperforms vanilla fine-tuning of the PMLM. It also outperforms, or is on par with, basic ITFT that relies on an HRL sentiment classification dataset.
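Illustration: the sketch below shows the general two-stage shape of the recipe the abstract describes: a PMLM is first fine-tuned on an auxiliary task built from an HRL sentiment lexicon (ITFT), then fine-tuned on the target LRL 3-class sentiment data. This is a minimal sketch assuming a HuggingFace Transformers workflow; the lexicon objective used here (classifying each lexicon word as negative/neutral/positive), the model choice, and the toy data are illustrative assumptions, not the authors' exact setup, which is detailed in the paper itself.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL = "xlm-roberta-base"  # one of the PMLMs named in the abstract
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

    def finetune(texts, labels, epochs=1, lr=2e-5):
        """One generic fine-tuning pass, reused for both stages."""
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        y = torch.tensor(labels)
        model.train()
        for _ in range(epochs):
            out = model(**batch, labels=y)  # cross-entropy loss over 3 classes
            out.loss.backward()
            opt.step()
            opt.zero_grad()

    # Stage 1 (ITFT): hypothetical entries from an HRL (e.g. English)
    # sentiment lexicon, labelled 0=negative, 1=neutral, 2=positive.
    lexicon_words = ["terrible", "okay", "wonderful"]
    lexicon_labels = [0, 1, 2]
    finetune(lexicon_words, lexicon_labels)

    # Stage 2: the target LRL 3-class sentiment dataset (placeholder here).
    lrl_texts = ["<Sinhala/Tamil/Bengali sentence>"]
    lrl_labels = [2]
    finetune(lrl_texts, lrl_labels)

The point of the intermediate stage is that the sentiment signal learned from the HRL lexicon transfers through the shared multilingual representation space, so the second stage starts from a sentiment-aware checkpoint rather than the vanilla PMLM.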