Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis

dc.citation.volume: Early View
dc.contributor.author: Dhananjaya V
dc.contributor.author: Ranathunga S
dc.contributor.author: Jayasena S
dc.date.accessioned: 2024-10-23T02:03:33Z
dc.date.available: 2024-10-23T02:03:33Z
dc.date.issued: 2024-04-01
dc.description.abstract: Pre-trained multilingual language models (PMLMs) such as mBERT and XLM-R have shown good cross-lingual transferability. However, they are not specifically trained to capture cross-lingual signals concerning sentiment words. This poses a disadvantage for low-resource languages (LRLs) that are under-represented in these models. To better fine-tune these models for sentiment classification in LRLs, a novel intermediate task fine-tuning (ITFT) technique based on a sentiment lexicon of a high-resource language (HRL) is introduced. The authors experiment with LRLs Sinhala, Tamil and Bengali for a 3-class sentiment classification task and show that this method outperforms vanilla fine-tuning of the PMLM. It also outperforms or is on-par with basic ITFT that relies on an HRL sentiment classification dataset.
dc.description.confidential: false
dc.edition.edition: 2024
dc.identifier.citation: Dhananjaya V, Ranathunga S, Jayasena S. (2024). Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis. CAAI Transactions on Intelligence Technology. Early View.
dc.identifier.doi: 10.1049/cit2.12333
dc.identifier.eissn: 2468-2322
dc.identifier.elements-type: journal-article
dc.identifier.issn: 2468-6557
dc.identifier.uri: https://mro.massey.ac.nz/handle/10179/71828
dc.language: English
dc.publisher: John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Chongqing University of Technology.
dc.publisher.uri: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.12333
dc.relation.isPartOf: CAAI Transactions on Intelligence Technology
dc.rights: (c) 2024 The Author/s
dc.rights: CC BY 4.0
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: deep learning
dc.subject: natural languages
dc.subject: natural language processing
dc.title: Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis
dc.type: Journal article
pubs.elements-id: 488633
pubs.organisational-group: Other
Files
Original bundle
Name: Published version.pdf
Size: 817.65 KB
Format: Adobe Portable Document Format
Description: 488633 PDF.pdf