Use of prompt-based learning for code-mixed and code-switched text classification

dc.citation.issue: 5
dc.citation.volume: 27
dc.contributor.author: Udawatta P
dc.contributor.author: Udayangana I
dc.contributor.author: Gamage C
dc.contributor.author: Shekhar R
dc.contributor.author: Ranathunga S
dc.date.accessioned: 2024-10-09T00:40:20Z
dc.date.available: 2024-10-09T00:40:20Z
dc.date.issued: 2024-09-09
dc.description.abstract: Code-mixing and code-switching (CMCS) are prevalent phenomena observed in social media conversations and various other modes of communication. When developing applications such as sentiment analysers and hate-speech detectors that operate on this social media data, CMCS text poses challenges. Recent studies have demonstrated that prompt-based learning of pre-trained language models outperforms full fine-tuning across various tasks. Despite the growing interest in classifying CMCS text, the effectiveness of prompt-based learning for the task remains unexplored. This paper presents an extensive exploration of prompt-based learning for CMCS text classification and the first comprehensive analysis of the impact of the script on classifying CMCS text. Our study reveals that the performance in classifying CMCS text is significantly influenced by the inclusion of multiple scripts and the intensity of code-mixing. In response, we introduce a novel method, Dynamic+AdapterPrompt, which employs distinct models for each script, integrated with adapters. While DynamicPrompt captures the script-specific representation of the text, AdapterPrompt emphasizes capturing the task-oriented functionality. Our experiments on Sinhala-English, Kannada-English, and Hindi-English datasets for sentiment classification, hate-speech detection, and humour detection tasks show that our method outperforms strong fine-tuning baselines and basic prompting strategies.
dc.description.confidential: false
dc.identifier.citation: Udawatta P, Udayangana I, Gamage C, Shekhar R, Ranathunga S. (2024). Use of prompt-based learning for code-mixed and code-switched text classification. World Wide Web. 27. 5.
dc.identifier.doi: 10.1007/s11280-024-01302-2
dc.identifier.eissn: 1573-1413
dc.identifier.elements-type: journal-article
dc.identifier.issn: 1386-145X
dc.identifier.number: 63
dc.identifier.uri: https://mro.massey.ac.nz/handle/10179/71646
dc.language: English
dc.publisher: Springer Nature
dc.publisher.uri: https://link.springer.com/article/10.1007/s11280-024-01302-2
dc.relation.isPartOf: World Wide Web
dc.rights: (c) 2024 The Author/s
dc.rights: CC BY 4.0
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Code-mixing
dc.subject: Code-switching
dc.subject: Prompt-based learning
dc.subject: Pre-trained language models
dc.subject: XLM-R
dc.subject: Text classification
dc.subject: Language script
dc.subject: Sinhala
dc.subject: Kannada
dc.subject: Hindi
dc.title: Use of prompt-based learning for code-mixed and code-switched text classification
dc.type: Journal article
pubs.elements-id: 491593
pubs.organisational-group: Other
Files
Original bundle: Published version.pdf (Adobe Portable Document Format, 1.91 MB; description: 491593 PDF.pdf)
License bundle: license.txt (Plain Text, 9.22 KB)