Use of prompt-based learning for code-mixed and code-switched text classification

Date

2024-09-09

Publisher

Springer Nature

Rights

(c) 2024 The Author/s
CC BY 4.0

Abstract

Code-mixing and code-switching (CMCS) are prevalent phenomena observed in social media conversations and various other modes of communication. When developing applications such as sentiment analysers and hate-speech detectors that operate on this social media data, CMCS text poses challenges. Recent studies have demonstrated that prompt-based learning of pre-trained language models outperforms full fine-tuning across various tasks. Despite the growing interest in classifying CMCS text, the effectiveness of prompt-based learning for the task remains unexplored. This paper presents an extensive exploration of prompt-based learning for CMCS text classification and the first comprehensive analysis of the impact of the script on classifying CMCS text. Our study reveals that the performance in classifying CMCS text is significantly influenced by the inclusion of multiple scripts and the intensity of code-mixing. In response, we introduce a novel method, Dynamic+AdapterPrompt, which employs distinct models for each script, integrated with adapters. While DynamicPrompt captures the script-specific representation of the text, AdapterPrompt emphasizes capturing the task-oriented functionality. Our experiments on Sinhala-English, Kannada-English, and Hindi-English datasets for sentiment classification, hate-speech detection, and humour detection tasks show that our method outperforms strong fine-tuning baselines and basic prompting strategies.
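
The abstract's observation that classification performance depends on which scripts appear in the text can be illustrated with a small script-profiling routine. The following is a minimal sketch only, not the authors' Dynamic+AdapterPrompt implementation: it labels each whitespace-separated token by its dominant Unicode block and reports whether a sentence is single-script or mixed. The function names (`char_script`, `token_script`, `script_profile`, `route`) and the idea of routing on the resulting label are assumptions made for illustration; the block ranges are the standard Unicode ranges for Sinhala, Kannada, and Devanagari (the script used for Hindi).

```python
from collections import Counter

# Unicode block ranges for the native scripts of the languages named in the abstract.
SCRIPT_RANGES = {
    "Sinhala": (0x0D80, 0x0DFF),
    "Kannada": (0x0C80, 0x0CFF),
    "Devanagari": (0x0900, 0x097F),
}


def char_script(ch):
    """Return the script name of one character, or None if it is not a tracked letter."""
    if ch.isascii() and ch.isalpha():
        return "Latin"
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    return None


def token_script(token):
    """Label a token with the script that most of its characters belong to."""
    counts = Counter(s for s in map(char_script, token) if s is not None)
    return counts.most_common(1)[0][0] if counts else "Other"


def script_profile(text):
    """Count how many whitespace-separated tokens fall into each script."""
    return Counter(token_script(t) for t in text.split())


def route(text):
    """Hypothetical routing key: a single script name, 'Mixed', or 'Other'."""
    profile = script_profile(text)
    profile.pop("Other", None)  # ignore digits, punctuation, emoji, etc.
    if not profile:
        return "Other"
    if len(profile) > 1:
        return "Mixed"
    return next(iter(profile))
```

In a setup like the one the abstract describes, a label such as the one returned by `route` could be used to select a script-specific model or prompt; how the paper actually performs that selection is not stated on this page, so that step is left out here.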

Keywords

Code-mixing, Code-switching, Prompt-based learning, Pre-trained language models, XLM-R, Text classification, Language script, Sinhala, Kannada, Hindi

Citation

Udawatta P, Udayangana I, Gamage C, Shekhar R, Ranathunga S. (2024). Use of prompt-based learning for code-mixed and code-switched text classification. World Wide Web. 27. 5.

Creative Commons license

Except where otherwise noted, this item's license is described as (c) 2024 The Author/s