Conference Papers

Permanent URI for this collection: https://mro.massey.ac.nz/handle/10179/7616

  • Item
    A Framework to Assess Multilingual Vulnerabilities of LLMs
    (Association for Computing Machinery, 2025-05-23) Tang L; Bogahawatta N; Ginige Y; Xu J; Sun S; Ranathunga S; Seneviratne S
    Large Language Models (LLMs) are acquiring a wider range of capabilities, including understanding and responding in multiple languages. While they undergo safety training to prevent them from answering illegal questions, imbalances in training data and human evaluation resources can make these models more susceptible to attacks in low-resource languages (LRLs). This paper proposes a framework to automatically assess the multilingual vulnerabilities of commonly used LLMs. Using our framework, we evaluated six LLMs across eight languages representing varying levels of resource availability. We validated the assessments generated by our automated framework through human evaluation in two languages, demonstrating that the framework's results align with human judgments in most cases. Our findings reveal vulnerabilities in LRLs; however, these may pose minimal risk, as they often stem from the model's poor performance, which results in incoherent responses.
  • Item
    Multi-lingual mathematical word problem generation using long short-term memory networks with enhanced input features
    (European Language Resources Association (ELRA), 2020-01-01) Liyanage V; Ranathunga S
    A Mathematical Word Problem (MWP) differs from a general textual representation in that it contains numerical quantities and units in addition to text. Therefore, MWP generation should be carefully handled. When it comes to multi-lingual MWP generation, language-specific morphological and syntactic features become additional constraints. Standard template-based MWP generation techniques are incapable of identifying these language-specific constraints, particularly in morphologically rich yet low-resource languages such as Sinhala and Tamil. This paper presents the use of a Long Short-Term Memory (LSTM) network that is capable of generating elementary-level MWPs while satisfying the aforementioned constraints. Our approach feeds a combination of character embeddings, word embeddings, and Part of Speech (POS) tag embeddings to the LSTM, in which attention is provided for numerical values and units. We trained our model for three languages, English, Sinhala, and Tamil, using separate MWP datasets. Irrespective of the language and the type of the MWP, our model could generate accurate single-sentence and multi-sentence problems. Accuracies, reported as average BLEU scores, were 22.97%, 24.49%, and 20.74% for English, Sinhala, and Tamil, respectively.
  • Item
    Word embedding evaluation for Sinhala
    (European Language Resources Association, 2020-01-01) Lakmal D; Ranathunga S; Peramuna S; Herath I; Calzolari N; Béchet F; Blache P; Choukri K; Cieri C; Declerck T; Goggi S; Isahara H; Maegaard B; Mariani J; Mazo H; Moreno A; Odijk J; Piperidis S
    This paper presents the first comprehensive evaluation of different types of word embeddings for the Sinhala language. Three standard word embedding models, namely Word2Vec (both Skip-gram and CBOW), FastText, and GloVe, are evaluated under two types of evaluation methods: intrinsic evaluation and extrinsic evaluation. Word analogy and word relatedness evaluations were performed as intrinsic evaluation, while sentiment analysis and part-of-speech (POS) tagging were conducted as the extrinsic evaluation tasks. Benchmark datasets used for intrinsic evaluations were carefully crafted considering specific linguistic features of Sinhala. In general, FastText word embeddings with 300 dimensions achieved the best accuracy across all the evaluation tasks, while GloVe reported the lowest results.
  • Item
    Dataset and Baseline for Automatic Student Feedback Analysis
    (European Language Resources Association (ELRA), 2022-01-01) Nilanga K; Herath M; Maduwantha H; Ranathunga S; Calzolari N; Béchet F; Blache P; Choukri K; Cieri C; Declerck T; Goggi S; Isahara H; Maegaard B; Mariani J; Mazo H; Odijk J; Piperidis S
    In this paper, we present a student feedback corpus that contains 3000 instances of feedback written by university students. This dataset has been annotated for aspect terms, opinion terms, polarities of the opinion terms towards targeted aspects, and document-level opinion polarities. We developed a hierarchical taxonomy for aspect categorisation, which covers many aspects of the teaching-learning process. We annotated both implicit and explicit aspects using this taxonomy. The annotation methodology, difficulties faced during annotation, and the details of the aspect term categorisation are discussed in detail. Using state-of-the-art techniques, we built baseline models for the following tasks: Target-oriented Opinion Extraction, Aspect-Level Sentiment Analysis, and Document-Level Sentiment Analysis. These models achieved F1 scores of 64%, 75%, and 86%, respectively, for the considered tasks. These results illustrate the reliability and usability of the corpus for different tasks related to sentiment analysis.
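The assessment loop described in "A Framework to Assess Multilingual Vulnerabilities of LLMs" might be sketched as follows. This is a hedged illustration only: `translate_prompt`, `query_model`, and `judge_response` are hypothetical stubs standing in for a machine-translation step, an LLM API call, and an automatic safety judge; the paper does not specify these names or this code.

```python
# Hypothetical sketch of an automated multilingual vulnerability assessment.
# All three helper functions are illustrative stubs, not the authors' code.

def translate_prompt(prompt: str, language: str) -> str:
    # Stub: a real framework would machine-translate the unsafe prompt.
    return f"[{language}] {prompt}"

def query_model(model: str, prompt: str) -> str:
    # Stub: a real framework would call the model's API here.
    return "I cannot help with that."

def judge_response(response: str) -> str:
    # Stub: classify the response as a refusal or a harmful answer; a real
    # judge would also flag incoherent output, the LRL failure mode the
    # abstract highlights.
    return "refusal" if "cannot" in response.lower() else "harmful"

def assess(models, prompts, languages):
    results = {}
    for model in models:
        for language in languages:
            verdicts = [
                judge_response(query_model(model, translate_prompt(p, language)))
                for p in prompts
            ]
            # Attack success rate: fraction of responses judged harmful.
            results[(model, language)] = verdicts.count("harmful") / len(verdicts)
    return results

scores = assess(["model-a"], ["example unsafe prompt"], ["en", "si"])
print(scores)  # stubbed model always refuses, so every rate is 0.0
```

The per-language success rates would then be compared across resource levels and validated against human evaluation, as the abstract describes.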
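The "enhanced input features" of the MWP-generation paper, concatenating character, word, and POS-tag embeddings per token before the LSTM, can be illustrated with toy values. All dimensions, vocabularies, and the mean-pooling of character embeddings below are assumptions for illustration, not the paper's settings.

```python
import numpy as np

# Toy sketch: each token is represented by the concatenation of a word
# embedding, a mean-pooled character embedding, and a POS-tag embedding.
rng = np.random.default_rng(0)

WORD_DIM, CHAR_DIM, POS_DIM = 8, 4, 3
word_emb = {w: rng.normal(size=WORD_DIM) for w in ["John", "has", "5", "apples"]}
char_emb = {c: rng.normal(size=CHAR_DIM) for c in set("Johnhas5apples")}
pos_emb = {t: rng.normal(size=POS_DIM) for t in ["NNP", "VBZ", "CD", "NNS"]}

def token_features(word: str, pos: str) -> np.ndarray:
    # Mean-pool the character embeddings, then concatenate the three views.
    chars = np.mean([char_emb[c] for c in word], axis=0)
    return np.concatenate([word_emb[word], chars, pos_emb[pos]])

sentence = [("John", "NNP"), ("has", "VBZ"), ("5", "CD"), ("apples", "NNS")]
inputs = np.stack([token_features(w, t) for w, t in sentence])
print(inputs.shape)  # (4, 15): sequence length x concatenated feature size
```

The resulting matrix is the kind of per-token input a sequence model would consume; the paper's attention over numerical values and units would sit on top of such a representation.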
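The two intrinsic evaluations named in the Sinhala word-embedding paper, word relatedness and word analogy, can be sketched with cosine similarity and the standard 3CosAdd method on toy 2-D vectors. Real evaluations use trained 300-dimensional embeddings and the paper's Sinhala benchmark datasets; the English words and vectors here are illustrative only.

```python
import numpy as np

# Toy embedding table standing in for a trained model's vectors.
emb = {
    "king":  np.array([0.9, 0.8]),
    "man":   np.array([0.8, 0.2]),
    "woman": np.array([0.2, 0.2]),
    "queen": np.array([0.3, 0.8]),
}

def cosine(a, b):
    # Word relatedness: pair scores are correlated with human judgments.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    # 3CosAdd: find d maximising cos(d, b - a + c), excluding the inputs.
    target = emb[b] - emb[a] + emb[c]
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(round(cosine(emb["king"], emb["queen"]), 3))
print(analogy("man", "king", "woman"))  # -> "queen" on these toy vectors
```

Aggregating accuracy over many such analogy questions, and rank correlation over many relatedness pairs, gives the intrinsic scores the paper compares across Word2Vec, FastText, and GloVe.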
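The annotation layers described for the student feedback corpus (aspect terms, opinion terms, per-aspect polarity, and document-level polarity) can be illustrated with a hypothetical record layout. The field names, taxonomy labels, and example sentence below are assumptions for illustration, not the corpus's actual schema or data.

```python
from dataclasses import dataclass, field

@dataclass
class AspectAnnotation:
    aspect_term: str      # explicit (or implicit) aspect mention
    aspect_category: str  # node in a hierarchical taxonomy (illustrative labels)
    opinion_term: str     # opinion expression targeting the aspect
    polarity: str         # polarity of the opinion towards this aspect

@dataclass
class FeedbackInstance:
    text: str
    aspects: list = field(default_factory=list)
    document_polarity: str = "neutral"  # document-level opinion polarity

example = FeedbackInstance(
    text="The lectures were engaging but the assignments were too long.",
    aspects=[
        AspectAnnotation("lectures", "teaching/delivery", "engaging", "positive"),
        AspectAnnotation("assignments", "assessment/workload", "too long", "negative"),
    ],
    document_polarity="mixed",
)
print(len(example.aspects))  # 2
```

A record like this supports all three baseline tasks at once: the opinion terms for target-oriented opinion extraction, the per-aspect polarities for aspect-level sentiment analysis, and the document polarity for document-level sentiment analysis.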