Word embedding evaluation for Sinhala

Date

2020-01-01

Publisher

European Language Resources Association

Rights

© European Language Resources Association (ELRA)
CC-BY-NC

Abstract

This paper presents the first comprehensive evaluation of different types of word embeddings for the Sinhala language. Three standard word embedding models, namely Word2Vec (both Skip-gram and CBOW), FastText, and GloVe, are evaluated using two types of methods: intrinsic evaluation and extrinsic evaluation. Word analogy and word relatedness tasks served as the intrinsic evaluations, while sentiment analysis and part-of-speech (POS) tagging served as the extrinsic evaluation tasks. The benchmark datasets used for the intrinsic evaluations were carefully crafted to account for specific linguistic features of Sinhala. In general, FastText word embeddings with 300 dimensions achieved the highest accuracies across all the evaluation tasks, while GloVe reported the lowest results.
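
As a rough illustration of the two intrinsic evaluations named in the abstract, the sketch below scores a toy embedding table on a word analogy task and a word relatedness task. It is not the authors' code: the vector-offset analogy rule, the Spearman correlation measure, the function names, the English placeholder words, the 2-dimensional vectors, and the human relatedness scores are all assumptions made up for this example, since the abstract does not state which scoring procedures or data formats were used.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset rule (assumed here)."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    # The three question words are excluded from the candidate answers.
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)

def ranks(values):
    """Rank positions starting at 1; this simple version assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical toy embeddings; a real run would load trained Sinhala vectors.
emb = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.2],
}

# Word analogy: each question is (a, b, c, expected answer).
questions = [("man", "woman", "king", "queen")]
accuracy = analogy_accuracy(emb, questions)

# Word relatedness: made-up human scores compared with model cosine similarity.
pairs = [("king", "queen", 8.0), ("man", "woman", 7.0), ("king", "apple", 1.0)]
human = [score for _, _, score in pairs]
model = [cosine(emb[w1], emb[w2]) for w1, w2, _ in pairs]
rho = spearman(human, model)

print(f"analogy accuracy: {accuracy:.2f}")
print(f"relatedness Spearman correlation: {rho:.2f}")
```

With real embeddings, the same two numbers (analogy accuracy and rank correlation with human judgments) could be computed per model and per dimensionality to compare embedding types side by side.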

Keywords

Word Embedding, Sinhala, Evaluation Methodologies

Citation

Lakmal D, Ranathunga S, Peramuna S, Herath I. (2020). Word embedding evaluation for Sinhala. In Calzolari N, Béchet F, Blache P, Choukri K, Cieri C, Declerck T, Goggi S, Isahara H, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S (Eds.), LREC 2020: 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp. 1874-1881). European Language Resources Association.

Creative Commons license

Except where otherwise noted, this item's license is described as © European Language Resources Association (ELRA)