Data Augmentation with Diversified Rephrasing for Low-Resource Neural Machine Translation

dc.citation.volume: 1
dc.contributor.author: Gao Y
dc.contributor.author: Hou F
dc.contributor.author: Jahnke H
dc.contributor.author: Wang R
dc.contributor.editor: Utiyama M
dc.contributor.editor: Wang R
dc.coverage.spatial: Macau SAR, China
dc.date.accessioned: 2025-05-20T20:17:08Z
dc.date.available: 2025-05-20T20:17:08Z
dc.date.finish-date: 2023-09-08
dc.date.issued: 2023-01-01
dc.date.start-date: 2023-09-04
dc.description.abstract: Data augmentation is an effective way to enhance the performance of neural machine translation models, especially for low-resource languages. Existing data augmentation methods are either at a token level or a sentence level. The data augmented using token level methods lack syntactic diversity and may alter original meanings. Sentence level methods usually generate low-quality source sentences that are not semantically paired with the original target sentences. In this paper, we propose a novel data augmentation method to generate diverse, high-quality and meaning-preserved new instances. Our method leverages high-quality translation models trained with high-resource languages to rephrase an original sentence by translating it into an intermediate language and then back to the original language. Through this process, the high-performing translation models guarantee the quality of the rephrased sentences, and the syntactic knowledge from the intermediate language can bring syntactic diversity to the rephrased sentences. Experimental results show our method can enhance the performance in various low-resource machine translation tasks. Moreover, by combining our method with other techniques that facilitate NMT, we can yield even better results.
dc.description.confidential: false
dc.format.pagination: 35-47
dc.identifier.citation: Gao Y, Hou F, Jahnke H, Wang R. (2023). Data Augmentation with Diversified Rephrasing for Low-Resource Neural Machine Translation. Utiyama M, Wang R. MT Summit 2023 - Proceedings of 19th Machine Translation Summit. (pp. 35-47). Asia-Pacific Association for Machine Translation.
dc.identifier.elements-type: c-conference-paper-in-proceedings
dc.identifier.uri: https://mro.massey.ac.nz/handle/10179/72918
dc.publisher: Asia-Pacific Association for Machine Translation
dc.publisher.uri: http://aclanthology.org/2023.mtsummit-research.4.pdf
dc.rights: (c) 2023 The Author/s
dc.rights: CC BY 4.0
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source.journal: MT Summit 2023 - Proceedings of 19th Machine Translation Summit
dc.source.name-of-conference: Machine Translation Summit XIX
dc.title: Data Augmentation with Diversified Rephrasing for Low-Resource Neural Machine Translation
dc.type: conference
pubs.elements-id: 487037
pubs.organisational-group: Other
Files
Original bundle
Name:
487037 PDF.pdf
Size:
289.25 KB
Format:
Adobe Portable Document Format
Description:
Published version.pdf
License bundle
Name:
license.txt
Size:
9.22 KB
Format:
Plain Text
Description: