Machine Translation Reference-less Evaluation using YiSi-2 with Bilingual Mappings of Massive Multilingual Language Model

Nov 1, 2020·

Chi-Kiu Lo

Samuel Larkin

· 0 min read

Abstract

We present a study on using YiSi-2 with massive multilingual pretrained language models for machine translation (MT) reference-less evaluation. Aiming at finding better semantic representation for semantic MT evaluation, we first test YiSi-2 with contextual embed- dings extracted from different layers of two different pretrained models, multilingual BERT and XLM-RoBERTa. We also experiment with learning bilingual mappings that trans- form the vector subspace of the source language to be closer to that of the target language in the pretrained model to obtain more accurate cross-lingual semantic similarity representations. Our results show that YiSi-2′s correlation with human direct assessment on translation quality is greatly improved by replacing multilingual BERT with XLM-RoBERTa and projecting the source embeddings into the tar- get embedding space using a cross-lingual lin- ear projection (CLP) matrix learnt from a small development set.

Type

Conference paper

Publication

Proceedings of the Fifth Conference on Machine Translation

Last updated on Nov 1, 2020

← NRC-CNRC Machine Translation Systems for the 2021 AmericasNLP Shared Task Jun 1, 2021

NRC Systems for Low Resource German-Upper Sorbian Machine Translation 2020: Transfer Learning with Lexical Modifications Nov 1, 2020 →