Transferring markup tags in statistical machine translation: a two-stream approach

Sep 1, 2013

Eric Joanis

Darlene Stewart

Samuel Larkin

Roland Kuhn


Abstract

Translation agencies are introducing statistical machine translation (SMT) into the workflow of human translators. Typically, SMT produces a first-draft translation, which is then post-edited by a person. SMT has met much resistance from translators, partly because of professional conservatism, but partly because the SMT community has often neglected some practical aspects of translation. Our paper discusses one of these: transferring formatting tags such as bold or italic from the source to the target document with a low error rate, thus freeing the post-editor from having to reformat SMT-generated text. In our “two-stream” approach, tags are stripped from the input to the decoder, then reinserted into the resulting target-language text. Tag transfer has been tackled by other SMT teams, but only a few have published descriptions of their work. This paper contributes to understanding tag transfer by explaining our approach in detail.
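The strip-then-reinsert idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `strip_tags` and `reinsert_tags`, the simple `<b>`-style tag syntax, and the word alignment supplied as a plain dictionary are all assumptions made for illustration. A real system would take the alignment from the decoder and would need fallbacks for unaligned or crossing spans.

```python
import re

# Assumption: tags are standalone tokens such as <b> and </b>, well-formed and properly nested.
TAG_RE = re.compile(r"<(/?)(\w+)>")


def strip_tags(tagged_tokens):
    """Separate the two streams: plain tokens, and tag spans over those tokens.

    Returns (tokens, spans), where each span is (tag_name, start, end)
    and [start, end) is a half-open range of plain-token indices.
    """
    tokens, spans, open_stack = [], [], []
    for tok in tagged_tokens:
        m = TAG_RE.fullmatch(tok)
        if m is None:
            tokens.append(tok)                      # ordinary word: goes to the decoder
        elif m.group(1) == "":
            open_stack.append((m.group(2), len(tokens)))  # opening tag: remember position
        else:
            name, start = open_stack.pop()          # closing tag: complete the span
            spans.append((name, start, len(tokens)))
    return tokens, spans


def reinsert_tags(target_tokens, spans, alignment):
    """Wrap each tag around the target tokens aligned to its source span.

    `alignment` (hypothetical format) maps a source token index to a list of
    target token indices. Spans with no aligned target token are dropped here.
    """
    opens, closes = {}, {}
    # Process outermost source spans first so nested tags open and close in order.
    for name, start, end in sorted(spans, key=lambda s: (s[1], -s[2])):
        tgt = [t for s in range(start, end) for t in alignment.get(s, [])]
        if not tgt:
            continue
        opens.setdefault(min(tgt), []).append(name)
        closes.setdefault(max(tgt), []).insert(0, name)
    out = []
    for i, tok in enumerate(target_tokens):
        out.extend(f"<{n}>" for n in opens.get(i, []))
        out.append(tok)
        out.extend(f"</{n}>" for n in closes.get(i, []))
    return out


source = "click the <b> red button </b> now".split()
tokens, spans = strip_tags(source)
# Stand-in for the decoder output and its word alignment (both made up for this example).
target = "cliquez sur le bouton rouge maintenant".split()
alignment = {0: [0, 1], 1: [2], 2: [4], 3: [3], 4: [5]}
print(" ".join(reinsert_tags(target, spans, alignment)))
# prints: cliquez sur le <b> bouton rouge </b> maintenant
```

Because the tag is placed around the smallest contiguous target range covering all aligned words, word reordering ("red button" to "bouton rouge") is handled, but the range may also swallow unrelated words when the aligned target words are not adjacent; this sketch makes no attempt to resolve that case.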

Type

Conference paper

Publication

Proceedings of the 2nd Workshop on Post-editing Technology and Practice

Last updated on Sep 1, 2013

MTSummit SMT
