Transferring markup tags in statistical machine translation: a two-stream approach
Sep 1, 2013
Eric Joanis
Darlene Stewart
Samuel Larkin
Roland Kuhn
Abstract
Translation agencies are introducing statistical machine translation (SMT) into the workflow of human translators. Typically, SMT produces a first-draft translation, which is then post-edited by a person. SMT has met much resistance from translators, partly because of professional conservatism, but partly because the SMT community has often neglected some practical aspects of translation. Our paper discusses one of these: transferring formatting tags such as bold or italic from the source to the target document with a low error rate, thus freeing the post-editor from having to reformat SMT-generated text. In our “two-stream” approach, tags are stripped from the input to the decoder, then reinserted into the resulting target-language text. Tag transfer has been tackled by other SMT teams, but only a few have published descriptions of their work. This paper contributes to understanding tag transfer by explaining our approach in detail.
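The strip-then-reinsert idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `strip_tags` and `reinsert_tags`, the simple `<b>`-style tag syntax, and the use of a word alignment to project tag spans are all assumptions made for illustration, and the decoder is replaced by a hand-written target sentence and alignment. The sketch assumes well-nested tags and does not handle spans that cross after projection.

```python
import re

TAG_RE = re.compile(r"<(/?)(\w+)>")

def strip_tags(text):
    """Split tagged text into plain tokens plus (tag, start, end) token spans.

    Hypothetical helper: assumes simple, well-nested tags such as <b>...</b>.
    The end index is exclusive.
    """
    tokens, spans, open_tags = [], [], []
    for piece in re.findall(r"</?\w+>|[^\s<]+", text):
        m = TAG_RE.fullmatch(piece)
        if m is None:
            tokens.append(piece)                      # ordinary word: goes to the decoder
        elif m.group(1) == "":
            open_tags.append((m.group(2), len(tokens)))  # opening tag: remember position
        else:
            name, start = open_tags.pop()             # closing tag: complete the span
            spans.append((name, start, len(tokens)))
    return tokens, spans

def reinsert_tags(target_tokens, spans, alignment):
    """Project source tag spans onto target tokens.

    `alignment` is a list of (source_index, target_index) pairs. Using a word
    alignment here is an assumption of this sketch.
    """
    opens, closes = {}, {}
    # Process wider spans first so that outer tags open first and close last.
    for name, start, end in sorted(spans, key=lambda s: s[1] - s[2]):
        tgt = [t for s, t in alignment if start <= s < end]
        if not tgt:
            continue  # no aligned target word: this sketch simply drops the tag
        opens.setdefault(min(tgt), []).append(name)
        closes.setdefault(max(tgt), []).insert(0, name)
    out = []
    for i, tok in enumerate(target_tokens):
        prefix = "".join(f"<{n}>" for n in opens.get(i, []))
        suffix = "".join(f"</{n}>" for n in closes.get(i, []))
        out.append(prefix + tok + suffix)
    return " ".join(out)

# Stream 1: plain text for the decoder. Stream 2: the tag spans.
src_tokens, src_spans = strip_tags("Press <b>the red button</b> now")

# Stand-in for the decoder output (hand-written, not produced by an SMT system).
tgt_tokens = ["Appuyez", "sur", "le", "bouton", "rouge", "maintenant"]
word_alignment = [(0, 0), (0, 1), (1, 2), (2, 4), (3, 3), (4, 5)]

print(reinsert_tags(tgt_tokens, src_spans, word_alignment))
```

In this toy case the bold span covers source tokens 1 to 3, which align to target tokens 2 to 4, so the tag pair is wrapped around the corresponding target words even though the adjective and noun change order.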
Publication
Proceedings of the 2nd Workshop on Post-editing Technology and Practice