Incorporating Knowledge of Source Language Text in a System for Dictation of Document Translations

Aug 1, 2009

Aarthi Reddy

Richard Rose

Hani Safadi

Samuel Larkin

Gilles Boulianne


Abstract

This paper describes methods for integrating source language and target language information for machine-aided human translation (MAHT) of text documents. These methods are applied to a language translation task in which a human translator dictates a first-draft translation of a source language document. A method is presented that integrates target language automatic speech recognition (ASR) models with source language statistical machine translation (SMT) and named entity recognition (NER) information at the phonetic level. Information extracted from a source language document, including translation model probabilities and translated named entities, is combined with acoustic-phonetic information obtained from phone lattices produced by the ASR system. Phone-level integration allows the combined MAHT system to correctly decode words that are either not in the ASR vocabulary or would have been incorrectly decoded by the ASR system. It is shown that the combined MAHT system reduces word error rate on the dictated translations by 32% relative to a stand-alone baseline ASR system.
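For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate and a relative reduction are conventionally computed. It is not the paper's evaluation code: the function names `word_error_rate` and `relative_reduction` are hypothetical, and the numbers in the usage note are made-up values chosen only to illustrate the arithmetic.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, combined_wer):
    """Fraction of the baseline error removed by the combined system."""
    return (baseline_wer - combined_wer) / baseline_wer
```

As an illustration with invented values, a baseline WER of 0.25 and a combined-system WER of 0.17 give `relative_reduction(0.25, 0.17)` of approximately 0.32, which is what a 32% relative decrease means. The absolute error rates behind the reported figure are not stated in the abstract.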

Type

Conference paper

Publication

Proceedings of Machine Translation Summit XII: Papers

Last updated on Aug 1, 2009

MTSummit ASR
