Incorporating Knowledge of Source Language Text in a System for Dictation of Document Translations

Aug 1, 2009 · Aarthi Reddy, Richard Rose, Hani Safadi, Samuel Larkin, Gilles Boulianne
Abstract
This paper describes methods for integrating source language and target language information for machine-aided human translation (MAHT) of text documents. These methods are applied to a language translation task in which a human translator dictates a first-draft translation of a source language document. A method is presented which integrates target language automatic speech recognition (ASR) models with source language statistical machine translation (SMT) and named entity recognition (NER) information at the phonetic level. Information extracted from a source language document, including translation model probabilities and translated named entities, is combined with acoustic-phonetic information obtained from phone lattices produced by the ASR system. Phone-level integration allows the combined MAHT system to correctly decode words that are either not in the ASR vocabulary or would have been incorrectly decoded by the ASR system. It is shown that the combined MAHT system reduces the word error rate on the dictated translations by 32% relative to a stand-alone baseline ASR system.
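The reported figure is a relative reduction in word error rate, not a difference in percentage points. The minimal sketch below shows how such a relative reduction is conventionally computed. The function name and the input values are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
def relative_wer_reduction(baseline_wer: float, combined_wer: float) -> float:
    """Return the fractional WER reduction of a combined system relative to a baseline.

    Both arguments are word error rates expressed as fractions (e.g. 0.25 for 25%).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - combined_wer) / baseline_wer


# Hypothetical WER values for illustration only; not taken from the paper.
reduction = relative_wer_reduction(baseline_wer=0.25, combined_wer=0.17)
print(f"Relative WER reduction: {reduction:.0%}")
```

Under this convention, a system can show a large relative reduction even when the absolute change in WER is only a few percentage points, so the baseline WER is needed to interpret the figure.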
Publication
Proceedings of Machine Translation Summit XII: Papers