Manageable Phrase-based Statistical Machine Translation Models with Pseudo-code and Proofs

Jan 1, 2007

Ghada Badr

Eric Joanis

Samuel Larkin

Abstract

Statistical Machine Translation (SMT) is an evolving field in which many techniques from Syntactic Pattern Recognition (SPR) are needed and applied. A typical phrase-based SMT system for translating from an S (source) language to a T (target) language contains one or more n-gram language models (LMs) and one or more phrase translation models (TMs). These LMs and TMs have a large memory footprint (up to several gigabytes). This paper describes novel techniques for filtering these models so that only the patterns in the LMs and TMs that are relevant to the text being translated are loaded during translation. In experiments on a large Chinese-English task, these techniques yielded significant reductions in the amount of information loaded during translation: up to 58% for LMs and up to 75% for TMs.
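The idea of loading only relevant TM entries can be illustrated with a minimal sketch. This is not the paper's algorithm, only a simple filter under stated assumptions: the phrase table is available as (source phrase, target phrase, score) tuples, text is whitespace-tokenized, and an entry counts as relevant when its source phrase occurs as an n-gram in the text to be translated. The function names `source_ngrams` and `filter_phrase_table` and the parameter `max_len` are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

Entry = Tuple[str, str, float]  # (source phrase, target phrase, score); assumed layout


def source_ngrams(sentences: Iterable[str], max_len: int) -> Set[Tuple[str, ...]]:
    """Collect every n-gram of length 1..max_len from the text to translate."""
    grams: Set[Tuple[str, ...]] = set()
    for sentence in sentences:
        tokens = sentence.split()  # assumption: whitespace tokenization
        for start in range(len(tokens)):
            stop = min(start + max_len, len(tokens))
            for end in range(start + 1, stop + 1):
                grams.add(tuple(tokens[start:end]))
    return grams


def filter_phrase_table(entries: Iterable[Entry],
                        needed: Set[Tuple[str, ...]]) -> Iterator[Entry]:
    """Yield only the entries whose source phrase occurs in the input text."""
    for src, tgt, score in entries:
        if tuple(src.split()) in needed:
            yield (src, tgt, score)


if __name__ == "__main__":
    needed = source_ngrams(["a b c"], max_len=2)
    table = [("a b", "x y", 0.5), ("c d", "z", 0.1), ("c", "w", 0.3)]
    for entry in filter_phrase_table(table, needed):
        print(entry)
```

Because the filter streams over the table and keeps only matching entries, memory use in this sketch scales with the input text rather than with the full model. An analogous filter for LMs would need a different relevance criterion (for example, restricting n-grams to target words reachable through the retained phrase pairs), which this sketch does not attempt.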

Type

Manuscript

Last updated on Jan 1, 2007