Syntax-based word reordering in phrase-based statistical machine translation

why does it work?

Simon Zwarts, Mark Dras

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

Abstract

Most natural language applications involve some degree of data preprocessing: tokenisation, stemming and so on. In the domain of Statistical Machine Translation (SMT) it has been shown that word reordering as a preprocessing step can help the translation process, but it is unclear why. We propose two possible reasons for the observed improvement: (1) that the reordering explicitly matches the syntax of the source language more closely to that of the target language; or (2) that it fits the data better to the mechanisms of phrasal SMT (PSMT). In previous work from German to English, for example, hand-written language-specific reordering rules both match the German more closely to English syntax and compress heads and dependants into the PSMT phrasal window. Whether the source of the improvement is (1) or (2) has not been determined, although most other work assumes the former. To identify the effects of each possible cause, we carry out two sets of experiments. For (1) we reverse the language-dependent syntactic reordering such that heads and dependants are moved apart. For (2) we propose a generic approach to minimising dependency distances in reordering that does not explicitly match target language word order and that does not require language-specific rules; its aim is not to beat state-of-the-art systems but to investigate the source of the improvement. The results show that (1) and (2) individually do still lead to improvements in translation quality, but each is weaker than the original, suggesting that both features are necessary for a strong improvement. A consequence of this is that it is possible to gain half the improvement of language-specific rules through one generic rule.
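As an illustration of what "dependency distance" means in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a sentence is given as a list of head indices (with None for the root), measures distance as the sum of absolute position differences between each word and its head, and uses a hand-made toy analysis of a German clause. The function names, the metric and the example parse are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence

Heads = List[Optional[int]]


def total_dependency_distance(heads: Sequence[Optional[int]]) -> int:
    """Sum of |word position - head position| over all non-root words."""
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)


def reorder(heads: Sequence[Optional[int]], order: Sequence[int]) -> Heads:
    """Apply a word permutation and re-index the heads.

    order[k] is the original index of the word placed at new position k.
    """
    new_pos = {old: new for new, old in enumerate(order)}
    return [None if heads[old] is None else new_pos[heads[old]] for old in order]


# Toy analysis (an assumption, not taken from the paper):
# 0 Ich  1 habe  2 das  3 Buch  4 gelesen
# "habe" is the root; "gelesen" depends on "habe"; "Buch" on "gelesen".
heads = [1, None, 3, 4, 1]

# Move the participle next to the auxiliary: Ich habe gelesen das Buch
order = [0, 1, 4, 2, 3]

before = total_dependency_distance(heads)                  # 6
after = total_dependency_distance(reorder(heads, order))   # 5
print(before, after)
```

In this toy case the reordering shortens the link between the auxiliary and the participle, so the total distance drops from 6 to 5. A reordering that lowers this total tends to bring heads and dependants closer together, which is the property the abstract associates with fitting them into the phrasal window.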
Original language: English
Title of host publication: MT Summit XI proceedings
Editors: Bente Maegaard
Place of Publication: Allschwil, Switzerland
Publisher: European Association for Machine Translation
Pages: 559-566
Number of pages: 8
ISBN (Print): 9788790708160
Publication status: Published - 2007
Event: Machine Translation Summit (11th : 2007) - Copenhagen, Denmark
Duration: 10 Sep 2007 to 14 Sep 2007

Conference

Conference: Machine Translation Summit (11th : 2007)
City: Copenhagen, Denmark
Period: 10/09/07 to 14/09/07


  • Cite this

    Zwarts, S., & Dras, M. (2007). Syntax-based word reordering in phrase-based statistical machine translation: why does it work? In B. Maegaard (Ed.), MT Summit XI proceedings (pp. 559-566). Allschwil, Switzerland: European Association for Machine Translation.