This phrase-based SMT system is out of order: generalised word reordering in machine translation

Simon Zwarts, Mark Dras

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Many natural language processes have some degree of preprocessing of data: tokenisation, stemming and so on. In the domain of Statistical Machine Translation it has been shown that word reordering as a preprocessing step can help the translation process. Recently, hand-written rules for reordering in German–English translation have shown good results, but this is clearly a labour-intensive and language pair-specific approach. Two possible sources of the observed improvement are that (1) the reordering explicitly matches the syntax of the source language more closely to that of the target language, or that (2) it fits the data better to the mechanisms of phrasal SMT; but it is not clear which. In this paper, we apply a general principle based on dependency distance minimisation to produce reorderings. Our language-independent approach achieves half of the improvement of a reimplementation of the handcrafted approach, and suggests that reason (2) is a possible explanation for why that reordering approach works.
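
As a rough illustration of the dependency distance idea named in the abstract, the sketch below sums the linear distance between each word and its head for two orderings of the same sentence. It is not the authors' implementation: the function name `total_dependency_distance`, the toy German clause, and its dependency analysis are all assumptions made for this example.

```python
def total_dependency_distance(order, heads):
    """Sum |position(word) - position(head)| over all words that have a head.

    order: list of words in surface order (assumed unique in this toy example)
    heads: dict mapping each word to its head word, or None for the root
    """
    position = {word: index for index, word in enumerate(order)}
    return sum(
        abs(position[word] - position[head])
        for word, head in heads.items()
        if head is not None
    )


# Hypothetical dependency analysis of a German verb-final clause.
heads = {
    "habe": None,        # root (finite auxiliary)
    "ich": "habe",       # subject attaches to the auxiliary
    "gesehen": "habe",   # participle attaches to the auxiliary
    "Mann": "gesehen",   # object attaches to the participle
    "den": "Mann",       # determiner attaches to the noun
}

original = ["ich", "habe", "den", "Mann", "gesehen"]
reordered = ["ich", "habe", "gesehen", "den", "Mann"]

original_distance = total_dependency_distance(original, heads)
reordered_distance = total_dependency_distance(reordered, heads)
print(original_distance, reordered_distance)
```

Under these assumptions, moving the participle next to its auxiliary gives the lower total, which is the kind of ordering a minimisation principle would prefer. How the paper selects among candidate orders is not stated in the abstract.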
Original language: English
Title of host publication: Proceedings of the 2006 Australasian language technology workshop 2006, November 30-December 1, 2006, Sancta Sophia College, Sydney
Editors: Lawrence Cavedon, Ingrid Zukerman
Place of Publication: Carlton, Vic
Publisher: Australasian Language Technology Association
Pages: 149-156
Number of pages: 8
ISBN (Print): 9781741081466
Publication status: Published - 2006
Event: Australasian Language Technology Association Workshop - Sydney
Duration: 30 Nov 2006 to 1 Dec 2006

Workshop

Workshop: Australasian Language Technology Association Workshop
City: Sydney
Period: 30/11/06 to 1/12/06
