Abstract
Many natural language processes have some degree of preprocessing of data: tokenisation, stemming and so on. In the domain of Statistical Machine Translation it has been shown that word reordering as a preprocessing step can help the translation process. Recently, hand-written rules for reordering in German–English translation have shown good results, but this is clearly a labour-intensive and language pair-specific approach. Two possible sources of the observed improvement are that (1) the reordering explicitly matches the syntax of the source language more closely to that of the target language, or that (2) it fits the data better to the mechanisms of phrasal SMT; but it is not clear which. In this paper, we apply a general principle based on dependency distance minimisation to produce reorderings. Our language-independent approach achieves half of the improvement of a reimplementation of the handcrafted approach, and suggests that reason (2) is a possible explanation for why that reordering approach works.
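To make the notion of dependency distance concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that dependency distance is measured as the sum of absolute position differences between each word and its head. The function names, the example sentence, and its dependency analysis are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def total_dependency_distance(heads: Sequence[int]) -> int:
    """Sum |position(word) - position(head)| over all non-root words.

    heads[i] is the position of the head of word i, or -1 for the root.
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h >= 0)


def reorder_heads(heads: Sequence[int], order: Sequence[int]) -> List[int]:
    """Re-express head positions after permuting the words.

    order[k] is the original position of the word placed at new position k.
    """
    new_pos = {old: new for new, old in enumerate(order)}
    return [new_pos[heads[old]] if heads[old] >= 0 else -1 for old in order]


# Illustrative (assumed) analysis of "ich habe das Buch gelesen":
# ich -> habe, habe = root, das -> Buch, Buch -> gelesen, gelesen -> habe
heads = [1, -1, 3, 4, 1]
before = total_dependency_distance(heads)

# Move the participle next to the auxiliary: "ich habe gelesen das Buch"
order = [0, 1, 4, 2, 3]
after = total_dependency_distance(reorder_heads(heads, order))

print(before, after)
```

Under this assumed analysis the reordered sentence has a smaller total distance than the original. A reordering procedure built on this principle would search for permutations that reduce the total; how the paper constrains that search is not stated in the abstract.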
Original language | English |
---|---|
Title of host publication | Proceedings of the 2006 Australasian language technology workshop 2006, November 30-December 1, 2006, Sancta Sophia College, Sydney |
Editors | Lawrence Cavedon, Ingrid Zukerman |
Place of Publication | Carlton, Vic |
Publisher | Australasian Language Technology Association |
Pages | 149-156 |
Number of pages | 8 |
ISBN (Print) | 9781741081466 |
Publication status | Published - 2006 |
Event | Australasian Language Technology Association Workshop - Sydney; Duration: 30 Nov 2006 → 1 Dec 2006 |
Workshop
Workshop | Australasian Language Technology Association Workshop |
---|---|
City | Sydney |
Period | 30/11/06 → 1/12/06 |