Abstract
In this paper we describe a six-way parallel public-domain corpus consisting of 2,100 United Nations General Assembly Resolutions with translations in the six official languages of the United Nations, averaging around 3 million tokens per language. The corpus is available in a preprocessed, formatting-normalized TMX format with paragraphs aligned across multiple languages. We describe the background to the corpus and its content, the process of its construction, and some of its interesting properties.
| Original language | English |
|---|---|
| Title of host publication | MT Summit XII proceedings |
| Publisher | International Association for Machine Translation |
| Number of pages | 8 |
| Publication status | Published - 2009 |
| Event | Machine Translation Summit (12th : 2009) - Ottawa, Canada. Duration: 26 Aug 2009 → 30 Aug 2009 |
Conference
| Conference | Machine Translation Summit (12th : 2009) |
|---|---|
| City | Ottawa, Canada |
| Period | 26/08/09 → 30/08/09 |