Abstract
In this paper we describe a six-way parallel public-domain corpus consisting of 2100 United Nations General Assembly Resolutions with translations in the six official languages of the United Nations, averaging around 3 million tokens per language. The corpus is available in a preprocessed, formatting-normalized TMX format with paragraphs aligned across multiple languages. We describe the background to the corpus and its content, the process of its construction, and some of its interesting properties.
Original language | English |
---|---|
Title of host publication | MT Summit XII proceedings |
Publisher | International Association of Machine Translation |
Number of pages | 8 |
Publication status | Published - 2009 |
Event | Machine Translation Summit (12th : 2009), Ottawa, Canada, 26 Aug 2009 → 30 Aug 2009 |
Conference
Conference | Machine Translation Summit (12th : 2009) |
---|---|
City | Ottawa, Canada |
Period | 26 Aug 2009 → 30 Aug 2009 |