United Nations General Assembly Resolutions

a six-language parallel corpus

Alexandre Rafalovitch, Robert Dale

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution

Abstract

In this paper we describe a six-way parallel public-domain corpus consisting of 2,100 United Nations General Assembly Resolutions with translations in the six official languages of the United Nations, with an average of around 3 million tokens per language. The corpus is available in a preprocessed, formatting-normalized TMX format with paragraphs aligned across multiple languages. We describe the background to the corpus and its content, the process of its construction, and some of its interesting properties.
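As a rough illustration of what "paragraphs aligned across multiple languages" in TMX can look like to a consumer, the following minimal sketch reads translation units with Python's standard library. The sample document, its header values, and the function name `read_aligned_paragraphs` are assumptions made for illustration only; they are not taken from the corpus, whose actual file layout may differ.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample, NOT an excerpt from the corpus: one translation unit
# (<tu>) holding the same paragraph in three languages (<tuv> variants).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="paragraph"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The General Assembly,</seg></tuv>
      <tuv xml:lang="fr"><seg>L'Assemblée générale,</seg></tuv>
      <tuv xml:lang="es"><seg>La Asamblea General,</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_aligned_paragraphs(tmx_text):
    """Return one dict per translation unit, mapping language code to text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang is not None and seg is not None:
                # itertext() also collects text nested inside inline markup.
                unit[lang.lower()] = "".join(seg.itertext()).strip()
        units.append(unit)
    return units


units = read_aligned_paragraphs(SAMPLE_TMX)
for unit in units:
    for lang in sorted(unit):
        print(lang, unit[lang])
```

Keeping each translation unit as a dictionary keyed by language code makes it straightforward to pull out any language pair from a six-way alignment, and units that lack a given language simply omit that key.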
Original language: English
Title of host publication: MT Summit XII proceedings
Publisher: International Association of Machine Translation
Number of pages: 8
Publication status: Published - 2009
Event: Machine Translation Summit (12th : 2009) - Ottawa, Canada
Duration: 26 Aug 2009 to 30 Aug 2009

Conference

Conference: Machine Translation Summit (12th : 2009)
City: Ottawa, Canada
Period: 26/08/09 to 30/08/09

Cite this

Rafalovitch, A., & Dale, R. (2009). United Nations General Assembly Resolutions: a six-language parallel corpus. In MT Summit XII proceedings. International Association of Machine Translation.