Abstract
As research in text-to-text paraphrase generation progresses, it has the potential to improve the quality of generated text. However, the use of paraphrase generation methods creates a secondary problem: we must ensure that generated novel sentences are not inconsistent with the text from which they were generated. We propose that a machine learning approach be used to filter out inconsistent novel sentences, or False Paraphrases. To train such a filter, we use the Microsoft Research Paraphrase Corpus and investigate whether features based on syntactic dependencies can aid us in this task. Like Finch et al. (2005), we obtain a classification accuracy of 75.6%, the best known performance for this corpus. We also examine the strengths and weaknesses of dependency-based features and conclude that they may be useful in more accurately classifying cases of False Paraphrase.
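
As a rough illustration of what a dependency-based feature might look like, the sketch below scores the overlap between the dependency relations of a source sentence and those of a candidate paraphrase. It is only a minimal example under stated assumptions: the `(head, dependent)` pair representation, the function names, the hand-written relations, and the 0.6 threshold are all hypothetical and are not taken from the paper, which trains a classifier over its features rather than applying a fixed cut-off.

```python
from typing import Set, Tuple

# Assumption: a dependency relation is reduced to a (head, dependent) word pair.
Dep = Tuple[str, str]


def dependency_overlap(source: Set[Dep], candidate: Set[Dep]) -> Tuple[float, float]:
    """Return (precision, recall) of the candidate's relations against the source."""
    shared = source & candidate
    precision = len(shared) / len(candidate) if candidate else 0.0
    recall = len(shared) / len(source) if source else 0.0
    return precision, recall


def looks_like_paraphrase(source: Set[Dep], candidate: Set[Dep],
                          threshold: float = 0.6) -> bool:
    """Toy decision rule: F1 of dependency overlap against a hypothetical threshold."""
    precision, recall = dependency_overlap(source, candidate)
    if precision + recall == 0.0:
        return False
    f1 = 2 * precision * recall / (precision + recall)
    return f1 >= threshold


# Hand-written relations standing in for parser output (illustrative only).
source_deps = {("sat", "cat"), ("cat", "the"), ("sat", "on"), ("on", "mat")}
close_deps = {("sat", "cat"), ("cat", "the"), ("sat", "on"), ("on", "rug")}
distant_deps = {("sat", "dog"), ("dog", "the"), ("sat", "on"), ("on", "mat")}

print(looks_like_paraphrase(source_deps, close_deps))
print(looks_like_paraphrase(source_deps, distant_deps))
```

In a setting like the one the abstract describes, scores such as the precision and recall above would more plausibly be supplied as input features to a trained classifier, alongside other features, than compared against a single threshold.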
Original language | English |
---|---|
Title of host publication | Proceedings of the 2006 Australasian language technology workshop 2006, November 30-December 1, 2006, Sancta Sophia College, Sydney |
Editors | Lawrence Cavedon, Ingrid Zukerman |
Place of Publication | Carlton, Vic |
Publisher | Australian Language Technology Association |
Pages | 131-138 |
Number of pages | 8 |
ISBN (Print) | 1741081467 |
Publication status | Published - 2006 |
Event | Australasian Language Technology Association Workshop, Sydney. Duration: 30 Nov 2006 → 1 Dec 2006 |
Workshop
Workshop | Australasian Language Technology Association Workshop |
---|---|
City | Sydney |
Period | 30/11/06 → 1/12/06 |