Using dependency-based features to take the "para-farce" out of paraphrase

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review


As research in text-to-text paraphrase generation progresses, it has the potential to improve the quality of generated text. However, the use of paraphrase generation methods creates a secondary problem: we must ensure that generated novel sentences are not inconsistent with the text from which they were generated. We propose that a machine learning approach be used to filter out inconsistent novel sentences, or False Paraphrases. To train such a filter, we use the Microsoft Research Paraphrase corpus and investigate whether features based on syntactic dependencies can aid us in this task. Like Finch et al. (2005), we obtain a classification accuracy of 75.6%, the best known performance for this corpus. We also examine the strengths and weaknesses of dependency-based features and conclude that they may be useful in more accurately classifying cases of False Paraphrase.
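To make the idea of a dependency-based feature concrete, here is a minimal sketch in Python. It is not the authors' feature set: the `Dep` triple type, the `dependency_overlap` function, the feature names, and the hand-written triples are all assumptions made for illustration, and a real system would obtain the triples from a dependency parser. It scores how many (head, relation, dependent) triples a candidate sentence shares with its source.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One dependency relation: head word, relation label, dependent word."""
    head: str
    relation: str
    dependent: str


def dependency_overlap(source: set[Dep], candidate: set[Dep]) -> dict[str, float]:
    """Overlap of dependency triples between a source and a candidate sentence.

    Illustrative features only (hypothetical names, not taken from the paper).
    """
    shared = source & candidate
    # Precision: share of the candidate's relations that also occur in the source.
    precision = len(shared) / len(candidate) if candidate else 0.0
    # Recall: share of the source's relations preserved in the candidate.
    recall = len(shared) / len(source) if source else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"dep_precision": precision, "dep_recall": recall, "dep_f1": f1}


# Hand-written triples (assumed, not parser output). The candidate reuses
# the same words but swaps who bought whom.
source = {
    Dep("bought", "subj", "company"),
    Dep("bought", "obj", "rival"),
    Dep("rival", "mod", "smaller"),
}
candidate = {
    Dep("bought", "subj", "rival"),
    Dep("bought", "obj", "company"),
}

features = dependency_overlap(source, candidate)
```

In this toy pair the two sentences share most of their words but none of their dependency triples, so all three scores are zero. Values like these could be passed to a classifier alongside other features; which features and which learner to use is left open here.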
Original language: English
Title of host publication: Proceedings of the 2006 Australasian language technology workshop 2006, November 30-December 1, 2006, Sancta Sophia College, Sydney
Editors: Lawrence Cavedon, Ingrid Zukerman
Place of Publication: Carlton, Vic
Publisher: Australian Language Technology Association
Number of pages: 8
ISBN (Print): 1741081467
Publication status: Published - 2006
Event: Australasian Language Technology Association Workshop - Sydney
Duration: 30 Nov 2006 to 1 Dec 2006


Workshop: Australasian Language Technology Association Workshop

