Corpus statistics approaches to discriminating among near-synonyms

Mary Gardiner, Mark Dras

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

Abstract

Near-synonyms are words that mean approximately the same thing, and which tend to be assigned to the same leaf in ontologies such as WordNet. However, they can differ from each other subtly in both meaning and usage (consider the pair of near-synonyms frugal and stingy), and therefore choosing the appropriate near-synonym for a given context is not a trivial problem. Early work on near-synonyms was that of Edmonds (1997), who reported an experiment that used lexical co-occurrence networks to predict which of a set of near-synonyms would be used in a given context. The conclusion of this work, that corpus statistics approaches did not appear to work well for this type of problem, led instead to the development of machine learning approaches over lexical resources such as Choose the Right Word (Hayakawa, 1994). Our hypothesis is that some kind of corpus statistics approach may still be effective in some situations, particularly if the near-synonyms differ from each other in sentiment. Intuition based on work in sentiment analysis suggests that if the distribution of words embodying some characteristic of sentiment can predict the overall sentiment or attitude of a document, perhaps these same words can also predict the choice of an individual ‘attitudinal’ near-synonym given its context, while this is not necessarily true for other types of near-synonym. This would again open up problems involving this type of near-synonym to corpus statistics methods. As a first step, then, we investigate whether attitudinal near-synonyms are more likely than other types to be successfully predicted by a corpus statistics method. In this paper we present a larger-scale experiment based on Edmonds (1997), and show that attitudinal near-synonyms can in fact be predicted more accurately than other types using corpus statistics methods.
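To make the task concrete, here is a minimal sketch of a corpus statistics predictor for near-synonym choice. It is an illustrative swap-in, not the lexical co-occurrence network method of Edmonds (1997) and not the method of this paper: it counts which words occur within a small window of each candidate near-synonym in a training corpus, then fills a gap with the candidate that maximises a Laplace-smoothed naive Bayes score over the surrounding words. The function names `train` and `predict`, the window size, and the four-sentence toy corpus are all hypothetical, invented for illustration.

```python
from collections import Counter, defaultdict
from math import log


def train(sentences, candidates, window=2):
    """Count, for each candidate near-synonym, the words seen within `window` tokens of it."""
    cand_freq = Counter()                # how often each candidate occurs
    context_freq = defaultdict(Counter)  # candidate -> counts of nearby words
    vocab = set()
    for tokens in sentences:
        vocab.update(tokens)
        for i, w in enumerate(tokens):
            if w not in candidates:
                continue
            cand_freq[w] += 1
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context_freq[w].update(tokens[j] for j in range(lo, hi) if j != i)
    return cand_freq, context_freq, vocab


def predict(tokens, gap_index, candidates, model, window=2):
    """Pick the candidate maximising log P(cand) + sum of log P(word | cand) over the gap's context."""
    cand_freq, context_freq, vocab = model
    lo, hi = max(0, gap_index - window), min(len(tokens), gap_index + window + 1)
    context = [tokens[j] for j in range(lo, hi) if j != gap_index]
    total = sum(cand_freq[c] for c in candidates)
    v = len(vocab) + 1  # +1 reserves smoothed probability mass for unseen words

    def score(c):
        n_c = sum(context_freq[c].values())
        s = log((cand_freq[c] + 1) / (total + len(candidates)))  # smoothed prior
        for w in context:
            s += log((context_freq[c][w] + 1) / (n_c + v))  # smoothed likelihood
        return s

    return max(candidates, key=score)


# Hypothetical toy corpus, for illustration only.
corpus = [
    "he was too stingy to tip the waiter".split(),
    "a stingy landlord refused to fix the heating".split(),
    "she was frugal and saved money every month".split(),
    "a frugal student saved money on groceries".split(),
]
cands = ("frugal", "stingy")
model = train(corpus, cands)
print(predict("my aunt is ___ and saved money".split(), 3, cands, model))
```

On this toy data the context words "and" and "saved" have only been seen near "frugal", so the sketch prints `frugal`. A real experiment would need a large corpus and held-out gap-filling test sentences; the sentiment-bearing context words are where the paper's hypothesis suggests such a method has its best chance with attitudinal near-synonyms.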
Original language: English
Title of host publication: PACLING '07
Subtitle of host publication: proceedings of the conference Pacific Association for Computational Linguistics ; 19-21 September 2007 University of Melbourne, Melbourne, Australia
Place of Publication: Melbourne
Publisher: Pacific Association for Computational Linguistics
Pages: 31-39
Number of pages: 9
Publication status: Published - 2007
Event: Conference of the Pacific Association for Computational Linguistics (10th : 2007) - Melbourne
Duration: 19 Sep 2007 → 21 Sep 2007

Conference

Conference: Conference of the Pacific Association for Computational Linguistics (10th : 2007)
City: Melbourne
Period: 19/09/07 → 21/09/07

Cite this

Gardiner, M., & Dras, M. (2007). Corpus statistics approaches to discriminating among near-synonyms. In PACLING '07: proceedings of the conference Pacific Association for Computational Linguistics ; 19-21 September 2007 University of Melbourne, Melbourne, Australia (pp. 31-39). Melbourne: Pacific Association for Computational Linguistics.