Near-synonyms are words that mean approximately the same thing and that tend to be assigned to the same leaf in ontologies such as WordNet. However, they can differ from each other subtly in both meaning and usage (consider the pair of near-synonyms frugal and stingy), so choosing the appropriate near-synonym for a given context is not a trivial problem. Early work on this problem was that of Edmonds (1997), who reported an experiment that used lexical co-occurrence networks to predict which of a set of near-synonyms would be used in a given context. That work concluded that corpus statistics approaches did not appear to work well for this type of problem, which led instead to the development of machine learning approaches over lexical resources such as Choose the Right Word (Hayakawa, 1994). Our hypothesis is that some kind of corpus statistics approach may still be effective in some situations, particularly when the near-synonyms differ from each other in sentiment. Intuition based on work in sentiment analysis suggests that if the distribution of words embodying some characteristic of sentiment can predict the overall sentiment or attitude of a document, then perhaps these same words can predict the choice of an individual 'attitudinal' near-synonym given its context, while this is not necessarily true for other types of near-synonym. If so, this would reopen problems involving this type of near-synonym to corpus statistics methods. As a first step, then, we investigate whether attitudinal near-synonyms are more likely to be successfully predicted by a corpus statistics method than other types. In this paper we present a larger-scale experiment based on Edmonds (1997), and show that attitudinal near-synonyms can in fact be predicted more accurately using corpus statistics methods.
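To make the general idea of predicting a near-synonym from corpus statistics concrete, the following is a minimal sketch. It is not the method of Edmonds (1997) or of this paper: it assumes a much simpler first-order scoring, in which each candidate is ranked by its summed pointwise mutual information (PMI) with the context words, using sentence-level co-occurrence counts. The toy corpus and the names `build_counts`, `pmi`, and `choose_near_synonym` are invented for illustration only.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus, invented purely for illustration (not data from the paper).
TOY_CORPUS = [
    "my frugal aunt saved wisely",
    "a frugal family saved money wisely",
    "the stingy boss refused a raise",
    "a stingy miser refused to share",
]


def build_counts(sentences):
    """Count, per sentence, which words occur and which pairs co-occur."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())  # presence per sentence, not frequency
        word_counts.update(tokens)
        for a, b in combinations(sorted(tokens), 2):
            pair_counts[(a, b)] += 1
    return word_counts, pair_counts, len(sentences)


def pmi(a, b, word_counts, pair_counts, n):
    """PMI from sentence-level counts; 0.0 when the pair never co-occurs."""
    joint = pair_counts[tuple(sorted((a, b)))]
    if joint == 0:
        return 0.0
    return math.log2(joint * n / (word_counts[a] * word_counts[b]))


def choose_near_synonym(context, candidates, word_counts, pair_counts, n):
    """Return the candidate with the highest summed PMI over the context words."""
    def score(candidate):
        return sum(pmi(candidate, w, word_counts, pair_counts, n) for w in context)
    return max(candidates, key=score)


word_counts, pair_counts, n = build_counts(TOY_CORPUS)
print(choose_near_synonym(["saved", "wisely"], ["frugal", "stingy"],
                          word_counts, pair_counts, n))
```

On this toy corpus, the context words "saved" and "wisely" co-occur only with "frugal", so that candidate is selected. A realistic experiment would need a large corpus, a principled treatment of unseen pairs, and an evaluation against the word actually used in held-out text.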
|Title of host publication||PACLING '07|
|Subtitle of host publication||proceedings of the conference Pacific Association for Computational Linguistics ; 19-21 September 2007 University of Melbourne, Melbourne, Australia|
|Place of Publication||Melbourne|
|Publisher||Pacific Association for Computational Linguistics|
|Number of pages||9|
|Publication status||Published - 2007|
|Event||Conference of the Pacific Association for Computational Linguistics (10th : 2007) - Melbourne (19 Sep 2007 → 21 Sep 2007)|
|Conference||Conference of the Pacific Association for Computational Linguistics (10th : 2007)|
|Period||19/09/07 → 21/09/07|
Gardiner, M., & Dras, M. (2007). Corpus statistics approaches to discriminating among near-synonyms. In PACLING '07: proceedings of the conference Pacific Association for Computational Linguistics ; 19-21 September 2007 University of Melbourne, Melbourne, Australia (pp. 31-39). Melbourne: Pacific Association for Computational Linguistics.