Predicting word choice in affective text

M. Gardiner, M. Dras

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Choosing the best word or phrase for a given context from among the candidate near-synonyms, such as slim and skinny, is a difficult language generation problem. In this paper, we describe approaches to solving an instance of this problem, the lexical gap problem, with a particular focus on affect and subjectivity; to do this we draw upon techniques from the sentiment and subjectivity analysis fields. We present a supervised approach to this problem, initially with a unigram model that solidly outperforms the baseline, with a 6.8% increase in accuracy. The results to some extent confirm those from related problems, where feature presence outperforms feature frequency, and immediate context features generally outperform wider context features. However, this latter is somewhat surprisingly not always the case, and not necessarily where intuition might first suggest; and an analysis of where document-level models are in some cases better suggested that, in our corpus, broader features related to the 'tone' of the document could be useful, including document sentiment, document author, and a distance metric for weighting the wider lexical context of the gap itself. From these, our best model has a 10.1% increase in accuracy, corresponding to a 38% reduction in errors. Moreover, our models do not just improve accuracy on affective word choice, but on non-affective word choice also.
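The approach the abstract outlines — predicting which near-synonym fills a lexical gap from binary (presence, not frequency) unigram features of the surrounding context — can be illustrated with a toy sketch. This is not the authors' implementation: the training sentences, the Naive Bayes scorer, and the add-one smoothing are all invented here purely to make the task concrete.

```python
from collections import defaultdict
import math

# Toy training data for one near-synonym pair from the abstract (slim vs.
# skinny); "___" marks the lexical gap. Sentences are invented examples.
TRAIN = [
    ("she looked elegant and ___ in the gown", "slim"),
    ("the model was ___ and graceful", "slim"),
    ("he was ___ and underfed", "skinny"),
    ("a ___ stray cat, all ribs", "skinny"),
]

def features(sentence):
    # Feature *presence*, not frequency: the set of context unigrams.
    return set(w for w in sentence.split() if w != "___")

def train(data):
    prior = defaultdict(int)                       # class counts
    counts = defaultdict(lambda: defaultdict(int))  # feature counts per class
    vocab = set()
    for sent, label in data:
        prior[label] += 1
        for f in features(sent):
            counts[label][f] += 1
            vocab.add(f)
    return prior, counts, vocab

def predict(sentence, prior, counts, vocab):
    total = sum(prior.values())
    best, best_score = None, float("-inf")
    for label in prior:
        # log P(label) + sum of log P(feature | label), add-one smoothed
        score = math.log(prior[label] / total)
        denom = sum(counts[label].values()) + len(vocab)
        for f in features(sentence):
            score += math.log((counts[label].get(f, 0) + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

prior, counts, vocab = train(TRAIN)
print(predict("the ___ and graceful dancer", prior, counts, vocab))  # slim
```

The paper's actual models go well beyond this sketch — document-level 'tone' features such as document sentiment and author, and distance-weighted wider context — but the gap-filling setup and the presence-based unigram features are the same in spirit.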

Language: English
Pages: 97-134
Number of pages: 38
Journal: Natural Language Engineering
Volume: 22
Issue number: 1
DOI: 10.1017/S1351324915000157
Publication status: Published - 1 Jan 2016


Cite this

Gardiner, M.; Dras, M. Predicting word choice in affective text. In: Natural Language Engineering. 2016; Vol. 22, No. 1, pp. 97-134.
@article{fe4f3a6416de485d99be583fb0fffd03,
  title     = "Predicting word choice in affective text",
  author    = "M. Gardiner and M. Dras",
  year      = "2016",
  month     = "1",
  day       = "1",
  doi       = "10.1017/S1351324915000157",
  language  = "English",
  volume    = "22",
  number    = "1",
  pages     = "97--134",
  journal   = "Natural Language Engineering",
  issn      = "1351-3249",
  publisher = "Cambridge University Press",
}


Scopus record: http://www.scopus.com/inward/record.url?scp=84949625537&partnerID=8YFLogxK

Grant: http://purl.org/au-research/grants/arc/DP0558852
