TY - JOUR
T1 - Evaluating human pairwise preference judgments
AU - Dras, Mark
N1 - Copyright the Publisher 2015. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY - 2015/6/19
Y1 - 2015/6/19
AB - Human evaluation plays an important role in NLP, often in the form of preference judgments. Although there has been some use of classical non-parametric and bespoke approaches to evaluating these sorts of judgments, there is an entire body of work on this in the context of sensory discrimination testing and the human judgments that are central to it, backed by rigorous statistical theory and freely available software, that NLP can draw on. We investigate one approach, Log-Linear Bradley-Terry models, and apply it to sample NLP data.
UR - http://www.scopus.com/inward/record.url?scp=84931046473&partnerID=8YFLogxK
U2 - 10.1162/COLI_a_00222
DO - 10.1162/COLI_a_00222
M3 - Article
AN - SCOPUS:84931046473
SN - 0891-2017
VL - 41
SP - 337
EP - 345
JO - Computational Linguistics
JF - Computational Linguistics
IS - 2
ER -