Abstract
In evaluating the output of language technology applications (MT, natural language generation, summarisation), automatic evaluation techniques generally conflate measurement of faithfulness to source content with fluency of the resulting text. In this paper we develop an automatic evaluation metric that estimates fluency alone: we examine the use of parser outputs as metrics and show that they correlate with human judgements of the fluency of generated text. We then train a machine learner over these parser metrics and show that it performs better than the individual metrics, approaching a lower bound on human performance. Finally, we examine different language models for generating sentences and show that, while individual parser metrics can be 'fooled' depending on the generation method, the machine learner provides a consistent estimator of fluency.
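To make the approach concrete, the following is a minimal sketch of the pipeline the abstract describes: per-sentence metrics (standing in here for real parser outputs) are combined by a machine learner trained on human fluency judgements. The feature functions, toy data, bigram list, and choice of an SVM classifier are all illustrative assumptions, not the paper's actual parsers, feature set, or learner.

```python
# A minimal, hypothetical sketch of the approach described above: each
# sentence is scored by several metrics standing in for parser outputs,
# and a machine learner combines them into a single fluency estimate.
# Features, data, and learner below are placeholders, not the paper's.
from sklearn.svm import SVC

# Toy stand-in for parser knowledge: a tiny set of "plausible" bigrams.
# A real system would instead query one or more parsers per sentence.
PLAUSIBLE_BIGRAMS = {
    ("the", "cat"), ("cat", "sat"), ("sat", "on"), ("on", "the"),
    ("the", "mat"), ("the", "dog"), ("dog", "chased"),
    ("chased", "the"), ("the", "ball"),
}

def fluency_features(sentence: str) -> list[float]:
    """Per-sentence features standing in for parser-derived metrics,
    e.g. a normalised parse probability or a parse-fragment count."""
    tokens = sentence.split()
    bigrams = list(zip(tokens, tokens[1:]))
    plausible = sum(b in PLAUSIBLE_BIGRAMS for b in bigrams)
    return [
        float(len(tokens)),               # sentence length
        plausible / max(len(bigrams), 1), # toy structural well-formedness score
    ]

# Toy supervision: sentences paired with binary human fluency judgements.
sentences = ["the cat sat on the mat", "mat the on sat cat the"]
judgements = [1, 0]  # 1 = fluent, 0 = disfluent

learner = SVC().fit([fluency_features(s) for s in sentences], judgements)

# The combined learner then estimates fluency for unseen generated text.
print(learner.predict([fluency_features("the dog chased the ball")]))
```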
Original language | English |
---|---|
Title of host publication | ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 344-351 |
Number of pages | 8 |
ISBN (Print) | 9781932432862 |
Publication status | Published - 2007 |
Event | 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007, Prague, Czech Republic. Duration: 23 Jun 2007 → 30 Jun 2007 |