Abstract
Supervised approaches to text summarisation suffer from a mismatch between the target labels or scores assigned to individual sentences and the evaluation score of the final summary. Reinforcement learning can address this mismatch by using the score of the final summary to guide the decision made when each sentence is selected. In this paper we present a proof-of-concept approach that applies a policy-gradient algorithm to learn a stochastic policy using an undiscounted reward. The method is applied to a policy consisting of a simple neural network with simple features. The resulting deep reinforcement learning system is able to learn a global policy and obtains encouraging results.
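As a rough illustration of the idea in the abstract, the sketch below runs a REINFORCE-style policy-gradient loop in which each sentence receives a stochastic select/skip decision and a single undiscounted reward for the whole summary drives every update. It is not the paper's implementation. The logistic policy (standing in for the "simple neural network"), the two toy features, the moving-average baseline, and `toy_reward` (standing in for a real summary evaluation score) are all assumptions made for this example.

```python
import math
import random


def select_prob(weights, features):
    """Probability of selecting a sentence under a logistic policy (assumed form)."""
    z = sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


def run_episode(weights, sentences, rng):
    """Sample one select/skip action per sentence and record grad log pi(a|s)."""
    actions, grads = [], []
    for feats in sentences:
        p = select_prob(weights, feats)
        a = 1 if rng.random() < p else 0
        # For a Bernoulli-logistic policy, d/dw log pi(a|s) = (a - p) * x.
        grads.append([(a - p) * x for x in feats])
        actions.append(a)
    return actions, grads


def reinforce(sentences, reward_fn, episodes=3000, lr=0.2, seed=0):
    """REINFORCE with one undiscounted reward per episode (whole summary)."""
    rng = random.Random(seed)
    weights = [0.0] * len(sentences[0])
    baseline = 0.0  # moving average of past rewards, used to reduce variance
    for _ in range(episodes):
        actions, grads = run_episode(weights, sentences, rng)
        reward = reward_fn(actions)  # computed once, on the finished summary
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Every decision in the episode is credited with the same final reward.
        for g in grads:
            for i, gi in enumerate(g):
                weights[i] += lr * advantage * gi
    return weights


# Hypothetical toy data: each sentence has features [bias, relevance flag].
toy_sentences = [[1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
toy_target = [1, 0, 1, 0]


def toy_reward(actions):
    """Hypothetical summary score: fraction of decisions matching toy_target."""
    return sum(a == t for a, t in zip(actions, toy_target)) / len(toy_target)


weights = reinforce(toy_sentences, toy_reward)
print("P(select | relevant)   =", round(select_prob(weights, [1.0, 1.0]), 3))
print("P(select | irrelevant) =", round(select_prob(weights, [1.0, 0.0]), 3))
```

The reward is only available once the summary is complete, so no per-sentence labels are needed; the policy gradient spreads the final score back over the individual selection decisions.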
Original language | English |
---|---|
Title of host publication | Australasian Language Technology Association Workshop 2017 |
Subtitle of host publication | Proceedings of the Workshop |
Editors | Jojo Sze-Meng Wong, Gholamreza Haffari |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 103-107 |
Number of pages | 5 |
Publication status | Published - 2017 |
Event | Australasian Language Technology Association Workshop 2017 - Brisbane, Australia; Duration: 6 Dec 2017 → 8 Dec 2017 |
Conference
Conference | Australasian Language Technology Association Workshop 2017 |
---|---|
Country/Territory | Australia |
City | Brisbane |
Period | 6/12/17 → 8/12/17 |