Abstract
Supervised approaches to text summarisation suffer from a mismatch between the target labels or scores assigned to individual sentences and the evaluation score of the final summary. Reinforcement learning can address this mismatch by using the score of the final summary to guide the decision made as each sentence is selected. In this paper we present a proof-of-concept approach that applies a policy-gradient algorithm to learn a stochastic policy using an undiscounted reward. The method is applied to a policy consisting of a simple neural network over simple features. The resulting deep reinforcement learning system is able to learn a global policy and obtains encouraging results.
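The idea of rewarding sentence-level decisions with a single summary-level score can be illustrated with a minimal policy-gradient (REINFORCE-style) sketch. It is not the paper's implementation. As assumptions for illustration only, it swaps the paper's neural network for a single-layer logistic policy, uses an F1 score over sentence indices as a stand-in for the summary evaluation metric, and invents the feature values, the reference set, and every function name.

```python
import numpy as np

# Hypothetical toy data: one row per candidate sentence,
# columns = [similarity to the query, relative position in the document].
FEATURES = np.array([[0.9, 0.0], [0.1, 0.2], [0.8, 0.4],
                     [0.2, 0.6], [0.1, 0.8], [0.7, 1.0]])
REFERENCE = {0, 2, 5}  # hypothetical indices of the "ideal" summary sentences


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def f1_reward(actions, reference):
    """Toy summary-level score: F1 between selected and reference indices."""
    chosen = set(np.flatnonzero(actions).tolist())
    overlap = len(chosen & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(chosen)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def train(features, reference, episodes=2000, lr=0.5, seed=0):
    """Learn one shared (global) stochastic policy with REINFORCE."""
    rng = np.random.default_rng(seed)
    n_sent, n_feat = features.shape
    w, b = np.zeros(n_feat), 0.0
    baseline = 0.0  # running mean of rewards, used only to reduce variance
    for _ in range(episodes):
        p = sigmoid(features @ w + b)                     # P(select sentence i)
        actions = (rng.random(n_sent) < p).astype(float)  # sample each decision
        # Undiscounted reward: every decision in the episode is credited
        # with the same score, computed once on the final summary.
        reward = f1_reward(actions, reference)
        advantage = reward - baseline
        grad_logits = actions - p  # d log pi(a_i) / d logit_i for a Bernoulli
        w += lr * advantage * (features.T @ grad_logits)
        b += lr * advantage * grad_logits.sum()
        baseline += 0.05 * (reward - baseline)
    return w, b


w, b = train(FEATURES, REFERENCE)
probs = sigmoid(FEATURES @ w + b)
print(np.round(probs, 2))
```

In this sketch no sentence ever receives its own target label. The only training signal is the score of the complete summary, which is the mismatch the abstract sets out to remove.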
| Original language | English |
|---|---|
| Title of host publication | Australasian Language Technology Association Workshop 2017 |
| Subtitle of host publication | Proceedings of the Workshop |
| Editors | Jojo Sze-Meng Wong, Gholamreza Haffari |
| Place of Publication | Stroudsburg, PA |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 103-107 |
| Number of pages | 5 |
| Publication status | Published - 2017 |
| Event | Australasian Language Technology Association Workshop 2017 - Brisbane, Australia. Duration: 6 Dec 2017 → 8 Dec 2017 |
Conference
| Conference | Australasian Language Technology Association Workshop 2017 |
|---|---|
| Country/Territory | Australia |
| City | Brisbane |
| Period | 6/12/17 → 8/12/17 |
Title
Towards the use of deep reinforcement learning with global policy for query-based extractive summarisation