Abstract
In this paper we use a deep auto-encoder for extractive query-based summarization. We experiment with different input representations to overcome the problems that stem from the sparse inputs characteristic of linguistic data. In particular, we propose constructing a local vocabulary for each document and adding small random noise to the input. We also propose using the noisy inputs in an Ensemble Noisy Auto-Encoder (ENAE), which combines the top-ranked sentences from multiple runs on the same input with different added noise. We test our model on a publicly available email dataset that is specifically designed for text summarization. We show that although an auto-encoder can be quite an effective summarizer, adding noise to the input and running a noisy ensemble can yield further improvements.
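
The three ideas named in the abstract (a per-document local vocabulary, small random noise added to the input, and an ensemble over several noisy runs) can be illustrated with a minimal sketch. It is not the authors' implementation: the trained auto-encoder is replaced by a placeholder `encode` function that returns its input unchanged, sentences are scored by cosine similarity to the query, and the uniform noise, its `scale`, the number of `runs`, and the vote-counting aggregation are all assumptions made for illustration. Every function name here is hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def local_vocabulary(sentences, query):
    """Build the vocabulary from this one document and its query only."""
    words = sorted({w for text in sentences + [query] for w in tokenize(text)})
    return {w: i for i, w in enumerate(words)}


def tf_vector(text, vocab):
    """Term-frequency vector over the local vocabulary."""
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec


def add_noise(vec, rng, scale=0.01):
    """Add small uniform noise to every component (distribution and scale assumed)."""
    return [x + rng.uniform(0.0, scale) for x in vec]


def encode(vec):
    """Placeholder for a trained auto-encoder's encoder; identity in this sketch."""
    return vec


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_once(sentences, query, vocab, rng, top_k):
    """One noisy run: indices of the top_k sentences most similar to the query."""
    q = encode(add_noise(tf_vector(query, vocab), rng))
    scores = [cosine(encode(add_noise(tf_vector(s, vocab), rng)), q)
              for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


def ensemble_summary(sentences, query, runs=10, top_k=2, seed=0):
    """Combine top-ranked sentences from several noisy runs by vote counting."""
    rng = random.Random(seed)
    vocab = local_vocabulary(sentences, query)
    votes = Counter()
    for _ in range(runs):
        votes.update(rank_once(sentences, query, vocab, rng, top_k))
    # Most-voted sentences win; ties go to the earlier sentence.
    chosen = sorted(votes, key=lambda i: (-votes[i], i))[:top_k]
    # Return the selected sentences in document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `ensemble_summary(email_sentences, "meeting budget report")` on a list of sentence strings returns the `top_k` sentences selected most often across the noisy runs. Substituting a trained encoder for `encode` would bring the sketch closer to the setup the abstract describes.
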
Original language | English |
---|---|
Pages (from-to) | 2-10 |
Number of pages | 9 |
Journal | ALTA 2015 : Proceedings of Australasian Language Technology Association Workshop 2015 |
Publication status | Published - 2015 |
Event | Australasian Language Technology Association Workshop (13th : 2015) - Parramatta, NSW (8 Dec 2015 → 9 Dec 2015) |