Text summarization using unsupervised deep learning

Mahmood Yousefi-Azar*, Len Hamey

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

97 Citations (Scopus)

Abstract

We present methods of extractive, query-oriented, single-document summarization that use a deep auto-encoder (AE) to compute a feature space from the term-frequency (tf) input. Our experiments explore both local and global vocabularies. We investigate the effect of adding small random noise to the local tf input representation of the AE, and propose an ensemble of such noisy AEs, which we call the Ensemble Noisy Auto-Encoder (ENAE). The ENAE is a stochastic version of an AE that adds noise to the input text and selects the top sentences from an ensemble of noisy runs. In each run of the ensemble, different randomly generated noise is added to the input representation. This changes the application of the AE from a deterministic feed-forward network to a stochastic runtime model. Experiments show that the AE using local vocabularies clearly provides a more discriminative feature space and improves recall by 11.2% on average. The ENAE makes further improvements, particularly in selecting informative sentences. To cover a wide range of topics and structures, we perform experiments on two different publicly available email corpora that are specifically designed for text summarization. We use ROUGE as a fully automatic evaluation metric and report the average ROUGE-2 recall for all experiments.
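The abstract's pipeline can be sketched in a few lines of code. The snippet below is an illustrative reconstruction, not the authors' implementation: it assumes PyTorch, a toy local vocabulary, a single hidden layer of 32 units, Gaussian input noise, cosine similarity between the AE codes of each sentence and the query for ranking, and simple mean aggregation over the noisy runs. The network sizes, noise level, number of runs, and the helper names (tf_vectors, enae_rank) are hypothetical choices made for the example.

    # Illustrative sketch of an Ensemble Noisy Auto-Encoder (ENAE) sentence ranker.
    # Assumptions (not from the paper): PyTorch, layer sizes, noise level,
    # cosine similarity to the query, and mean aggregation over noisy runs.
    import torch
    import torch.nn as nn

    def tf_vectors(sentences, vocab):
        """Term-frequency vector for each sentence over a local vocabulary."""
        index = {w: i for i, w in enumerate(vocab)}
        vecs = torch.zeros(len(sentences), len(vocab))
        for s, sent in enumerate(sentences):
            for word in sent.lower().split():
                if word in index:
                    vecs[s, index[word]] += 1.0
        return vecs

    class AutoEncoder(nn.Module):
        def __init__(self, n_vocab, n_hidden=32):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_vocab, n_hidden), nn.Sigmoid())
            self.decoder = nn.Linear(n_hidden, n_vocab)

        def forward(self, x):
            code = self.encoder(x)          # AE feature space (concept codes)
            return self.decoder(code), code

    def enae_rank(sentences, query, vocab, n_runs=5, noise_std=0.05, epochs=200):
        """Rank sentences by query similarity averaged over noisy AE runs."""
        x = tf_vectors(sentences, vocab)
        q = tf_vectors([query], vocab)
        model = AutoEncoder(len(vocab))
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):             # train the AE to reconstruct clean tf input
            recon, _ = model(x)
            loss = loss_fn(recon, x)
            opt.zero_grad()
            loss.backward()
            opt.step()
        scores = torch.zeros(len(sentences))
        with torch.no_grad():
            for _ in range(n_runs):         # stochastic runtime: noise added per run
                _, code_s = model(x + noise_std * torch.randn_like(x))
                _, code_q = model(q + noise_std * torch.randn_like(q))
                scores += nn.functional.cosine_similarity(
                    code_s, code_q.expand_as(code_s))
        return (scores / n_runs).argsort(descending=True)

    sentences = ["the meeting is moved to friday",
                 "please send the budget report",
                 "lunch plans are cancelled"]
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    order = enae_rank(sentences, "budget report deadline", vocab)
    print([sentences[int(i)] for i in order])   # highest-ranked sentences first

Averaging the query similarity over several noisy encodings is what makes the runtime model stochastic; with n_runs = 1 and noise_std = 0 the sketch reduces to a plain deterministic AE ranker.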
Original language: English
Pages (from-to): 93-105
Number of pages: 13
Journal: Expert Systems With Applications
Volume: 68
DOIs
Publication status: Published - 1 Feb 2017

Keywords

  • deep learning
  • query-oriented summarization
  • extractive summarization
  • ensemble noisy auto-encoder

