Text summarization using unsupervised deep learning

Research output: Contribution to journal › Article › Research › peer-review

Abstract

We present methods for extractive query-oriented single-document summarization that use a deep auto-encoder (AE) to compute a feature space from the term-frequency (tf) input. Our experiments explore both local and global vocabularies. We investigate the effect of adding small random noise to the local tf input representation of the AE, and propose an ensemble of such noisy AEs, which we call the Ensemble Noisy Auto-Encoder (ENAE). The ENAE is a stochastic version of an AE that adds noise to the input text and selects the top sentences from an ensemble of noisy runs; in each run of the ensemble, different randomly generated noise is added to the input representation. This architecture changes the application of the AE from a deterministic feed-forward network to a stochastic runtime model. Experiments show that the AE using local vocabularies clearly provides a more discriminative feature space, improving recall by 11.2% on average. The ENAE makes further improvements, particularly in selecting informative sentences. To cover a wide range of topics and structures, we perform experiments on two different publicly available email corpora that are specifically designed for text summarization. We use ROUGE as a fully automatic evaluation metric and report the average ROUGE-2 recall for all experiments.
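The ensemble procedure the abstract describes can be illustrated with a minimal sketch: encode noisy tf vectors into a feature space, rank sentences by similarity to the query in that space, and pool the top-ranked sentences across runs. Everything below (the tiny random-projection "encoder", the noise scale, the vote pooling) is a placeholder for illustration only, not the trained architecture or parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 50   # size of the (local) vocabulary
CODE = 8     # dimensionality of the learned feature space
N_SENT = 6   # sentences in the document
RUNS = 10    # ensemble members (noisy runs)
TOP_K = 2    # summary length in sentences

# Stand-in for a trained AE encoder: a fixed random projection.
W = rng.normal(size=(VOCAB, CODE))

def encode(tf):
    """Map a tf vector (or matrix of row vectors) into the feature space."""
    return np.tanh(tf @ W)

# Random tf vectors standing in for sentence and query representations.
sent_tf = rng.random((N_SENT, VOCAB))
query_tf = rng.random(VOCAB)

votes = np.zeros(N_SENT)
for _ in range(RUNS):
    # Each run adds fresh small random noise to the tf input.
    noisy = sent_tf + 0.05 * rng.random(sent_tf.shape)
    s = encode(noisy)
    q = encode(query_tf)
    # Rank sentences by cosine similarity to the query in the feature space.
    sims = (s @ q) / (np.linalg.norm(s, axis=1) * np.linalg.norm(q) + 1e-12)
    for i in np.argsort(sims)[-TOP_K:]:
        votes[i] += 1

# The summary keeps the sentences selected most often across the noisy runs.
summary_ids = np.argsort(votes)[-TOP_K:]
print(sorted(int(i) for i in summary_ids))
```

Pooling over noisy runs is what turns the deterministic feed-forward AE into the stochastic runtime model the abstract refers to: sentences that survive many perturbations of the input are treated as more reliably informative.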
Language: English
Pages: 93-105
Number of pages: 13
Journal: Expert Systems With Applications
Volume: 68
DOI: 10.1016/j.eswa.2016.10.017
Publication status: Published - 1 Feb 2017

Keywords

  • deep learning
  • query-oriented summarization
  • extractive summarization
  • ensemble noisy auto-encoder

Cite this

@article{7d062c67e0034ac2a939bcdccafd41fa,
title = "Text summarization using unsupervised deep learning",
abstract = "We present methods for extractive query-oriented single-document summarization that use a deep auto-encoder (AE) to compute a feature space from the term-frequency (tf) input. Our experiments explore both local and global vocabularies. We investigate the effect of adding small random noise to the local tf input representation of the AE, and propose an ensemble of such noisy AEs, which we call the Ensemble Noisy Auto-Encoder (ENAE). The ENAE is a stochastic version of an AE that adds noise to the input text and selects the top sentences from an ensemble of noisy runs; in each run of the ensemble, different randomly generated noise is added to the input representation. This architecture changes the application of the AE from a deterministic feed-forward network to a stochastic runtime model. Experiments show that the AE using local vocabularies clearly provides a more discriminative feature space, improving recall by 11.2{\%} on average. The ENAE makes further improvements, particularly in selecting informative sentences. To cover a wide range of topics and structures, we perform experiments on two different publicly available email corpora that are specifically designed for text summarization. We use ROUGE as a fully automatic evaluation metric and report the average ROUGE-2 recall for all experiments.",
keywords = "deep learning, query-oriented summarization, extractive summarization, ensemble noisy auto-encoder",
author = "Mahmood Yousefi-Azar and Len Hamey",
year = "2017",
month = "2",
day = "1",
doi = "10.1016/j.eswa.2016.10.017",
language = "English",
volume = "68",
pages = "93--105",
journal = "Expert Systems With Applications",
issn = "0957-4174",
publisher = "Pergamon",

}

Text summarization using unsupervised deep learning. / Yousefi-Azar, Mahmood; Hamey, Len.

In: Expert Systems With Applications, Vol. 68, 01.02.2017, p. 93-105.

Research output: Contribution to journal › Article › Research › peer-review

TY - JOUR

T1 - Text summarization using unsupervised deep learning

AU - Yousefi-Azar, Mahmood

AU - Hamey, Len

PY - 2017/2/1

Y1 - 2017/2/1

N2 - We present methods for extractive query-oriented single-document summarization that use a deep auto-encoder (AE) to compute a feature space from the term-frequency (tf) input. Our experiments explore both local and global vocabularies. We investigate the effect of adding small random noise to the local tf input representation of the AE, and propose an ensemble of such noisy AEs, which we call the Ensemble Noisy Auto-Encoder (ENAE). The ENAE is a stochastic version of an AE that adds noise to the input text and selects the top sentences from an ensemble of noisy runs; in each run of the ensemble, different randomly generated noise is added to the input representation. This architecture changes the application of the AE from a deterministic feed-forward network to a stochastic runtime model. Experiments show that the AE using local vocabularies clearly provides a more discriminative feature space, improving recall by 11.2% on average. The ENAE makes further improvements, particularly in selecting informative sentences. To cover a wide range of topics and structures, we perform experiments on two different publicly available email corpora that are specifically designed for text summarization. We use ROUGE as a fully automatic evaluation metric and report the average ROUGE-2 recall for all experiments.

AB - We present methods for extractive query-oriented single-document summarization that use a deep auto-encoder (AE) to compute a feature space from the term-frequency (tf) input. Our experiments explore both local and global vocabularies. We investigate the effect of adding small random noise to the local tf input representation of the AE, and propose an ensemble of such noisy AEs, which we call the Ensemble Noisy Auto-Encoder (ENAE). The ENAE is a stochastic version of an AE that adds noise to the input text and selects the top sentences from an ensemble of noisy runs; in each run of the ensemble, different randomly generated noise is added to the input representation. This architecture changes the application of the AE from a deterministic feed-forward network to a stochastic runtime model. Experiments show that the AE using local vocabularies clearly provides a more discriminative feature space, improving recall by 11.2% on average. The ENAE makes further improvements, particularly in selecting informative sentences. To cover a wide range of topics and structures, we perform experiments on two different publicly available email corpora that are specifically designed for text summarization. We use ROUGE as a fully automatic evaluation metric and report the average ROUGE-2 recall for all experiments.

KW - deep learning

KW - query-oriented summarization

KW - extractive summarization

KW - ensemble noisy auto-encoder

UR - http://www.scopus.com/inward/record.url?scp=84992153975&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2016.10.017

DO - 10.1016/j.eswa.2016.10.017

M3 - Article

VL - 68

SP - 93

EP - 105

JO - Expert Systems With Applications

T2 - Expert Systems With Applications

JF - Expert Systems With Applications

SN - 0957-4174

ER -