The exponential growth of the Web documents has constituted the need for automatic document summarization. In this context, extractive document summarization, i.e., that task of extracting the most relevant information, removing redundancy and presenting the remained data in a coherent and cohesive structure, is a challenging task. In this article, we propose a novel intelligent approach, namely ExDoS, that harvests benefits of both supervised and unsupervised algorithms simultaneously. To the best of our knowledge, ExDoS is the first approach to combine both supervised and unsupervised algorithms in a single framework and an interpretable manner for document summarization purpose. ExDoS iteratively minimizes the error rate of the classifier in each cluster with the help of dynamic local feature weighting. Moreover, this approach specifies the contribution of features to discriminate each class, which is a challenging issue in the summarization task. Therefore, in addition to summarizing text, ExDoS is also able to measure the importance of each feature in the summarization process. We evaluate our model both automatically (in terms of ROUGE factor) and empirically (human analysis) on the benchmark datasets: the DUC2002 and CNN/DailyMail. Results show that our model obtains higher ROUGE scores comparing to most state-of-the-art models. The human evaluation also demonstrates that our model is capable of generating informative and readable summaries.
Bibliographical noteVersion archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
- Automatic text summarization
- extractive summarization
- feature weighting
- multi-document summarization