A computationally efficient algorithm for learning topical collocation models

Zhendong Zhao, Lan Du, Benjamin Börschinger, John K. Pate, Massimiliano Ciaramita, Mark Steedman, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Most existing topic models make the bag-of-words assumption that words are generated independently, and so ignore potentially useful information about word order. Previous attempts to use collocations (short sequences of adjacent words) in topic models have either relied on a pipeline approach, restricted attention to bigrams, or resulted in models whose inference does not scale to large corpora. This paper studies how to simultaneously learn both collocations and their topic assignments. We present an efficient reformulation of the Adaptor Grammar-based topical collocation model (AG-colloc) (Johnson, 2010), and develop a point-wise sampling algorithm for posterior inference in this new formulation. We further improve the efficiency of the sampling algorithm by exploiting sparsity and parallelising inference. Experimental results from text classification, information retrieval and human evaluation tasks across a range of datasets show that this reformulation scales to hundreds of thousands of documents while maintaining the good performance of the AG-colloc model.
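
As a small illustration of the bag-of-words assumption described in the abstract, the following sketch compares unigram counts with counts of adjacent word pairs for two token sequences that differ only in word order. It is not the AG-colloc model or the paper's sampling algorithm; the function names `bag_of_words` and `adjacent_bigrams` and the example sentences are assumptions made here for illustration.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count tokens while ignoring their order (hypothetical helper)."""
    return Counter(tokens)


def adjacent_bigrams(tokens):
    """Count pairs of adjacent tokens, which retains local word order."""
    return Counter(zip(tokens, tokens[1:]))


# Two illustrative token sequences containing the same words in a different order.
a = "new york is a big city".split()
b = "york is a new big city".split()

# Under the bag-of-words view the two sequences are indistinguishable.
same_bags = bag_of_words(a) == bag_of_words(b)

# Adjacent pairs differ: only the first sequence contains ("new", "york").
same_bigrams = adjacent_bigrams(a) == adjacent_bigrams(b)

print(same_bags, same_bigrams)
```

The unigram counts match while the adjacent-pair counts do not, which is the word-order information that collocation-aware topic models aim to use.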

Original language: English
Title of host publication: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing
Place of Publication: Red Hook, N.Y.
Publisher: Association for Computational Linguistics (ACL)
Pages: 1460-1469
Number of pages: 10
ISBN (Electronic): 9781941643723
Publication status: Published - Jul 2015
Event: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP - 2015 - Beijing, China
Duration: 26 Jul 2015 to 31 Jul 2015

Other

Other: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP - 2015
Country/Territory: China
City: Beijing
Period: 26/07/15 to 31/07/15

Bibliographical note

Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
