Abstract
Documents from the same domain usually discuss similar topics in a similar order. In this paper, we present new ordering-based topic models that use generalised Mallows models to capture this regularity and use it to constrain topic assignments. Specifically, these models assume that documents from the same domain share a canonical topic ordering, and that each document-specific topic ordering is allowed to deviate from it. Instead of full orderings over the set of all possible topics covered by a domain, we use top-t orderings generated by a multistage ranking process. We show how to reformulate the new models so that a point-wise sampling algorithm from the Bayesian word segmentation literature can be used for posterior inference. Experimental results on several document collections with different properties show that our model performs much better than other topic ordering-based models and competitively with other state-of-the-art topic segmentation models.
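The abstract does not spell out the generative process, so the following is a minimal Python sketch of the standard multistage-ranking construction of a generalised Mallows model, stopped after t stages to give a top-t ordering. It is not the authors' implementation: the function names, the list-of-strings topic representation and the per-stage dispersion parameters `thetas` are assumptions made for illustration. At stage j, an inversion count v_j is drawn with probability proportional to exp(-θ_j · v_j), and the v_j-th topic (0-indexed) among those not yet placed, kept in canonical order, is appended to the ordering.

```python
import math
import random


def gmm_stage_probs(theta, num_remaining):
    """Distribution of v_j at one stage: P(v_j = r) is proportional to
    exp(-theta * r) for r = 0, ..., num_remaining - 1."""
    weights = [math.exp(-theta * r) for r in range(num_remaining)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_top_t_ordering(canonical, thetas, rng=None):
    """Sample a top-t ordering by multistage ranking (t = len(thetas)).

    Hypothetical helper: `canonical` lists topics in the canonical order and
    `thetas` holds one dispersion parameter per stage. Returns the sampled
    ordering together with its inversion vector v.
    """
    if len(thetas) > len(canonical):
        raise ValueError("t = len(thetas) cannot exceed the number of topics")
    if rng is None:
        rng = random.Random()
    remaining = list(canonical)  # unplaced topics, still in canonical order
    ordering, inversions = [], []
    for theta in thetas:
        probs = gmm_stage_probs(theta, len(remaining))
        r = rng.choices(range(len(remaining)), weights=probs, k=1)[0]
        inversions.append(r)
        ordering.append(remaining.pop(r))  # place the r-th remaining topic
    return ordering, inversions


def log_prob_inversions(inversions, thetas, n):
    """log P(v | theta) = sum_j [-theta_j * v_j - log psi_j(theta_j)], where
    psi_j sums exp(-theta_j * r) over the admissible values of v_j."""
    total = 0.0
    for j, (v, theta) in enumerate(zip(inversions, thetas)):
        num_values = n - j  # stage j (0-indexed) chooses among n - j topics
        log_psi = math.log(sum(math.exp(-theta * r) for r in range(num_values)))
        total += -theta * v - log_psi
    return total


if __name__ == "__main__":
    topics = ["intro", "method", "results", "discussion", "conclusion"]
    thetas = [1.0, 1.0, 1.0]  # t = 3 stages
    order, v = sample_top_t_ordering(topics, thetas, random.Random(0))
    print(order, v, log_prob_inversions(v, thetas, len(topics)))
```

Larger values of θ_j concentrate the mass on v_j = 0, so sampled orderings stay close to the canonical one, while θ_j = 0 makes every choice at stage j equally likely. Because only t stages are run, topics that are never selected are simply absent from the result, which is what makes it a top-t rather than a full ordering.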
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014 |
| Editors | Ravi Kumar, Hannu Toivonen, Jian Pei, Joshua Zhexue Huang, Xindong Wu |
| Place of Publication | Piscataway, NJ |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Pages | 803-808 |
| Number of pages | 6 |
| DOIs | |
| Publication status | Published - 2014 |
| Event | IEEE International Conference on Data Mining (14th : 2014) - Shenzhen, China (14 Dec 2014 → 17 Dec 2014) |
Publication series
| Name | Proceedings - IEEE International Conference on Data Mining, ICDM |
|---|---|
| Number | January |
| Volume | 2015-January |
| ISSN (Print) | 1550-4786 |
Conference
| Conference | IEEE International Conference on Data Mining (14th : 2014) |
|---|---|
| Country/Territory | China |
| City | Shenzhen |
| Period | 14/12/14 → 17/12/14 |
Keywords
- GMM
- permutation
- top-t ordering
- Topic model
- topic segmentation
Fingerprint
Research topics of 'Topic models with topic ordering regularities for topic segmentation'.
Projects
2 finished projects:
- Computational models of synergies in human language acquisition
  Johnson, M. (Primary Chief Investigator), Frank, M. (Partner Investigator), Newton, J. (Other), MQRES, M. (Student) & Demuth, K. (Chief Investigator)
  31/07/11 → 30/06/16
  Project: Research
- Incremental syntactic parsing and coreference resolution
  Johnson, M. (Primary Chief Investigator), Steedman, M. (Partner Investigator), Newton, J. (Other), MQRES, M. (Other) & PhD Contribution (ARC), P. C. (Other)
  31/07/11 → 31/12/15
  Project: Research