Topic models with topic ordering regularities for topic segmentation

Lan Du, John K. Pate, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; Research; peer-reviewed

Abstract

Documents from the same domain usually discuss similar topics in a similar order. In this paper we present new ordering-based topic models that use generalised Mallows models to capture this regularity to constrain topic assignments. Specifically, these new models assume that there is a canonical topic ordering shared amongst documents from the same domain, and each document-specific topic ordering is allowed to vary from the canonical topic ordering. Instead of full orderings over a set of all possible topics covered by a domain, we make use of top-t orderings via a multistage ranking process. We show how to reformulate the new models so that a point-wise sampling algorithm from the Bayesian word segmentation literature can be used for posterior inference. Experimental results on several document collections with different properties show that our model performs much better than the other topic ordering-based models, and competitively with other state-of-the-art topic segmentation models.
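The top-t orderings mentioned in the abstract can be illustrated with a small sketch of the generic multistage ranking construction that underlies generalised Mallows models. This is not the authors' implementation: the function name `sample_top_t_ordering`, the per-stage dispersion list `theta`, and the toy topic labels are all assumptions made for illustration. At each stage the sampler picks one of the still-unranked topics, where choosing the topic that sits `v` places down the canonical ordering has weight proportional to `exp(-theta[stage] * v)`; stopping after `t` stages gives a top-t ordering rather than a full permutation.

```python
import math
import random


def sample_top_t_ordering(canonical, t, theta, rng=random):
    """Draw a top-t ordering around `canonical` by multistage ranking.

    canonical: list of topics in the shared canonical order (assumed input).
    t:         number of stages, i.e. length of the returned ordering.
    theta:     hypothetical per-stage dispersion parameters (len >= t);
               larger values keep the draw closer to the canonical order.
    rng:       the `random` module or a `random.Random` instance.
    """
    if t > len(canonical) or t > len(theta):
        raise ValueError("t must not exceed len(canonical) or len(theta)")

    remaining = list(canonical)  # unranked topics, kept in canonical order
    ordering = []
    for stage in range(t):
        # Displacement v ranges over the remaining topics;
        # P(v) is proportional to exp(-theta[stage] * v).
        weights = [math.exp(-theta[stage] * v) for v in range(len(remaining))]
        v = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        ordering.append(remaining.pop(v))
    return ordering


if __name__ == "__main__":
    topics = ["history", "geography", "economy", "culture", "sport"]
    rng = random.Random(0)
    # Moderate dispersion: draws usually stay near the canonical order.
    print(sample_top_t_ordering(topics, t=3, theta=[1.5, 1.5, 1.5], rng=rng))
```

With very large dispersion values the draw collapses onto the first `t` topics of the canonical ordering, and with `theta` set to zero every remaining topic is equally likely at each stage. The paper's actual models couple such orderings with topic assignments and posterior inference, which this sketch does not attempt.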

Language: English
Title of host publication: Proceedings of 2014 IEEE international conference on data mining
Place of Publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 803-808
Number of pages: 6
DOI: https://doi.org/10.1109/ICDM.2014.49
Publication status: Published - 2014
Event: IEEE International Conference on Data Mining (14th : 2014) - Shenzhen, China
Duration: 14 Dec 2014 to 17 Dec 2014

Conference

Conference: IEEE International Conference on Data Mining (14th : 2014)
Country: China
City: Shenzhen
Period: 14/12/14 to 17/12/14

Cite this

Du, L., Pate, J. K., & Johnson, M. (2014). Topic models with topic ordering regularities for topic segmentation. In Proceedings of 2014 IEEE international conference on data mining (pp. 803-808). Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE). Proceedings - IEEE International Conference on Data Mining, ICDM https://doi.org/10.1109/ICDM.2014.49
@inproceedings{12d3c974128840e9a326f33786e0cf47,
title = "Topic models with topic ordering regularities for topic segmentation",
author = "Lan Du and Pate, {John K.} and Mark Johnson",
year = "2014",
doi = "10.1109/ICDM.2014.49",
language = "English",
pages = "803--808",
booktitle = "Proceedings of 2014 IEEE international conference on data mining",
publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
address = "United States",

}

TY - GEN

T1 - Topic models with topic ordering regularities for topic segmentation

AU - Du, Lan

AU - Pate, John K.

AU - Johnson, Mark

PY - 2014

Y1 - 2014

UR - http://www.scopus.com/inward/record.url?scp=84936941389&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2014.49

DO - 10.1109/ICDM.2014.49

M3 - Conference proceeding contribution

SP - 803

EP - 808

BT - Proceedings of 2014 IEEE international conference on data mining

PB - Institute of Electrical and Electronics Engineers (IEEE)

CY - Piscataway, NJ

ER -
