Abstract
We present a new hierarchical Bayesian model for unsupervised topic segmentation. The model integrates a point-wise boundary sampling algorithm, as used in Bayesian segmentation, into a structured topic model that captures a simple hierarchical topic structure latent in documents. We develop an MCMC inference algorithm that splits and merges segments. Experimental results show that our model outperforms previous unsupervised segmentation methods that use only lexical information on Choi's datasets and on two meeting transcripts, and performs comparably to those methods on two written datasets.
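To make "point-wise boundary sampling" concrete, here is a minimal sketch of the generic idea: each candidate boundary between adjacent sentences is resampled in turn, conditioned on all other boundaries, by comparing the likelihood of splitting versus merging the surrounding segment. This is NOT the authors' hierarchical model and does not implement their split/merge MCMC moves. It assumes a flat, single-level model in which each segment draws its words from a multinomial with a symmetric Dirichlet(`beta`) prior (collapsed out) and each boundary has a Bernoulli(`p_boundary`) prior. All function names and parameters (`split_probability`, `gibbs_sweep`, `beta`, `p_boundary`) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

# Illustrative sketch only: a flat Dirichlet-multinomial segmentation model,
# not the hierarchical model described in the abstract.

def log_dirichlet_multinomial(counts: Counter, vocab_size: int, beta: float) -> float:
    """Log marginal likelihood of a segment's word counts under a symmetric Dirichlet(beta) prior."""
    n = sum(counts.values())
    ll = math.lgamma(vocab_size * beta) - math.lgamma(n + vocab_size * beta)
    for c in counts.values():  # unseen words contribute lgamma(beta) - lgamma(beta) = 0
        ll += math.lgamma(c + beta) - math.lgamma(beta)
    return ll

def _counts(sents: Sequence[List[int]], start: int, end: int) -> Counter:
    """Pool the word ids of sentences start..end-1 into one count table."""
    c: Counter = Counter()
    for s in sents[start:end]:
        c.update(s)
    return c

def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def split_probability(sents, bounds, i, vocab_size, beta, p_boundary) -> float:
    """P(bounds[i] = 1 | all other boundaries); bounds[j] = 1 marks a boundary after sentence j."""
    n = len(sents)
    j = i - 1
    while j >= 0 and bounds[j] == 0:  # walk left to the previous boundary
        j -= 1
    start = j + 1
    j = i + 1
    while j < n - 1 and bounds[j] == 0:  # walk right to the next boundary
        j += 1
    end = j + 1
    left = _counts(sents, start, i + 1)
    right = _counts(sents, i + 1, end)
    merged = left + right
    log_split = (log_dirichlet_multinomial(left, vocab_size, beta)
                 + log_dirichlet_multinomial(right, vocab_size, beta)
                 + math.log(p_boundary))
    log_merge = (log_dirichlet_multinomial(merged, vocab_size, beta)
                 + math.log1p(-p_boundary))
    return _sigmoid(log_split - log_merge)

def gibbs_sweep(sents, bounds, vocab_size, beta, p_boundary, rng: random.Random) -> None:
    """Resample every boundary indicator once, in place, one position at a time."""
    for i in range(len(bounds)):
        p = split_probability(sents, bounds, i, vocab_size, beta, p_boundary)
        bounds[i] = 1 if rng.random() < p else 0
```

For example, with three sentences drawn from word ids `{0, 1, 2}` followed by three drawn from `{3, 4, 5}`, repeated calls to `gibbs_sweep(sents, bounds, 6, 0.1, 0.1, random.Random(0))` should tend to place a boundary only after the third sentence. Sampling one indicator at a time keeps each update cheap, but it mixes slowly when whole segments need to move, which is one motivation for block moves such as splitting and merging segments.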
Original language | English |
---|---|
Title of host publication | NAACL HLT 2013 |
Subtitle of host publication | 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 190-200 |
Number of pages | 11 |
ISBN (Electronic) | 9781937284473 |
Publication status | Published - 2013 |
Event | 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 - Atlanta, United States; duration: 9 Jun 2013 → 14 Jun 2013 |
Other
Other | 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 |
---|---|
Country/Territory | United States |
City | Atlanta |
Period | 9/06/13 → 14/06/13 |