Nonparametric Bayesian topic modelling with the hierarchical Pitman–Yor processes

Kar Wai Lim*, Wray Buntine, Changyou Chen, Lan Du

*Corresponding author for this work

Research output: Contribution to journal (Article)

14 Citations (Scopus)

Abstract

The Dirichlet process and its extension, the Pitman–Yor process, are stochastic processes that take a probability distribution as a parameter. These processes can be stacked to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for using these processes in this hierarchical context, and apply them to latent variable models for text analytics. In particular, we propose a general framework for designing these Bayesian models, which are called topic models in the computer science community. We then propose a specific nonparametric Bayesian topic model for modelling text from social media. We focus on tweets (posts on Twitter) in this article due to their ease of access. We find that our nonparametric model performs better than existing parametric models in both goodness of fit and real-world applications.
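
As an illustration of the processes named in the abstract, the following minimal sketch simulates the Chinese restaurant representation of a single Pitman–Yor process. It is not the inference method of the article. The function name `pitman_yor_crp` and its parameters are hypothetical, and the sketch assumes a discount in [0, 1) and a strictly positive concentration. Setting the discount to 0 recovers the Dirichlet process. The hierarchical stacking described above is not shown.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Seat customers one by one and return the list of table sizes.

    Hypothetical helper for illustration only. With n customers seated at
    k tables, the next customer joins table j with probability
    (size_j - discount) / (n + concentration) and opens a new table with
    probability (concentration + discount * k) / (n + concentration).
    """
    # Assumption: restrict to the simple parameter range below.
    if not 0 <= discount < 1:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= 0:
        raise ValueError("concentration must be positive in this sketch")

    rng = random.Random(seed)
    table_sizes = []
    for _ in range(n_customers):
        k = len(table_sizes)
        # Unnormalised weights: one per existing table, then the new table.
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[choice] += 1   # join an existing table
    return table_sizes


if __name__ == "__main__":
    sizes = pitman_yor_crp(1000, discount=0.5, concentration=1.0)
    print(len(sizes), "tables; largest has", max(sizes), "customers")
```

A larger discount tends to produce more tables with heavier-tailed sizes, which is one commonly cited reason for preferring the Pitman–Yor process over the Dirichlet process when modelling word frequencies.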

Original language: English
Pages (from-to): 172-191
Number of pages: 20
Journal: International Journal of Approximate Reasoning
Volume: 78
Publication status: Published - 1 Nov 2016
Externally published: Yes

Keywords

  • Bayesian nonparametric methods
  • Markov chain Monte Carlo
  • Topic models
  • Hierarchical Pitman–Yor processes
