Abstract
This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as "topic models" to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer topic models as well. Adaptor Grammars (AGs) are a hierarchical, non-parametric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic models with multi-word sequences or collocations generated by an AG. The second extension builds on the first one to learn aspects of the internal structure of proper names.
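The abstract states that an LDA topic model can be viewed as a special kind of PCFG but does not spell out the encoding. The following is a minimal sketch of one way such a grammar could be enumerated, assuming each document is prefixed with a unique identifier token and each word is generated by first choosing a topic. The function name `lda_as_pcfg_rules` and the nonterminal names (`Sentence`, `Doc'j`, `Docj`, `Topici`) are illustrative assumptions, not taken from this page.

```python
def lda_as_pcfg_rules(num_docs, num_topics, vocab):
    """Enumerate rules of a PCFG whose derivations mirror LDA generation.

    Each rule is a (left-hand side, right-hand side tuple) pair.
    Rule probabilities are omitted; in a Bayesian PCFG they would
    carry Dirichlet priors, as the topic and word distributions do in LDA.
    """
    rules = []
    for j in range(1, num_docs + 1):
        # The root selects which document is being generated.
        rules.append(("Sentence", (f"Doc'{j}",)))
        # The left-branching spine bottoms out in the document identifier token.
        rules.append((f"Doc'{j}", (f"_{j}",)))
        # Each spine step appends one more word position to document j.
        rules.append((f"Doc'{j}", (f"Doc'{j}", f"Doc{j}")))
        for i in range(1, num_topics + 1):
            # Document j picks topic i (analogous to the document-topic distribution).
            rules.append((f"Doc{j}", (f"Topic{i}",)))
    for i in range(1, num_topics + 1):
        for word in vocab:
            # Topic i emits a word (analogous to the topic-word distribution).
            rules.append((f"Topic{i}", (word,)))
    return rules


if __name__ == "__main__":
    for lhs, rhs in lda_as_pcfg_rules(2, 2, ["a", "b"]):
        print(lhs, "->", " ".join(rhs))
```

Under this sketch, the probabilities on the `Docj -> Topici` rules play the role of the per-document topic mixture and those on the `Topici -> word` rules play the role of the per-topic word distribution, which is why inference over the grammar's rule probabilities would correspond to inferring a topic model.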
Original language | English |
---|---|
Title of host publication | ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference |
Place of Publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1148-1157 |
Number of pages | 10 |
ISBN (Print) | 9781617388088 |
Publication status | Published - 2010 |
Event | 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden. Duration: 11 Jul 2010 → 16 Jul 2010 |