Abstract
We present a new approach to inducing the syntactic categories of words, combining their distributional and morphological properties in a joint nonparametric Bayesian model based on the distance-dependent Chinese Restaurant Process. The prior distribution over word clusterings uses a log-linear model of morphological similarity; the likelihood function is the probability of generating vector word embeddings. The weights of the morphology model are learned jointly while inducing part-of-speech clusters, encouraging them to cohere with the distributional features. The resulting algorithm outperforms competitive alternatives on English POS induction.
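The prior described above can be illustrated with a minimal sketch. It assumes the standard distance-dependent Chinese Restaurant Process formulation, in which each word either links to itself (with weight `alpha`) or follows another word with weight given by a log-linear score, and clusters are the connected components of the resulting links. The suffix-sharing features, the weight values, and all function names here are hypothetical and chosen only for illustration; they are not the authors' feature set or implementation. The embedding likelihood and the joint learning of the weights are left out.

```python
import math

def suffix_features(a, b, max_len=3):
    """Hypothetical features: 1.0 if a and b share a suffix of length k, for k = 1..max_len."""
    return [1.0 if len(a) >= k and len(b) >= k and a[-k:] == b[-k:] else 0.0
            for k in range(1, max_len + 1)]

def follow_probs(words, i, weights, alpha=1.0):
    """Prior over whom word i follows: j != i scores exp(w . f(i, j)), a self-link scores alpha."""
    scores = []
    for j, other in enumerate(words):
        if j == i:
            scores.append(alpha)
        else:
            feats = suffix_features(words[i], other, len(weights))
            scores.append(math.exp(sum(w * f for w, f in zip(weights, feats))))
    total = sum(scores)
    return [s / total for s in scores]

def clusters_from_links(links):
    """Clusters are the connected components of the follower graph, where links[i] = j."""
    parent = list(range(len(links)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(links)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy vocabulary and assumed (not learned) weights.
words = ["walking", "talking", "cats", "dogs"]
weights = [0.5, 1.0, 1.5]
probs = follow_probs(words, 0, weights)

# One possible link assignment: "walking" and "talking" follow each other,
# "cats" follows "dogs", and "dogs" links to itself.
links = [1, 0, 3, 3]
clusters = clusters_from_links(links)
```

In this toy setting "walking" is far more likely to follow "talking" than either noun, because the two words share the suffix "-ing" at every feature length. In the full model these prior probabilities would be combined with the likelihood of the word embeddings in each cluster during sampling.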
| Original language | English |
|---|---|
| Title of host publication | 52nd Annual Meeting of the Association for Computational Linguistics |
| Subtitle of host publication | proceedings of the conference : volume 2 : short papers |
| Place of Publication | Stroudsburg, PA |
| Publisher | Association for Computational Linguistics |
| Pages | 265-271 |
| Number of pages | 7 |
| Volume | 2 |
| ISBN (Print) | 9781937284732 |
| Publication status | Published - 2014 |
| Externally published | Yes |
| Event | Annual meeting of the Association for Computational Linguistics (52nd : 2014), Baltimore, USA. Duration: 22 Jun 2014 → 27 Jun 2014 |
Conference
| Conference | Annual meeting of the Association for Computational Linguistics (52nd : 2014) |
|---|---|
| City | Baltimore, USA |
| Period | 22/06/14 → 27/06/14 |
Bibliographical note
Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Title
POS induction with distributional and morphological information using a distance-dependent Chinese Restaurant Process