POS induction with distributional and morphological information using a distance-dependent Chinese Restaurant Process

Kairit Sirts, Jacob Eisenstein, Micha Elsner, Sharon Goldwater

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

We present a new approach to inducing the syntactic categories of words, combining their distributional and morphological properties in a joint nonparametric Bayesian model based on the distance-dependent Chinese Restaurant Process. The prior distribution over word clusterings uses a log-linear model of morphological similarity; the likelihood function is the probability of generating vector word embeddings. The weights of the morphology model are learned jointly while inducing part-of-speech clusters, encouraging them to cohere with the distributional features. The resulting algorithm outperforms competitive alternatives on English POS induction.
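
The prior described in the abstract can be illustrated with a minimal sketch. It assumes the usual distance-dependent CRP construction, in which each word type either links to itself with weight `alpha` or follows another word type with weight `exp(w · f(i, j))`; clusters are then the connected components of the resulting link graph. The feature function `suffix_features`, the weights, and the value of `alpha` are hypothetical choices made for illustration, not the authors' feature set or settings, and the embedding likelihood is omitted entirely.

```python
import math


def suffix_features(a, b, max_len=3):
    """Hypothetical morphological features: 1.0 for each shared suffix of length 1..max_len."""
    return [
        1.0 if len(a) >= k and len(b) >= k and a[-k:] == b[-k:] else 0.0
        for k in range(1, max_len + 1)
    ]


def follow_probs(words, i, weights, alpha):
    """Prior over which word j the word i follows (assumed ddCRP form).

    A self-link gets weight alpha; a link to j != i gets exp(w . f(i, j)).
    """
    scores = []
    for j, other in enumerate(words):
        if j == i:
            scores.append(alpha)
        else:
            feats = suffix_features(words[i], other)
            scores.append(math.exp(sum(w * f for w, f in zip(weights, feats))))
    total = sum(scores)
    return [s / total for s in scores]


def clusters_from_links(links):
    """Group items into clusters: the connected components of the follower graph."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    groups = {}
    for i in range(len(links)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy usage: "walking" is far more likely to follow "talking" than "cat".
words = ["walking", "talking", "cat"]
probs = follow_probs(words, 0, weights=[1.0, 1.0, 1.0], alpha=1.0)
clusters = clusters_from_links([1, 0, 2])
```

In a full sampler the link for each word would be resampled from this prior multiplied by the likelihood of the embeddings in the merged clusters, and the weights would be updated alongside; both steps are left out of this sketch.
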
Original language: English
Title of host publication: 52nd Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: proceedings of the conference : volume 2 : short papers
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics
Pages: 265-271
Number of pages: 7
Volume: 2
ISBN (Print): 9781937284732
Publication status: Published - 2014
Externally published: Yes
Event: Annual meeting of the Association for Computational Linguistics (52nd : 2014) - Baltimore, USA
Duration: 22 Jun 2014 to 27 Jun 2014

Conference

Conference: Annual meeting of the Association for Computational Linguistics (52nd : 2014)
City: Baltimore, USA
Period: 22/06/14 to 27/06/14

Bibliographical note

Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
