Statistical models of syntax learning and use

Mark Johnson*, Stefan Riezler

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)


This paper shows how to define probability distributions over linguistically realistic syntactic structures in a way that permits us to define language learning and language comprehension as statistical problems. We demonstrate our approach using lexical-functional grammar (LFG), but our approach generalizes to virtually any linguistic theory. Our probabilistic models are maximum entropy models. In this paper we concentrate on statistical inference procedures for learning the parameters that define these probability distributions. We point out some of the practical problems that make straightforward ways of estimating these distributions infeasible, and develop a "pseudo-likelihood" estimation procedure that overcomes some of these problems. This method raises interesting questions concerning the nature of the data available to a language learner and the modularity of language learning and processing.
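The estimation idea in the abstract can be illustrated with a minimal sketch. It assumes that "pseudo-likelihood" means the conditional likelihood of the correct parse given its sentence, so that normalization runs only over that sentence's candidate parses rather than over all structures the grammar generates. The abstract itself does not spell out this definition, so treat it as one reading. The feature vectors, the toy data, the plain gradient-ascent loop, and all function names below are hypothetical and are not taken from the paper.

```python
import math

def conditional_probs(weights, candidates):
    """Maximum entropy (log-linear) distribution over one sentence's candidate parses.

    Each candidate is a feature vector; normalization is over this sentence's
    candidates only, not over every structure the grammar could generate.
    """
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def pseudo_log_likelihood(weights, data):
    """Sum over sentences of log P(correct parse | candidate parses of that sentence)."""
    return sum(math.log(conditional_probs(weights, cands)[gold])
               for cands, gold in data)

def gradient(weights, data):
    """Observed feature values minus their conditional expectations."""
    grad = [0.0] * len(weights)
    for cands, gold in data:
        probs = conditional_probs(weights, cands)
        for j in range(len(weights)):
            expected = sum(p * feats[j] for p, feats in zip(probs, cands))
            grad[j] += cands[gold][j] - expected
    return grad

def fit(data, n_features, steps=200, lr=0.1):
    """Plain gradient ascent; an assumption made for brevity, not the paper's optimizer."""
    weights = [0.0] * n_features
    for _ in range(steps):
        g = gradient(weights, data)
        weights = [w + lr * gj for w, gj in zip(weights, g)]
    return weights

# Hypothetical toy corpus: (candidate feature vectors, index of the correct parse).
data = [
    ([[1.0, 0.0], [0.0, 1.0]], 0),
    ([[2.0, 1.0], [1.0, 2.0], [0.0, 0.0]], 0),
]
weights = fit(data, n_features=2)
```

In this toy setup the correct parses can be separated perfectly from the alternatives, so the weights keep growing with more steps. A practical estimator would typically add a regularization term, which is omitted here for brevity.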

Original language: English
Pages (from-to): 239-253
Number of pages: 15
Journal: Cognitive Science
Issue number: 3
Publication status: Published - 2002
Externally published: Yes


  • Discriminative parameter estimation
  • Maximum entropy modeling
  • Statistical language learning
  • Statistical parsing

