This paper shows how to define probability distributions over linguistically realistic syntactic structures in a way that permits language learning and language comprehension to be treated as statistical problems. We demonstrate our approach using Lexical-Functional Grammar (LFG), but it generalizes to virtually any linguistic theory. Our probabilistic models are maximum-entropy models. Here we concentrate on statistical inference procedures for learning the parameters that define these probability distributions. We point out some of the practical problems that make straightforward estimation of these distributions infeasible, and develop a "pseudo-likelihood" estimation procedure that overcomes some of these problems. This method raises interesting questions concerning the nature of the data available to a language learner and the modularity of language learning and processing.
Keywords:

- Discriminative parameter estimation
- Maximum entropy modeling
- Statistical language learning
- Statistical parsing
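To make the estimation idea concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of a conditional maximum-entropy model over candidate parses, trained by gradient ascent on a pseudo-likelihood objective: the conditional probability of the correct parse given the set of candidate parses for each sentence. The feature vectors, learning rate, and toy data are all illustrative assumptions.

```python
import math

def cond_prob(weights, feats):
    """Conditional max-ent distribution over candidate parses:
    p(y | candidates) is proportional to exp(w . f(y))."""
    scores = [sum(w * x for w, x in zip(weights, f)) for f in feats]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def pseudo_likelihood_step(weights, data, lr=0.1):
    """One gradient-ascent step on the conditional (pseudo-)log-likelihood.

    data: list of (candidate_feature_vectors, index_of_correct_parse).
    Gradient = observed feature values minus expected feature values
    under the model's conditional distribution over candidates."""
    grad = [0.0] * len(weights)
    for feats, gold in data:
        probs = cond_prob(weights, feats)
        for j in range(len(weights)):
            expected = sum(p * f[j] for p, f in zip(probs, feats))
            grad[j] += feats[gold][j] - expected
    return [w + lr * g for w, g in zip(weights, grad)]

# Toy data (hypothetical): two sentences, each with two candidate
# parses described by two features; index 0 marks the correct parse.
data = [
    ([[1.0, 0.0], [0.0, 1.0]], 0),
    ([[1.0, 1.0], [0.0, 2.0]], 0),
]

w = [0.0, 0.0]
for _ in range(200):
    w = pseudo_likelihood_step(w, data)
```

Conditioning on the candidate set is what keeps the computation tractable: the normalizing sum ranges only over the parses of each observed sentence, not over all structures the grammar generates.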