A supervised machine learning approach to conjunction disambiguation in named entities

Pawel Mazur, Robert Dale

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Although the literature contains reports of very high accuracy figures for the recognition of named entities in text, there are still some named entity phenomena that remain problematic for existing text processing systems. One of these is the ambiguity of conjunctions in candidate named entity strings, an all-too-prevalent problem in corporate and legal documents. In this paper, we distinguish four uses of the conjunction in these strings, and explore the use of a supervised machine learning approach to conjunction disambiguation trained on a very limited set of 'name internal' features that avoids the need for expensive lexical or semantic resources. We achieve 84% correctly classified examples using k-fold evaluation on a data set of 600 instances. We argue that further improvements are likely to require the use of wider domain knowledge and name external features.
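As a rough illustration of what 'name internal' features and k-fold evaluation might look like in practice, the following Python sketch computes a few features from the candidate string alone and splits a data set into folds. The feature set, the function names `name_internal_features` and `k_fold_indices`, the example string, and the choice of k = 10 are all assumptions made for illustration; the abstract does not specify the features, the learner, or the value of k used in the paper.

```python
import re


def name_internal_features(candidate: str) -> dict:
    """Compute illustrative features using only the candidate string itself.

    These features are hypothetical examples, not the paper's feature set.
    """
    match = re.search(r"\s(and|&)\s", candidate)
    if match is None:
        raise ValueError("candidate contains no conjunction")
    left = candidate[:match.start()].split()
    right = candidate[match.end():].split()
    if not left or not right:
        raise ValueError("conjunction must have text on both sides")
    return {
        "conjunction": match.group(1),
        "left_tokens": len(left),
        "right_tokens": len(right),
        "left_all_capitalised": all(t[0].isupper() for t in left),
        "right_all_capitalised": all(t[0].isupper() for t in right),
        "right_last_token": right[-1].lower(),
    }


def k_fold_indices(n: int, k: int):
    """Yield (train, test) index lists for k roughly equal folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


# Hypothetical candidate string containing a conjunction.
features = name_internal_features("Smith & Jones Pty Ltd")

# 600 instances, as in the abstract; k = 10 is an assumed value.
splits = list(k_fold_indices(600, 10))
```

Each instance would be turned into such a feature dictionary, a classifier would be trained on the `train` indices of every split, and accuracy would be averaged over the corresponding `test` folds.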

Original language: English
Title of host publication: Proceedings of IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data, AND 2007
Editors: Craig Knoblock, Daniel Lopresti, Shourya Roy, L. V. Subramaniam
Place of Publication: New York
Publisher: ACM
Pages: 107-114
Number of pages: 8
Publication status: Published - 2007
Event: IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data, AND 2007 - Hyderabad, India
Duration: 8 Jan 2007

