Information extraction via path merging

Robert Dale, Cecile Paris, Marc Tilbrook

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

1 Citation (Scopus)


In this paper, we describe a new approach to information extraction that neatly integrates top-down hypothesis-driven information with bottom-up data-driven information. The aim of the kelp project is to combine a variety of natural language processing techniques so that we can extract useful elements of information from a collection of documents and then re-present this information in a manner that is tailored to the needs of a specific user. Our focus here is on how we can build richly structured data objects by extracting information from web pages; as an example, we describe our methods in the context of extracting information from web pages that describe laptop computers. Our approach, which we call path-merging, involves using relatively simple techniques for identifying what are normally referred to as named entities, then allowing more sophisticated and intelligent techniques to combine these elements of information: effectively, we view the text as providing a collection of jigsaw-piece-like elements of information which then have to be combined to produce a representation of the useful content of the document. A principal goal of this work is the separation of different components of the information extraction task so as to increase portability.
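The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the general idea, under stated assumptions: simple pattern-based recognisers each emit a fragment (a path into a target structure plus a value), and a separate step merges those fragments into one nested record. The recogniser patterns, the path names (`laptop`, `memory`, `size_mb`, and so on), the sample sentence, and the functions `extract_fragments` and `merge_paths` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical recognisers (not from the paper): each pairs a simple
# pattern with the path that a match is assumed to fill.
RECOGNISERS = [
    (re.compile(r"(\d+)\s*MB\s+RAM", re.I), ("laptop", "memory", "size_mb")),
    (re.compile(r"(\d+(?:\.\d+)?)\s*GHz", re.I), ("laptop", "cpu", "speed_ghz")),
    (re.compile(r"(\d+)\s*GB\s+hard\s+drive", re.I), ("laptop", "storage", "size_gb")),
]


def extract_fragments(text):
    """Bottom-up step: return (path, value) fragments found in the text."""
    fragments = []
    for pattern, path in RECOGNISERS:
        for match in pattern.finditer(text):
            fragments.append((path, match.group(1)))
    return fragments


def merge_paths(fragments):
    """Merge path fragments into one nested dict, keeping the first value per path."""
    root = {}
    for path, value in fragments:
        node = root
        for key in path[:-1]:
            # Shared path prefixes end up in the same sub-dictionary.
            node = node.setdefault(key, {})
        node.setdefault(path[-1], value)
    return root


# Made-up sample text for illustration only.
text = "Fast 2.4 GHz processor, 256 MB RAM and a 40 GB hard drive."
record = merge_paths(extract_fragments(text))
print(record)
```

Keeping the recognisers separate from the merging step mirrors the separation of components that the abstract names as a goal: in this sketch, moving to a new domain would mean replacing only the `RECOGNISERS` table.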

Original language: English
Title of host publication: AI 2003: Advances in Artificial Intelligence - 16th Australian Conference on AI, Proceedings
Editors: Thomas D. Gedeon, Lance Chun Che Fung
Place of Publication: Berlin; Heidelberg
Publisher: Springer, Springer Nature
Number of pages: 11
ISBN (Electronic): 9783540245810
ISBN (Print): 9783540206460
Publication status: Published - Dec 2003
Event: 16th Australian Conference on Artificial Intelligence, AI - 2003 - Perth, Australia
Duration: 3 Dec 2003 → 5 Dec 2003

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349




  • Natural language generation
  • Natural language understanding


