A model for detecting and merging vertically spanned table cells in plain text documents

Vanessa Long*, Robert Dale, Steve Cassidy

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a significant cause of error in the extraction of tables from free text documents. In this paper, we present a model for the detection and merging of vertically spanned cells for tables presented in plain text documents. Our model and algorithm are based purely on the layout features of the tables, and they require no semantic understanding of the documents. When tested on the 98 tables appearing in 40 randomly selected documents from a corpus of company announcements from the Australian Stock Exchange (ASX), our algorithm achieves an accuracy of 86.79% in detecting and merging vertically spanned cells.
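
To make the idea of a vertically spanned cell concrete, the following is a minimal sketch of a layout-only merge. It is not the model or algorithm from the paper, which the abstract does not describe in detail. The function names, the fixed character-offset column bounds, and the rule that a line with an empty first column continues the previous row are all assumptions made for illustration.

```python
from typing import List, Tuple

Bounds = List[Tuple[int, int]]  # (start, end) character offsets, one pair per column


def split_cells(line: str, bounds: Bounds) -> List[str]:
    """Cut one text line into cells using fixed column offsets."""
    return [line[start:end].strip() for start, end in bounds]


def merge_vertical_spans(lines: List[str], bounds: Bounds) -> List[List[str]]:
    """Fold continuation lines into the row above them.

    Assumed heuristic (hypothetical, not taken from the paper): a line whose
    first column is empty is the continuation of the previous row.
    """
    rows: List[List[str]] = []
    for line in lines:
        cells = split_cells(line, bounds)
        if not any(cells):
            continue  # ignore blank lines
        if rows and cells[0] == "":
            # Join each cell with the cell above it, skipping empty parts.
            rows[-1] = [
                " ".join(part for part in (above, below) if part)
                for above, below in zip(rows[-1], cells)
            ]
        else:
            rows.append(cells)
    return rows


# Two columns: characters 0-13 and 14-39.
bounds = [(0, 14), (14, 40)]
sample = [
    "Item".ljust(14) + "Description",
    "Widget".ljust(14) + "A small part used",
    "".ljust(14) + "in assembly",
    "Gadget".ljust(14) + "A tool",
]
merged = merge_vertical_spans(sample, bounds)
for row in merged:
    print(row)
```

In this sample the second and third text lines form one logical row, so the description cell for "Widget" is read as a single cell spanning two physical lines. A real detector would also have to infer the column bounds and handle spans in other columns, which this sketch does not attempt.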

Original language: English
Title of host publication: Proceedings Eighth International Conference on Document Analysis and Recognition
Editors: Bob Werner
Place of Publication: Los Alamitos, CA
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1242-1246
Number of pages: 5
Volume: 1
ISBN (Print): 0769524206, 9780769524207
Publication status: Published - Sept 2005
Event: 8th International Conference on Document Analysis and Recognition - Seoul, Korea, Republic of
Duration: 31 Aug 2005 to 1 Sept 2005

Other

Other: 8th International Conference on Document Analysis and Recognition
Country/Territory: Korea, Republic of
City: Seoul
Period: 31/08/05 to 1/09/05

Bibliographical note

Copyright 2005 IEEE. Reprinted from Eighth International Conference on Document Analysis and Recognition : proceedings : August 31 to September 1, 2005, Seoul, Korea. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Macquarie University’s products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
