TY - GEN
T1 - A model for detecting and merging vertically spanned table cells in plain text documents
AU - Long, Vanessa
AU - Dale, Robert
AU - Cassidy, Steve
N1 - Copyright 2005 IEEE. Reprinted from Eighth International Conference on Document Analysis and Recognition : proceedings : August 31 to September 1, 2005, Seoul, Korea. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Macquarie University’s products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
PY - 2005/9
Y1 - 2005/9
AB - A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a significant cause of error in the extraction of tables from free text documents. In this paper, we present a model for the detection and merging of vertically spanned cells for tables presented in plain text documents. Our model and algorithm are based purely on the layout features of the tables, and they require no semantic understanding of the documents. When tested on the 98 tables appearing in 40 randomly selected documents from a corpus of company announcements from the Australian Stock Exchange (ASX), our algorithm achieves an accuracy of 86.79% in detecting and merging vertically spanned cells.
UR - http://www.scopus.com/inward/record.url?scp=33947420202&partnerID=8YFLogxK
U2 - 10.1109/ICDAR.2005.21
DO - 10.1109/ICDAR.2005.21
M3 - Conference proceeding contribution
AN - SCOPUS:33947420202
SN - 0769524206
SN - 9780769524207
VL - 1
SP - 1242
EP - 1246
BT - Proceedings Eighth International Conference on Document Analysis and Recognition
A2 - Werner, Bob
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Los Alamitos, CA
T2 - 8th International Conference on Document Analysis and Recognition
Y2 - 31 August 2005 through 1 September 2005
ER -