Segmenting email message text into zones

Andrew Lampert*, Robert Dale, Cécile Paris

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

14 Citations (Scopus)

Abstract

In the early days of email, widely-used conventions for indicating quoted reply content and email signatures made it easy to segment email messages into their functional parts. Today, the explosion of different email formats and styles, coupled with the ad hoc ways in which people vary the structure and layout of their messages, means that simple techniques for identifying quoted replies that used to yield 95% accuracy now find less than 10% of such content. In this paper, we describe Zebra, an SVM-based system for segmenting the body text of email messages into nine zone types based on graphic, orthographic and lexical cues. Zebra performs this task with an accuracy of 87.01%; when the number of zones is abstracted to two or three zone classes, this increases to 93.60% and 91.53% respectively.
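As a rough illustration of the convention-based approach the abstract contrasts with Zebra, the sketch below labels lines using two long-standing plain-text email conventions: a leading ">" for quoted reply content and a "-- " line as the signature delimiter. This is not the paper's method. The function name `label_lines` and the three labels are hypothetical, chosen only for this example.

```python
import re

# Assumed conventions (not taken from the paper): ">" prefixes mark quoted
# reply lines, and a line that is exactly "-- " (or "--") starts a signature.
QUOTE_RE = re.compile(r"^\s*>")
SIG_DELIMS = {"-- ", "--"}


def label_lines(body):
    """Label each line of an email body as 'author', 'quoted', or 'signature'."""
    labels = []
    in_signature = False
    for line in body.splitlines():
        # Once a signature delimiter is seen, every later line is signature.
        if line in SIG_DELIMS:
            in_signature = True
        if in_signature:
            labels.append("signature")
        elif QUOTE_RE.match(line):
            labels.append("quoted")
        else:
            labels.append("author")
    return labels


example = "Hi Bob,\n> Are you free Friday?\nYes, see you then.\n-- \nAlice"
print(label_lines(example))
```

A rule set like this depends entirely on senders following the conventions. Messages that quote earlier text without a ">" prefix, or that sign off without a delimiter, are labelled as author text, which is the kind of failure that motivates a learned classifier over graphic, orthographic and lexical cues.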

Original language: English
Title of host publication: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, EMNLP 2009
Place of Publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 919-928
Number of pages: 10
ISBN (Print): 9781932432626
Publication status: Published - 2009
Event: 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009 - Singapore, Singapore
Duration: 6 Aug 2009 to 7 Aug 2009