A systematic review and comparative analysis of cross-document coreference resolution methods and tools

Seyed Mehdi Reza Beheshti*, Boualem Benatallah, Srikumar Venugopal, Seung Hwan Ryu, Hamid Reza Motahari-Nezhad, Wei Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

27 Citations (Scopus)

Abstract

Information extraction (IE) is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. Among the various IE tasks, extracting actionable intelligence from an ever-increasing amount of data depends critically on cross-document coreference resolution (CDCR), the task of identifying entity mentions across information sources that refer to the same underlying entity. CDCR is the basis of knowledge acquisition and is at the heart of Web search, recommendations, and analytics. Real-time CDCR processing is very important and has various applications in discovering must-know information in real time for clients in finance, the public sector, news, and crisis management. Because CDCR is an emerging area of research and practice, the reported literature on its challenges and solutions is growing fast but is scattered, owing to the large problem space, the variety of applications, and large datasets on the order of terabytes to petabytes. To fill this gap, we provide a systematic review of the state of the art of challenges and solutions for a CDCR process. We identify a set of quality attributes that have been frequently reported in the context of CDCR processes, to be used as a guide for identifying important and outstanding issues for further investigation. Finally, we assess existing tools and techniques for CDCR subtasks and provide guidance on the selection of tools and algorithms.

Original language: English
Pages (from-to): 313-349
Number of pages: 37
Journal: Computing
Volume: 99
Issue number: 4
Publication status: Published - 1 Apr 2017
Externally published: Yes

Keywords

  • Cross-document coreference resolution
  • Information extraction
  • Large datasets
