Evidence-based information extraction for high accuracy citation and author name identification

Brett Powley, Robert Dale

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review


Citations play an essential role in navigating academic literature and following chains of evidence in research. With the growing availability of large digital archives of scientific papers, the automated extraction and analysis of citations is becoming increasingly relevant. However, existing approaches to citation extraction still fall short of the high accuracy required to build more sophisticated and reliable tools for citation analysis and corpus navigation. In this paper, we present techniques for high accuracy extraction of citations and references from academic papers. By collecting multiple sources of evidence about entities from documents, and integrating citation extraction, reference segmentation, and citation-reference matching, we are able to significantly improve performance in subtasks including citation identification, author named entity recognition, and citation-reference matching. Applying our algorithm to previously-unseen documents, we demonstrate high F-measure performance of 0.980 for citation extraction, 0.983 for author named entity recognition, and 0.948 for citation-reference matching.
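The F-measure figures quoted in the abstract combine precision and recall into a single score. As a minimal sketch, assuming the balanced form (F1, the harmonic mean of precision and recall; the abstract does not say which weighting was used), the score can be computed from raw counts as follows. The function name and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the balanced form is used; the abstract does not specify it.
    """
    predicted = true_positives + false_positives   # items the system extracted
    actual = true_positives + false_negatives      # items present in the gold standard
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0                                 # no correct extractions: score is zero
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
print(round(f_measure(98, 2, 2), 3))
```

With equal precision and recall, the score equals that shared value; when the two differ, the harmonic mean pulls the score toward the lower of the two.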
Original language: English
Title of host publication: RIAO '07
Subtitle of host publication: Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Place of publication: Paris, France
Number of pages: 15
Publication status: Published - 2007
Event: RIAO Conference (8th : 2007) - Pittsburgh, PA
Duration: 30 May 2007 to 1 Jun 2007



