Abstract
Citations play an essential role in navigating academic literature and following chains of evidence in research. With the growing availability of large digital archives of scientific papers, the automated extraction and analysis of citations is becoming increasingly relevant. However, existing approaches to citation extraction still fall short of the high accuracy required to build more sophisticated and reliable tools for citation analysis and corpus navigation. In this paper, we present techniques for high-accuracy extraction of citations and references from academic papers. By collecting multiple sources of evidence about entities in a document, and by integrating citation extraction, reference segmentation, and citation-reference matching, we significantly improve performance on subtasks including citation identification, author named entity recognition, and citation-reference matching. Applying our algorithm to previously unseen documents, we obtain F-measures of 0.980 for citation extraction, 0.983 for author named entity recognition, and 0.948 for citation-reference matching.
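To make the three subtasks concrete, the sketch below shows a deliberately simple rule-based baseline, not the method described in the paper (which integrates multiple sources of evidence). It assumes author-year parenthetical citations such as "(Smith, 2005; Jones and Lee, 2006)" and reference entries that begin with the first author's surname followed by a comma. All function and variable names (`extract_citations`, `reference_key`, `match_citations`, `f_measure`) are hypothetical and chosen for illustration. The F-measure helper is the standard harmonic mean of precision and recall, the metric reported above.

```python
import re
from typing import List, Optional, Tuple

Citation = Tuple[str, str]  # (first-author surname, year)

# Assumed citation style: "Surname, 2005", "Surname and Other, 2005",
# "Surname & Other, 2005" or "Surname et al., 2005" inside parentheses.
PIECE_RE = re.compile(
    r"([A-Z][\w'\-]+)(?:\s+(?:and|&)\s+[A-Z][\w'\-]+|\s+et al\.)?,\s*(\d{4}[a-z]?)"
)
PAREN_RE = re.compile(r"\(([^()]*)\)")
YEAR_RE = re.compile(r"\b((?:19|20)\d{2}[a-z]?)\b")


def extract_citations(text: str) -> List[Citation]:
    """Find parenthetical groups, split on ';', keep pieces that look like citations."""
    found: List[Citation] = []
    for group in PAREN_RE.finditer(text):
        for piece in group.group(1).split(";"):
            m = PIECE_RE.fullmatch(piece.strip())
            if m:  # non-citation parentheses such as "(see Table 1)" are skipped
                found.append((m.group(1), m.group(2)))
    return found


def reference_key(reference: str) -> Optional[Citation]:
    """Crude segmentation: surname is the text before the first comma,
    year is the first 19xx/20xx token (assumed reference layout)."""
    surname = reference.split(",", 1)[0].strip()
    year = YEAR_RE.search(reference)
    if not surname or year is None:
        return None
    return (surname, year.group(1))


def match_citations(
    citations: List[Citation], references: List[str]
) -> List[Tuple[Citation, Optional[int]]]:
    """Pair each citation with the index of the first reference sharing its key."""
    index = {}
    for i, ref in enumerate(references):
        key = reference_key(ref)
        if key is not None:
            index.setdefault(key, i)
    return [(c, index.get(c)) for c in citations]


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "Prior work (Smith, 2005; Jones and Lee, 2006) extends (Brown et al., 2004)."
    refs = [
        "Brown, A., White, B. and Green, C. 2004. A title.",
        "Smith, J. 2005. Another title.",
        "Jones, K. and Lee, M. 2006. A third title.",
    ]
    for citation, ref_index in match_citations(extract_citations(text), refs):
        print(citation, "->", ref_index)
```

Keying on (surname, year) is the simplest matching rule and fails on ambiguous keys, misspelled names, or numeric citation styles; handling such cases robustly is exactly where combining evidence across subtasks, as the abstract describes, is meant to help.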
Original language | English |
---|---|
Title of host publication | RIAO '07 |
Subtitle of host publication | Large Scale Semantic Access to Content (Text, Image, Video, and Sound) |
Place of Publication | Paris, France |
Publisher | CID |
Pages | 618-632 |
Number of pages | 15 |
Publication status | Published - 2007 |
Event | RIAO Conference (8th : 2007), Pittsburgh, PA, 30 May 2007 → 1 Jun 2007 |