High accuracy citation extraction and named entity recognition for a heterogeneous corpus of academic papers

Brett Powley*, Robert Dale

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

Citation indices are increasingly being used not only as navigational tools for researchers, but also as the basis for measurement of academic performance and research impact. This means that the reliability of tools used to extract citations and construct such indices is becoming more critical; however, existing approaches to citation extraction still fall short of the high accuracy required if critical assessments are to be based on them. In this paper, we present techniques for high accuracy extraction of citations from academic papers, designed for applicability across a broad range of disciplines and document styles. We integrate citation extraction, reference parsing, and author named entity recognition to significantly improve performance in citation extraction, and demonstrate this performance on a cross-disciplinary heterogeneous corpus. Applying our algorithm to previously unseen documents, we demonstrate high F-measure performance of 0.98 for author named entity recognition and 0.97 for citation extraction.

Original language: English
Title of host publication: IEEE NLP-KE 2007 - Proceedings of International Conference on Natural Language Processing and Knowledge Engineering
Place of Publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 119-124
Number of pages: 6
ISBN (Print): 9781424416103
Publication status: Published - 2007
Event: International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007 - Beijing, China
Duration: 30 Aug 2007 → 1 Sep 2007

Other

Other: International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007
Country: China
City: Beijing
Period: 30/08/07 → 1/09/07

Bibliographical note

Copyright 2007 IEEE. Reprinted from Proceedings of 2007 IEEE international conference on natural language processing and knowledge engineering (NLP-KE'07). This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Macquarie University’s products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
