TY - GEN
T1 - High accuracy citation extraction and named entity recognition for a heterogeneous corpus of academic papers
AU - Powley, Brett
AU - Dale, Robert
N1 - Copyright 2007 IEEE. Reprinted from Proceedings of 2007 IEEE international conference on natural language processing and knowledge engineering (NLP-KE'07). This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Macquarie University’s products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
PY - 2007
Y1 - 2007
N2 - Citation indices are increasingly being used not only as navigational tools for researchers, but also as the basis for measurement of academic performance and research impact. This means that the reliability of tools used to extract citations and construct such indices is becoming more critical; however, existing approaches to citation extraction still fall short of the high accuracy required if critical assessments are to be based on them. In this paper, we present techniques for high accuracy extraction of citations from academic papers, designed for applicability across a broad range of disciplines and document styles. We integrate citation extraction, reference parsing, and author named entity recognition to significantly improve performance in citation extraction, and demonstrate this performance on a cross-disciplinary heterogeneous corpus. Applying our algorithm to previously unseen documents, we demonstrate high F-measure performance of 0.98 for author named entity recognition and 0.97 for citation extraction.
AB - Citation indices are increasingly being used not only as navigational tools for researchers, but also as the basis for measurement of academic performance and research impact. This means that the reliability of tools used to extract citations and construct such indices is becoming more critical; however, existing approaches to citation extraction still fall short of the high accuracy required if critical assessments are to be based on them. In this paper, we present techniques for high accuracy extraction of citations from academic papers, designed for applicability across a broad range of disciplines and document styles. We integrate citation extraction, reference parsing, and author named entity recognition to significantly improve performance in citation extraction, and demonstrate this performance on a cross-disciplinary heterogeneous corpus. Applying our algorithm to previously unseen documents, we demonstrate high F-measure performance of 0.98 for author named entity recognition and 0.97 for citation extraction.
UR - http://www.scopus.com/inward/record.url?scp=47749118353&partnerID=8YFLogxK
U2 - 10.1109/NLPKE.2007.4368021
DO - 10.1109/NLPKE.2007.4368021
M3 - Conference proceeding contribution
AN - SCOPUS:47749118353
SN - 9781424416103
SP - 119
EP - 124
BT - IEEE NLP-KE 2007 - Proceedings of International Conference on Natural Language Processing and Knowledge Engineering
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Piscataway, NJ
T2 - International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007
Y2 - 30 August 2007 through 1 September 2007
ER -