Finding names in Trove

named entity recognition for Australian historical newspapers

Sunghwan Mac Kim, Steve Cassidy

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

Abstract

Historical newspapers are an important resource in humanities research, providing the source materials about people and places in historical context. The Trove collection in the National Library of Australia holds a large collection of digitised newspapers dating back to 1803. This paper reports on some work to apply named-entity recognition (NER) to data from Trove with the aim of supplying useful data to Humanities researchers using the HuNI Virtual Laboratory. We present an evaluation of the Stanford NER system on this data and discuss the issues raised when applying NER to the 155 million articles in the Trove archive. We then present some analysis of the results including a version published as Linked Data and an exploration of clustering the mentions of certain names in the archive to try to identify individuals.
Original language: English
Title of host publication: Australasian Language Technology Association Workshop 2015
Subtitle of host publication: proceedings of the Workshop
Editors: Ben Hachey, Kellie Webster
Place of Publication: Melbourne
Publisher: Australasian Language Technology Association
Pages: 57-65
Number of pages: 9
Volume: 13
Publication status: Published - 2015
Event: Australasian Language Technology Association Workshop (13th : 2015) - Parramatta, NSW
Duration: 8 Dec 2015 to 9 Dec 2015

Workshop

Workshop: Australasian Language Technology Association Workshop (13th : 2015)
City: Parramatta, NSW
Period: 8/12/15 to 9/12/15

Cite this

Mac Kim, S., & Cassidy, S. (2015). Finding names in Trove: named entity recognition for Australian historical newspapers. In B. Hachey & K. Webster (Eds.), Australasian Language Technology Association Workshop 2015: proceedings of the Workshop (Vol. 13, pp. 57-65). Melbourne: Australasian Language Technology Association.