Finding names in Trove

named entity recognition for Australian historical newspapers

Sunghwan Mac Kim, Steve Cassidy

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution


Historical newspapers are an important resource in humanities research, providing source material about people and places in historical context. The Trove collection at the National Library of Australia holds a large archive of digitised newspapers dating back to 1803. This paper reports on work to apply named-entity recognition (NER) to data from Trove, with the aim of supplying useful data to humanities researchers using the HuNI Virtual Laboratory. We present an evaluation of the Stanford NER system on this data and discuss the issues raised when applying NER to the 155 million articles in the Trove archive. We then present some analysis of the results, including a version published as Linked Data and an exploration of clustering the mentions of certain names in the archive to try to identify individuals.
Original language: English
Title of host publication: Australasian Language Technology Association Workshop 2015
Subtitle of host publication: proceedings of the Workshop
Editors: Ben Hachey, Kellie Webster
Place of Publication: Melbourne
Publisher: Australasian Language Technology Association
Number of pages: 9
Publication status: Published - 2015
Event: Australasian Language Technology Association Workshop (13th : 2015) - Parramatta, NSW
Duration: 8 Dec 2015 to 9 Dec 2015


Workshop: Australasian Language Technology Association Workshop (13th : 2015)
City: Parramatta, NSW
