Finding names in Trove: named entity recognition for Australian historical newspapers

Sunghwan Mac Kim, Steve Cassidy

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › Research › peer-review

Abstract

Historical newspapers are an important resource in humanities research, providing the source materials about people and places in historical context. The Trove collection in the National Library of Australia holds a large collection of digitised newspapers dating back to 1803. This paper reports on some work to apply named-entity recognition (NER) to data from Trove with the aim of supplying useful data to Humanities researchers using the HuNI Virtual Laboratory. We present an evaluation of the Stanford NER system on this data and discuss the issues raised when applying NER to the 155 million articles in the Trove archive. We then present some analysis of the results including a version published as Linked Data and an exploration of clustering the mentions of certain names in the archive to try to identify individuals.
Language: English
Title of host publication: Australasian Language Technology Association Workshop 2015
Subtitle of host publication: proceedings of the Workshop
Editors: Ben Hachey, Kellie Webster
Place of Publication: Melbourne
Publisher: Australasian Language Technology Association
Pages: 57-65
Number of pages: 9
Volume: 13
Publication status: Published - 2015
Event: Australasian Language Technology Association Workshop (13th : 2015) - Parramatta, NSW
Duration: 8 Dec 2015 to 9 Dec 2015

Workshop

Workshop: Australasian Language Technology Association Workshop (13th : 2015)
City: Parramatta, NSW
Period: 8/12/15 to 9/12/15

Cite this

Mac Kim, S., & Cassidy, S. (2015). Finding names in Trove: named entity recognition for Australian historical newspapers. In B. Hachey, & K. Webster (Eds.), Australasian Language Technology Association Workshop 2015: proceedings of the Workshop (Vol. 13, pp. 57-65). Melbourne: Australasian Language Technology Association.
Mac Kim, Sunghwan ; Cassidy, Steve. / Finding names in Trove : named entity recognition for Australian historical newspapers. Australasian Language Technology Association Workshop 2015: proceedings of the Workshop. editor / Ben Hachey ; Kellie Webster. Vol. 13 Melbourne : Australasian Language Technology Association, 2015. pp. 57-65
@inproceedings{63e9b780d8174681b770cc5e3f91e66a,
title = "Finding names in Trove: named entity recognition for Australian historical newspapers",
abstract = "Historical newspapers are an important resource in humanities research, providing the source materials about people and places in historical context. The Trove collection in the National Library of Australia holds a large collection of digitised newspapers dating back to 1803. This paper reports on some work to apply named-entity recognition (NER) to data from Trove with the aim of supplying useful data to Humanities researchers using the HuNI Virtual Laboratory. We present an evaluation of the Stanford NER system on this data and discuss the issues raised when applying NER to the 155 million articles in the Trove archive. We then present some analysis of the results including a version published as Linked Data and an exploration of clustering the mentions of certain names in the archive to try to identify individuals.",
author = "{Mac Kim}, Sunghwan and Steve Cassidy",
year = "2015",
language = "English",
volume = "13",
pages = "57--65",
editor = "Ben Hachey and Kellie Webster",
booktitle = "Australasian Language Technology Association Workshop 2015",
publisher = "Australasian Language Technology Association",
}

Mac Kim, S & Cassidy, S 2015, Finding names in Trove: named entity recognition for Australian historical newspapers. in B Hachey & K Webster (eds), Australasian Language Technology Association Workshop 2015: proceedings of the Workshop. vol. 13, Australasian Language Technology Association, Melbourne, pp. 57-65, Australasian Language Technology Association Workshop (13th : 2015), Parramatta, NSW, 8/12/15.

TY - GEN
T1 - Finding names in Trove
T2 - named entity recognition for Australian historical newspapers
AU - Mac Kim,Sunghwan
AU - Cassidy,Steve
PY - 2015
Y1 - 2015
AB - Historical newspapers are an important resource in humanities research, providing the source materials about people and places in historical context. The Trove collection in the National Library of Australia holds a large collection of digitised newspapers dating back to 1803. This paper reports on some work to apply named-entity recognition (NER) to data from Trove with the aim of supplying useful data to Humanities researchers using the HuNI Virtual Laboratory. We present an evaluation of the Stanford NER system on this data and discuss the issues raised when applying NER to the 155 million articles in the Trove archive. We then present some analysis of the results including a version published as Linked Data and an exploration of clustering the mentions of certain names in the archive to try to identify individuals.
UR - http://www.alta.asn.au/events/alta2015/
M3 - Conference proceeding contribution
VL - 13
SP - 57
EP - 65
BT - Australasian Language Technology Association Workshop 2015
PB - Australasian Language Technology Association
CY - Melbourne
ER -

Mac Kim S, Cassidy S. Finding names in Trove: named entity recognition for Australian historical newspapers. In Hachey B, Webster K, editors, Australasian Language Technology Association Workshop 2015: proceedings of the Workshop. Vol. 13. Melbourne: Australasian Language Technology Association. 2015. p. 57-65