Historical newspapers are an important resource in humanities research, providing source material about people and places in their historical context. Trove, held by the National Library of Australia, includes a large collection of digitised newspapers dating back to 1803. This paper reports on work applying named-entity recognition (NER) to data from Trove, with the aim of supplying useful data to humanities researchers using the HuNI Virtual Laboratory. We present an evaluation of the Stanford NER system on this data and discuss the issues raised when applying NER to the 155 million articles in the Trove archive. We then present some analysis of the results, including a version published as Linked Data and an exploration of clustering the mentions of certain names in the archive to try to identify individuals.
|Title of host publication||Australasian Language Technology Association Workshop 2015|
|Subtitle of host publication||proceedings of the Workshop|
|Editors||Ben Hachey, Kellie Webster|
|Place of Publication||Melbourne|
|Publisher||Australasian Language Technology Association|
|Number of pages||9|
|Publication status||Published - 2015|
|Event||Australasian Language Technology Association Workshop (13th : 2015) - Parramatta, NSW (Duration: 8 Dec 2015 → 9 Dec 2015)|
|Workshop||Australasian Language Technology Association Workshop (13th : 2015)|
|Period||8/12/15 → 9/12/15|
Mac Kim, S., & Cassidy, S. (2015). Finding names in Trove: named entity recognition for Australian historical newspapers. In B. Hachey, & K. Webster (Eds.), Australasian Language Technology Association Workshop 2015: proceedings of the Workshop (Vol. 13, pp. 57-65). Melbourne: Australasian Language Technology Association.