Abstract
Historical newspapers are an important resource in humanities research, providing source material about people and places in their historical context. The Trove collection at the National Library of Australia holds a large collection of digitised newspapers dating back to 1803. This paper reports on work applying named-entity recognition (NER) to data from Trove, with the aim of supplying useful data to humanities researchers through the HuNI Virtual Laboratory. We present an evaluation of the Stanford NER system on this data and discuss the issues raised when applying NER to the 155 million articles in the Trove archive. We then present some analysis of the results, including a version published as Linked Data, and an exploration of clustering the mentions of certain names in the archive in an attempt to identify individuals.
Original language | English |
---|---|
Title of host publication | Australasian Language Technology Association Workshop 2015 |
Subtitle of host publication | proceedings of the Workshop |
Editors | Ben Hachey, Kellie Webster |
Place of Publication | Melbourne |
Publisher | Australasian Language Technology Association |
Pages | 57-65 |
Number of pages | 9 |
Volume | 13 |
Publication status | Published - 2015 |
Event | Australasian Language Technology Association Workshop (13th : 2015), Parramatta, NSW, 8 Dec 2015 → 9 Dec 2015 |