Named entity recognition in question answering of speech data

Diego Mollá, Menno van Zaanen, Steve Cassidy

Research output: Contribution to journal > Conference paper > Research > peer-review

Abstract

Question answering on speech transcripts (QAst) is a pilot track of the CLEF competition. In this paper we present our contribution to QAst, which is centred on a study of Named Entity (NE) recognition on speech transcripts, and how it impacts on the accuracy of the final question answering system. We have ported AFNER, the NE recogniser of the AnswerFinder question-answering project, to the set of answer types expected in the QAst track. AFNER uses a combination of regular expressions, lists of names (gazetteers) and machine learning to find NEs in the data. The machine learning component was trained on a development set of the AMI corpus. In the process we identified various problems with scalability of the system and the existence of errors of the extracted annotation, which lead to relatively poor performance in general. Performance was yet comparable with state of the art, and the system was second (out of three participants) in one of the QAst subtasks.

Fingerprint

Learning systems
Scalability

Cite this

@article{cefcbc6db3eb43c99857e4b52ce6730e,
title = "Named entity recognition in question answering of speech data",
abstract = "Question answering on speech transcripts (QAst) is a pilot track of the CLEF competition. In this paper we present our contribution to QAst, which is centred on a study of Named Entity (NE) recognition on speech transcripts, and how it impacts on the accuracy of the final question answering system. We have ported AFNER, the NE recogniser of the AnswerFinder question-answering project, to the set of answer types expected in the QAst track. AFNER uses a combination of regular expressions, lists of names (gazetteers) and machine learning to find NEs in the data. The machine learning component was trained on a development set of the AMI corpus. In the process we identified various problems with scalability of the system and the existence of errors of the extracted annotation, which lead to relatively poor performance in general. Performance was yet comparable with state of the art, and the system was second (out of three participants) in one of the QAst subtasks.",
keywords = "named entity recognition, question answering, speech processing, text processing",
author = "Diego Moll{\'a} and {van Zaanen}, Menno and Steve Cassidy",
year = "2007",
language = "English",
pages = "57--65",
journal = "Proceedings of the 2007 Australasian Language Technology Workshop",
issn = "1834-7037",
publisher = "ALTA",

}

Named entity recognition in question answering of speech data. / Mollá, Diego; van Zaanen, Menno; Cassidy, Steve.

In: Proceedings of the 2007 Australasian Language Technology Workshop, 2007, p. 57-65.

TY - JOUR

T1 - Named entity recognition in question answering of speech data

AU - Mollá, Diego

AU - van Zaanen, Menno

AU - Cassidy, Steve

PY - 2007

Y1 - 2007

N2 - Question answering on speech transcripts (QAst) is a pilot track of the CLEF competition. In this paper we present our contribution to QAst, which is centred on a study of Named Entity (NE) recognition on speech transcripts, and how it impacts on the accuracy of the final question answering system. We have ported AFNER, the NE recogniser of the AnswerFinder question-answering project, to the set of answer types expected in the QAst track. AFNER uses a combination of regular expressions, lists of names (gazetteers) and machine learning to find NEs in the data. The machine learning component was trained on a development set of the AMI corpus. In the process we identified various problems with scalability of the system and the existence of errors of the extracted annotation, which lead to relatively poor performance in general. Performance was yet comparable with state of the art, and the system was second (out of three participants) in one of the QAst subtasks.

AB - Question answering on speech transcripts (QAst) is a pilot track of the CLEF competition. In this paper we present our contribution to QAst, which is centred on a study of Named Entity (NE) recognition on speech transcripts, and how it impacts on the accuracy of the final question answering system. We have ported AFNER, the NE recogniser of the AnswerFinder question-answering project, to the set of answer types expected in the QAst track. AFNER uses a combination of regular expressions, lists of names (gazetteers) and machine learning to find NEs in the data. The machine learning component was trained on a development set of the AMI corpus. In the process we identified various problems with scalability of the system and the existence of errors of the extracted annotation, which lead to relatively poor performance in general. Performance was yet comparable with state of the art, and the system was second (out of three participants) in one of the QAst subtasks.

KW - named entity recognition

KW - question answering

KW - speech processing

KW - text processing

M3 - Conference paper

SP - 57

EP - 65

JO - Proceedings of the 2007 Australasian Language Technology Workshop

T2 - Proceedings of the 2007 Australasian Language Technology Workshop

JF - Proceedings of the 2007 Australasian Language Technology Workshop

SN - 1834-7037

ER -