Dynamic and Target Theories of Vowel Classification: Evidence from Monophthongs and Diphthongs in Australian English

Jonathan Harrington, Stephen Cassidy

Research output: Contribution to journal, Article, Research, peer-review

Abstract

Recent studies on the perception of speech have suggested that vowel identification depends on dynamic cues, rather than a single ‘static’ spectral slice at the vowel midpoint. The experiments reported in this paper seek both to test the extent to which vowel recognition depends on dynamic information, and to identify the nature of the dynamic cues on which such recognition might depend. Gaussian classification techniques, as well as different kinds of neural network architectures, were used to classify some 3000 vowels in /CVd/ citation-form Australian English words, following training on roughly the same number of vowel tokens produced by different talkers. The first set of experiments shows that when vowels are classified from three spectral slices taken at the vowel margins and midpoint, only diphthongs, but not monophthongs, benefit from the additional spectral information at the vowel margins. A further experiment shows that vowels are no better classified from a time-delay neural network than from the three-slice network in which time is not explicitly represented. At least for the citation-form, Australian English vowels in this study, these results are interpreted as being more consistent with a target, rather than a dynamic, theory of vowel perception.

Language: English
Pages: 357-373
Number of pages: 17
Journal: Language and Speech
Volume: 37
Issue number: 4
DOI: 10.1177/002383099403700402
Publication status: Published - 1994

Cite this

@article{d853fd57a8ba432f9cbe1181d4f1c450,
title = "Dynamic and Target Theories of Vowel Classification: Evidence from Monophthongs and Diphthongs in Australian English",
keywords = "Australian English, diphthongs, neural networks, Vowel classification",
author = "Jonathan Harrington and Stephen Cassidy",
year = "1994",
doi = "10.1177/002383099403700402",
language = "English",
volume = "37",
pages = "357--373",
journal = "Language and Speech",
issn = "0023-8309",
publisher = "Kingston Press Services",
number = "4",
}

Dynamic and Target Theories of Vowel Classification: Evidence from Monophthongs and Diphthongs in Australian English. / Harrington, Jonathan; Cassidy, Stephen.

In: Language and Speech, Vol. 37, No. 4, 1994, p. 357-373.

TY - JOUR

T1 - Dynamic and Target Theories of Vowel Classification

T2 - Language and Speech

AU - Harrington, Jonathan

AU - Cassidy, Stephen

PY - 1994

Y1 - 1994

KW - Australian English

KW - diphthongs

KW - neural networks

KW - Vowel classification

UR - http://www.scopus.com/inward/record.url?scp=84970242392&partnerID=8YFLogxK

U2 - 10.1177/002383099403700402

DO - 10.1177/002383099403700402

M3 - Article

VL - 37

SP - 357

EP - 373

JO - Language and Speech

JF - Language and Speech

SN - 0023-8309

IS - 4

ER -