Abstract
Realtime MRI provides useful data about the human vocal tract, but also introduces many of the challenges of processing high-dimensional image data. Intuitively, data reduction would proceed by finding the air-tissue boundaries in the images and tracing an outline of the vocal tract. This approach is anatomically well-founded. We explore an alternative approach which is data-driven and has a complementary set of advantages. Our method directly examines pixel intensities. By analyzing how the pixels co-vary over time, we segment the image into spatially localized regions in which the pixels are highly correlated with each other. Intensity variations in these correlated regions correspond to vocal tract constrictions, which are meaningful units of speech production. We show how these regions can be extracted either entirely automatically or with manual guidance. We present two examples and discuss the method's merits, including the opportunity to do direct data-driven time series modeling.
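The idea of grouping pixels by how they co-vary over time can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a simple seed-based variant, loosely analogous to the "manual guidance" case: correlate every pixel's intensity time series with that of a chosen seed pixel, keep pixels above a threshold, and retain only those spatially connected to the seed. The function names, the threshold of 0.7, and the synthetic data are all hypothetical.

```python
import numpy as np

def seed_correlation_region(frames, seed, threshold=0.7):
    """Return a boolean (H, W) mask of pixels correlated with the seed pixel.

    frames: array of shape (T, H, W), one image per time step.
    seed: (row, col) of a hand-picked pixel (assumed manual guidance).
    threshold: minimum Pearson correlation (an illustrative value).
    """
    T, H, W = frames.shape
    X = frames.reshape(T, -1).astype(float)
    X = X - X.mean(axis=0)            # center each pixel's time series
    std = X.std(axis=0)
    std[std == 0] = 1.0               # constant pixels get correlation 0
    Z = X / std
    corr = Z.T @ Z[:, seed[0] * W + seed[1]] / T   # Pearson r with the seed
    above = (corr >= threshold).reshape(H, W)

    # Keep only the 4-connected component that contains the seed,
    # so the region is spatially localized.
    mask = np.zeros((H, W), dtype=bool)
    stack = [tuple(seed)]
    while stack:
        r, c = stack.pop()
        if 0 <= r < H and 0 <= c < W and above[r, c] and not mask[r, c]:
            mask[r, c] = True
            stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return mask

def region_time_series(frames, mask):
    """Mean intensity of the region in each frame, shape (T,)."""
    return frames[:, mask].mean(axis=1)

# Synthetic stand-in for an image sequence: independent noise everywhere,
# plus a 3x3 block whose pixels share one slowly varying signal.
rng = np.random.default_rng(0)
T, H, W = 200, 8, 8
signal = np.sin(np.linspace(0, 6 * np.pi, T))
frames = rng.normal(0.0, 1.0, size=(T, H, W))
frames[:, 2:5, 2:5] = 10 * signal[:, None, None] + 0.1 * rng.normal(size=(T, 3, 3))

mask = seed_correlation_region(frames, seed=(3, 3))
series = region_time_series(frames, mask)
```

The per-frame mean intensity of the region (`series`) is the kind of low-dimensional time series that could then be modeled directly, in place of the full image.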
Original language | English |
---|---|
Title of host publication | Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 |
Place of Publication | Baixas, France |
Publisher | International Speech Communication Association |
Pages | 1572-1575 |
Number of pages | 4 |
ISBN (Print) | 9781617821233 |
Publication status | Published - 2010 |
Externally published | Yes |
Event | 11th Annual Conference of the International Speech Communication Association 2010 - Makuhari, Japan. Duration: 26 Sept 2010 → 30 Sept 2010 |
Conference
Conference | 11th Annual Conference of the International Speech Communication Association 2010 |
---|---|
Country/Territory | Japan |
City | Makuhari |
Period | 26/09/10 → 30/09/10 |