Composing spoken hints for follow-on question suggestion in voice assistants

Pedro Faustini, Besnik Fetahu, Giuseppe Castellucci, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review


The adoption of voice assistants like Alexa or Siri has grown rapidly, giving users instant access to information via voice search. Query suggestion is a standard feature of screen-based search experiences, allowing users to explore additional topics. However, it is not trivial to implement in voice-based settings. To enable this, we tackle the novel task of suggesting questions with compact and natural voice hints that allow users to ask follow-up questions. We first define the task of composing speech-based hints, ground it in syntactic theory, and outline linguistic desiderata for spoken hints. We propose a sequence-to-sequence approach to generate spoken hints from a list of questions. Using a new dataset of 6,681 input questions and human-written hints, we evaluate models with automatic metrics and human evaluation. Results show that a naive approach of concatenating suggested questions creates poor voice hints. Our most sophisticated approach applies a linguistically motivated pretraining task and was strongly preferred by human evaluators for producing the most natural hints.
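To make the "naive approach of concatenating suggested questions" concrete, the sketch below joins a list of questions verbatim into a single spoken hint. It is a hypothetical illustration, not the authors' baseline: the function name `naive_concat_hint`, the "You can also ask:" prefix, and the "A, B or C" joining rule are all assumptions made for this example.

```python
from typing import List


def naive_concat_hint(questions: List[str], prefix: str = "You can also ask: ") -> str:
    """Hypothetical naive baseline: join suggested questions verbatim into one hint.

    The prefix and the joining rule are illustrative assumptions, not taken
    from the paper.
    """
    # Drop empty entries and trailing question marks so the pieces can be joined.
    cleaned = [q.strip().rstrip("?") for q in questions if q.strip()]
    if not cleaned:
        return ""
    if len(cleaned) == 1:
        body = cleaned[0]
    else:
        # Comma-separate all but the last question, then attach the last with "or".
        body = ", ".join(cleaned[:-1]) + " or " + cleaned[-1]
    return prefix + body + "?"


if __name__ == "__main__":
    print(naive_concat_hint([
        "How tall is Mount Everest?",
        "Who first climbed Mount Everest?",
    ]))
```

For the two example questions this yields "You can also ask: How tall is Mount Everest or Who first climbed Mount Everest?". The shared entity is repeated and the second question keeps its sentence-initial capital, which hints at why verbatim concatenation tends to produce long, unnatural hints compared with a compact, rewritten one.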

Original language: English
Title of host publication: INTERSPEECH 2023
Place of publication: France
Publisher: International Speech Communication Association (ISCA)
Number of pages: 5
Publication status: Published - 2023
Event: INTERSPEECH Conference (24th : 2023), Dublin, Ireland
Duration: 20 Aug 2023 to 24 Aug 2023
Conference number: 24th

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association (ISCA)
ISSN (Print): 2308-457X


Conference: INTERSPEECH Conference (24th : 2023)
Abbreviated title: INTERSPEECH 2023

