ConvoCache: smart re-use of chatbot responses

Conor Atkins, Ian Wood, Mohamed Ali Kaafar, Hassan Asghar, Nardine Basta, Michal Kepkowski

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

We present ConvoCache, a conversational caching system that addresses the slow and expensive responses of generative AI models in spoken chatbots. ConvoCache finds a semantically similar prompt from past conversations and reuses the cached response. In this paper, we evaluate ConvoCache on the DailyDialog dataset. We find that ConvoCache can apply a UniEval coherence threshold of 90% and respond to 89% of prompts from the cache with an average latency of 214 ms, replacing LLM generation and voice synthesis that can take over 1 s. To further reduce latency, we test prefetching and find it of limited usefulness: prefetching with 80% of a request leads to a 63% hit rate and a drop in overall coherence. ConvoCache can be used with any chatbot to reduce costs by cutting generative AI usage by up to 89%.
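The abstract sketches the core mechanism: embed an incoming prompt, search past prompts for a semantically similar one, and reuse that response if it passes a quality gate. The following is a minimal sketch of that idea, not the paper's implementation: the class name ConvoCacheSketch, the sentence-transformers encoder, and the cosine-similarity gate are assumptions for illustration. In the paper itself the gate is a UniEval coherence score thresholded at 90%, not raw embedding similarity.

```python
# Illustrative sketch of semantic response caching (assumptions noted above);
# the paper gates cache hits on UniEval coherence, which a cosine-similarity
# threshold merely stands in for here.
from dataclasses import dataclass, field

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed encoder


@dataclass
class ConvoCacheSketch:
    encoder: SentenceTransformer
    similarity_threshold: float = 0.9          # stand-in for the coherence gate
    _prompts: list = field(default_factory=list)    # embeddings of past prompts
    _responses: list = field(default_factory=list)  # cached responses

    def lookup(self, prompt: str):
        """Return a cached response if a past prompt is similar enough."""
        query = self.encoder.encode(prompt, normalize_embeddings=True)
        if self._prompts:
            sims = np.stack(self._prompts) @ query  # cosine similarity (unit vectors)
            best = int(np.argmax(sims))
            if sims[best] >= self.similarity_threshold:
                return self._responses[best]        # cache hit: reuse the response
        return None  # cache miss: fall back to the generative model + TTS

    def store(self, prompt: str, response: str):
        """Record a prompt/response pair for future reuse."""
        self._prompts.append(self.encoder.encode(prompt, normalize_embeddings=True))
        self._responses.append(response)


if __name__ == "__main__":
    cache = ConvoCacheSketch(SentenceTransformer("all-MiniLM-L6-v2"))
    cache.store("How are you today?", "I'm doing great, thanks for asking!")
    print(cache.lookup("How are you doing today?"))  # likely a hit
```

On a cache hit this avoids both LLM generation and voice synthesis, which is where the latency and cost savings reported in the abstract come from.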

Original language: English
Title of host publication: Interspeech 2024
Place of publication: Online
Publisher: International Speech Communication Association (ISCA)
Pages: 2950-2954
Number of pages: 5
DOIs
Publication status: Published - 2024
Event: Interspeech Conference (25th : 2024) - Kos Island, Greece
Duration: 1 Sept 2024 – 5 Sept 2024

Publication series

Name
ISSN (Electronic): 2958-1796

Conference

Conference: Interspeech Conference (25th : 2024)
Country/Territory: Greece
City: Kos Island
Period: 1/09/24 – 5/09/24
