
adding text chunker in 11labs #1013

Closed

Conversation

Vaibhav159
Contributor

Changelog

  • Added text_chunker utility:
    • Introduced a new utility to handle chunked text processing (a sketch of the pattern follows this list).
    • Integrated text_chunker into ElevenLabsTTSService to support sending text word by word, as recommended in the ElevenLabs documentation.
    • This change is a potential fix for #983 (ElevenLabs exception: sent 1009 (message too big)).
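For reference, a minimal sketch of the word-boundary chunking pattern described in the ElevenLabs input-streaming docs. Names and details here are illustrative, not the PR's exact implementation:

```python
# Sketch of a word-boundary text chunker, modeled on the pattern in the
# ElevenLabs input-streaming docs. Illustrative only; may not match the
# PR's actual implementation.
from typing import AsyncIterator


async def text_chunker(text_stream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Buffer incoming text and yield it in word-sized chunks.

    Chunks are flushed at natural split points (punctuation and spaces)
    so that words are never cut in half mid-stream.
    """
    splitters = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")
    buffer = ""
    async for text in text_stream:
        if buffer.endswith(splitters):
            # Buffer ends at a natural boundary: flush it and start fresh.
            yield buffer + " "
            buffer = text
        elif text.startswith(splitters):
            # Incoming text starts at a boundary: flush buffer plus the splitter.
            yield buffer + text[0] + " "
            buffer = text[1:]
        else:
            buffer += text
    if buffer:
        yield buffer + " "
```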

Motivation

The change is inspired by ElevenLabs' recommendation to use input streaming via WebSocket for text-to-speech applications. As per their documentation:

"For applications where the text prompts can be streamed to the text-to-speech endpoints (such as LLM output), this allows for prompts to be fed to the endpoint while the speech is being generated. You can also configure the streaming chunk size when using the WebSocket, with smaller chunks generally rendering faster. As such, we recommend sending content word by word. Our model and tooling leverage context to ensure that sentence structure and more are persisted in the generated audio, even if we only receive a word at a time."

This implementation ensures faster rendering and better alignment with ElevenLabs' best practices, while also addressing potential issues like #983.

@Vaibhav159
Contributor Author

@markbackman can we review this with #983 in mind as well?

@markbackman
Contributor

@Vaibhav159 I don't think the 1009 issue is actually due to the messages being too large. I've tested ElevenLabs with extremely large messages and haven't seen problems; they support TTS generations of up to 40K tokens. This can be tested by pushing a TTSSpeakFrame with a very long text message.
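A rough sketch of that test, assuming an already-constructed pipecat PipelineTask (here named `task`) whose pipeline includes ElevenLabsTTSService; the surrounding setup is omitted:

```python
# Sketch: reproduce the large-message case by queueing one very long
# TTSSpeakFrame. Assumes `task` is a pipecat PipelineTask wired to
# ElevenLabsTTSService; pipeline construction is omitted here.
from pipecat.frames.frames import TTSSpeakFrame


async def stress_test(task):
    # ~90K characters, well beyond a typical sentence-sized message.
    long_text = "All work and no play makes Jack a dull boy. " * 2000
    await task.queue_frame(TTSSpeakFrame(long_text))
```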

Also, when text is processed, there's already logic that splits on sentence boundaries.

I don't think further splitting is the way to go.

Also, we've talked with the 11Labs team about how text should be sent. They recommended enabling auto_mode since we're sending full sentences. This change greatly improves latency. From the docs on auto_mode:

auto_mode
"This parameter focuses on reducing the latency by disabling the chunk schedule and all buffers. It is only recommended when sending full sentences or phrases, sending partial phrases will result in highly reduced quality. By default it's set to false."
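For illustration, auto_mode is passed as a query parameter when opening the stream-input WebSocket. A minimal sketch, with placeholder voice and model IDs (not the actual pipecat wiring):

```python
# Sketch: enabling auto_mode on the ElevenLabs stream-input WebSocket.
# IDs below are placeholders; this is not the pipecat implementation.
import websockets  # third-party: pip install websockets

VOICE_ID = "your-voice-id"    # placeholder
MODEL_ID = "eleven_turbo_v2"  # placeholder streaming-capable model

uri = (
    f"wss://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream-input"
    f"?model_id={MODEL_ID}&auto_mode=true"
)


async def connect():
    # auto_mode=true disables the chunk schedule and buffers, reducing
    # latency; only appropriate when sending full sentences or phrases.
    async with websockets.connect(uri) as ws:
        ...
```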

Given all of this, I'm inclined to say that we shouldn't make this change.

WDYT?

@Vaibhav159
Contributor Author

@markbackman Makes sense. The only case where we might need it would be with auto_mode set to false, but given that we're sending text sentence by sentence, we're better off not complicating the send logic.

Vaibhav159 closed this on Jan 16, 2025
@markbackman
Contributor

Thanks @Vaibhav159. If you are able to repro the 1009 websocket error, I'm very interested to know what that repro case is. I'm hopeful that the receive_task_handler improvements made in #962 will solve the issue.
