Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement missing speed functions along with durable speech rate / speed changer function. #239

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

isikhi
Copy link

@isikhi isikhi commented Dec 29, 2024

As I can see coqui tts repo contuniues on this repo. So i want to add same pr into this one. Thanks! Here is the details:

Added missing speed parameters to functions and ensured more durable, accurate speed adjustments with the new adjust_speech_rate function.

Base Repo Ref: coqui-ai#4115

Copy link
Member

@eginhard eginhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR and sorry for the slow response! I would suggest a slightly different approach to support setting the speed for XTTS. Coqui has a bunch of different models where speed can be modified in different ways and for now I don't want to commit to a single method in api.py.

What will also work is to remove any reference to speed from api.py. If a speed argument is then specified by the user, it will then be automatically passed to models that can handle it via the **kwargs. Would you like to make these changes? Otherwise I can take care of it at some point.

@isikhi
Copy link
Author

isikhi commented Jan 19, 2025

@eginhard thank you for the feedback! i am not entirely familiar with all the Coqui TTS models, but I guess I understand your suggestion. But i want to clarify again. are you recommending that instead of passing speed directly, I should rely on **kwargs to handle it dynamically?

For example, would this be closer to what you have in mind?

wav = self.tts(  
    text=text,  
    speaker=speaker,  
    language=language,  
    speaker_wav=speaker_wav,  
    split_sentences=split_sentences,  
    **kwargs,  # `speed` would be included here if specified by the user  and check in the body of function
)  

i guess with this way, any user-specified speed would automatically propagate to models that support it, without hardcoding it in api.py. please let me know if this aligns with your comment or if there’s another approach you’d prefer.

i am not advanced in python and coqui but i am happy to make further adjustments if i can do with your guidence.

@eginhard
Copy link
Member

Exactly. What's currently blocking speed to be passed through the kwargs is that some functions in api.py have it as an explicit argument (although it's not actually used then), so after removing those it should just work.

@isikhi
Copy link
Author

isikhi commented Jan 20, 2025

I was going to change and remove them from directly but then i realize for users whose version numbers are not strictly fixed, systems that call any function containing speed and use it in its current form might encounter issues (since there are no private accessors defined, making every function accessible). To avoid a BC (backward compatibility) break, I believe it would be more appropriate to leave speed as it is. However, if a BC break is not a concern and we are planning to release a new major version, I can proceed with the changes.

@eginhard
Copy link
Member

Well before you couldn't set speed and after that change you can, so that is an improvement and doesn't really break anything :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants