Hi,
I'm developing a personal project: a conversational chatbot. The idea is quite simple: have a chat with Harbinger, the first Reaper (from the Mass Effect series). I found a good way to generate his voice from text: use vits-ljspeech-base from Coqui TTS (without any fine-tuning) to generate the audio, then use your fine-tuned SVC model to apply his voice to the generated audio. For example, given this sentence:
Organic intellect, fascinated by the patterns of the universe. I, Harbinger, have witnessed the harmony of numbers governing the cosmos. The intricate dance of primes, the elegance of elliptic curves, and the recursion of Fibonacci's sequence all resonate with my being. Which aspect of number theory would you like to dissect, researcher?
I measured the following time for each step:
Now I'm studying the inference code of your model and so far I have the following ideas:
Whisper tiny instead of large
Is it possible to cut or pre-generate any vectors to reduce the inference time of the other models (Whisper, HuBERT, pitch, and so on), and thus the SVC inference time?
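To judge which of these ideas is worth pursuing, it helps to time each stage separately. Below is a minimal sketch of one way to do that. The three stage functions (`tts_synthesize`, `extract_features`, `svc_convert`) are hypothetical placeholders that only return dummy data; they are not the API of Coqui TTS or of this repository, and would need to be swapped for the real calls.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
timings = {}


@contextmanager
def timed(stage):
    """Add the elapsed time of the wrapped block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed


# Hypothetical placeholder stages: replace with the real model calls.
def tts_synthesize(text):
    # Pretend this returns one second of 16 kHz audio.
    return [0.0] * 16000


def extract_features(audio):
    # Stand-in for the content (Whisper/HuBERT) and pitch extractors.
    return {"content": audio[::320], "pitch": audio[::160]}


def svc_convert(features):
    # Stand-in for the voice conversion model itself.
    return features["content"]


def run_pipeline(text):
    with timed("tts"):
        audio = tts_synthesize(text)
    with timed("features"):
        feats = extract_features(audio)
    with timed("svc"):
        out = svc_convert(feats)
    return out


if __name__ == "__main__":
    run_pipeline("Organic intellect, fascinated by the patterns of the universe.")
    # Print the slowest stage first.
    for stage, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{stage:10s} {seconds * 1000:.3f} ms")
```

With real models behind the placeholders, the per-stage numbers show whether the feature extractors or the conversion model dominate, which tells you whether a smaller Whisper model is likely to matter.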
Btw, thanks for your repository: it is the easiest "prepare your data and run" project I have found so far in the deep learning field.
Cheers from Brazil,
Hey, I understand your thinking, and what you're doing is totally fine. But I have to give you a reality check. Yes, SVC can be used for audio replacement, but it seems like you're over-engineering it. You could just use a TTS project like GPT-SoVITS instead of converting the output of an existing TTS.