Increasing SVC inference speed #193

KaikeWesleyReis · 2024-05-31T20:15:47Z

Hi,
I'm developing a personal project: a conversational chatbot. The idea is quite simple: have a chat with Harbinger, the first Reaper (from the Mass Effect series). I found an optimal solution to generate his voice from text: use vits-ljspeech-base from Coqui TTS (without any fine-tuning) to generate the audio, then use your fine-tuned SVC model to apply his voice to the generated audio. For example, given this sentence:

Organic intellect, fascinated by the patterns of the universe. I, Harbinger, have witnessed the harmony of numbers governing the cosmos. The intricate dance of primes, the elegance of elliptic curves, and the recursion of Fibonacci's sequence all resonate with my being. Which aspect of number theory would you like to dissect, researcher?

I have the following time for each step:
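A minimal sketch of how per-step timings like these could be collected for a two-stage TTS + SVC pipeline. The functions `synthesize_speech` and `convert_voice` are hypothetical stand-ins (they only sleep and shuffle bytes); in a real setup they would wrap the Coqui TTS call and the SVC inference call.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock time per pipeline step, in seconds.
timings = {}


@contextmanager
def timed(step):
    """Add the wall-clock time spent inside the block to timings[step]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = timings.get(step, 0.0) + (time.perf_counter() - start)


def synthesize_speech(text):
    """Hypothetical stand-in for the TTS step (not a real TTS call)."""
    time.sleep(0.01)
    return b"wav:" + text.encode("utf-8")


def convert_voice(wav_bytes):
    """Hypothetical stand-in for the SVC step (not a real SVC call)."""
    time.sleep(0.02)
    return wav_bytes[::-1]


def run_pipeline(text):
    """Run both steps and record how long each one takes."""
    with timed("tts"):
        wav = synthesize_speech(text)
    with timed("svc"):
        converted = convert_voice(wav)
    return converted


run_pipeline("Organic intellect, fascinated by the patterns of the universe.")
for step, seconds in timings.items():
    print(f"{step}: {seconds:.3f} s")
```

Wrapping the sub-steps inside the SVC stage (content encoder, pitch extraction, vocoder) with the same `timed` helper would show which of them dominates before deciding what to optimize.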

Now I'm studying the inference code of your model and so far I have the following ideas:

Whisper tiny instead of large

Is it possible to cut or pre-generate any vectors to reduce the inference time of the other models (Whisper, HuBERT, pitch, and so on) and thus the SVC inference time?
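A rough sketch of what "pre-generating" could look like, under one important assumption: features extracted from the input audio can only be reused when the exact same audio comes back (for example, canned phrases the chatbot repeats), while anything that depends only on the target speaker can be computed once. `FeatureCache` and `extract_content_features` are hypothetical names for illustration, not part of this repository.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class FeatureCache:
    """Disk cache keyed by a hash of the raw input bytes."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, name, payload):
        digest = hashlib.sha256(payload).hexdigest()
        return self.cache_dir / f"{name}-{digest}.pkl"

    def get_or_compute(self, name, payload, compute):
        """Return the cached value for payload, computing it on a miss."""
        path = self._path(name, payload)
        if path.exists():
            with path.open("rb") as fh:
                return pickle.load(fh)
        value = compute(payload)
        with path.open("wb") as fh:
            pickle.dump(value, fh)
        return value


# Counts how often the (expensive) extractor actually runs.
calls = {"count": 0}


def extract_content_features(wav_bytes):
    """Hypothetical stand-in for a costly encoder such as Whisper or HuBERT."""
    calls["count"] += 1
    return [b / 255 for b in wav_bytes]


with tempfile.TemporaryDirectory() as tmp:
    cache = FeatureCache(tmp)
    audio = b"\x00\x80\xff"
    cache.get_or_compute("content", audio, extract_content_features)
    cache.get_or_compute("content", audio, extract_content_features)
    print("extractor runs:", calls["count"])
```

For freshly generated sentences the audio is new every time, so a cache like this would not help there; in that case a smaller encoder or a faster pitch extractor is the more likely source of savings.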

Btw, thanks for your repository: it is the easiest "prepare your data and run" project that I have found so far in the deep learning field.

Cheers from Brazil,

ShadowLoveElysia · 2024-06-02T06:25:05Z

Hey, I understand your thinking, and what you're doing is totally fine. But I have to give you a reality check. Yes, SVC can be used for audio replacement, but it seems like you're over-engineering it. You could just use TTS projects like GPT-Sovits instead of converting an existing TTS.

ShadowLoveElysia · 2024-06-02T06:26:14Z

If you want to do voice conversion, then this project is definitely fine, but if your goal is just TTS, then GPT-Sovits is sufficient.

KaikeWesleyReis · 2024-06-02T15:03:37Z

@ShadowLoveElysia

> If you want to do voice conversion, then this project is definitely fine, but if your goal is just TTS, then GPT-Sovits is sufficient.

Does GPT-Sovits follow the same idea as the VITS fine-tuning that I have already done? Don't you think I'll run into the same problems I had with VITS fine-tuning?

The voice I want is this: https://www.youtube.com/watch?v=YZt6NKrkdzQ

Given the nature of the voice, do you believe it is possible to fine-tune GPT-Sovits?
