
[Bug] Pre-computing takes so long #4131

Open
AdrianPresno opened this issue Jan 17, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@AdrianPresno

Describe the bug

Hello,
I'm having some trouble trying to train FastSpeech2. I've checked Task Manager and it's only using RAM. I've read that precomputing may only need RAM and not the GPU, but that seems odd to me. The GPU is available and set up to use CUDA, but it doesn't seem to be taking advantage of it.
I'm using the code below.

I'm running it on a machine with a 4050 GPU and 40 GB of RAM. Thank you.

To Reproduce

import os
import torch
from trainer import Trainer, TrainerArgs
from TTS.config.shared_configs import BaseAudioConfig, BaseDatasetConfig
from TTS.tts.configs.fastspeech2_config import Fastspeech2Config
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.forward_tts import ForwardTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor
from TTS.utils.manage import ModelManager
import subprocess

def main():
    output_path = os.path.dirname(os.path.abspath(__file__))

    # Dataset configuration
    dataset_config = BaseDatasetConfig(
        formatter="ljspeech",
        meta_file_train="metadata.csv",
        path=os.path.join(output_path, "dataset"),
    )

    # Verify paths
    assert os.path.exists(dataset_config.path), f"Dataset path {dataset_config.path} does not exist."
    assert os.path.exists(os.path.join(dataset_config.path, "metadata.csv")), "metadata.csv not found in dataset path."

    # Audio configuration
    audio_config = BaseAudioConfig(
        sample_rate=22050,
        do_trim_silence=True,
        trim_db=60.0,
        signal_norm=False,
        mel_fmin=0.0,
        mel_fmax=8000,
        spec_gain=1.0,
        log_func="np.log",
        ref_level_db=20,
        preemphasis=0.0,
    )

    # Model configuration
    config = Fastspeech2Config(
        run_name="fastspeech2_ljspeech",
        audio=audio_config,
        batch_size=8,  # reduced batch size
        eval_batch_size=8,  # reduced evaluation batch size
        eval_split_size=0.01,  # use only 1% of the data for quick evaluation

        num_loader_workers=4,  # worker processes for training data loading
        num_eval_loader_workers=4,  # worker processes for evaluation data loading
        compute_input_seq_cache=True,
        compute_f0=True,

        f0_cache_path=os.path.join(output_path, "cache", "pitch_stats.npy"),
        compute_energy=True,
        energy_cache_path=os.path.join(output_path, "energy_cache"),
        run_eval=True,
        test_delay_epochs=-1,
        epochs=1000,
        text_cleaner="basic_cleaners",
        use_phonemes=False,

        phoneme_language="es",
        phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
        precompute_num_workers=0,  # disable parallel precomputation
        print_step=50,
        print_eval=False,
        mixed_precision=True,
        max_seq_len=500000,
        output_path=output_path,
        datasets=[dataset_config],
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Compute alignments if needed
    if not config.model_args.use_aligner:
        manager = ModelManager()
        model_path, config_path, _ = manager.download_model("tts_models/es/mai/tacotron2-DDC")
        print("Computing alignments...")
        subprocess.run(
            [
                "python", "TTS/bin/compute_attention_masks.py",
                "--model_path", model_path,
                "--config_path", config_path,
                "--dataset", "ljspeech",
                "--dataset_metafile", "metadata.csv",
                "--data_path", dataset_config.path,
                "--use_cuda", "true"
            ],
            check=True
        )

    # Initialize the audio processor
    print("Initializing audio processor...")
    ap = AudioProcessor.init_from_config(config)

    # Initialize the tokenizer
    print("Initializing tokenizer...")
    tokenizer, config = TTSTokenizer.init_from_config(config)

    # Load the data samples
    print("Loading data samples...")
    train_samples, eval_samples = load_tts_samples(
        dataset_config,
        eval_split=True,
        eval_split_max_size=config.eval_split_max_size,
        eval_split_size=config.eval_split_size,
    )
    print(f"Data loaded: {len(train_samples)} training samples, {len(eval_samples)} evaluation samples.")

    # Initialize the model
    print("Initializing model...")
    model = ForwardTTS(config, ap, tokenizer, speaker_manager=None).to(device)

    # Initialize the trainer and start training
    print("Initializing trainer...")
    trainer = Trainer(
        TrainerArgs(), config, output_path, model=model, train_samples=train_samples, eval_samples=eval_samples
    )
    print("Starting training...")
    trainer.fit()

if __name__ == "__main__":
    main()

Expected behavior

No response

Logs

Environment

Latest TTS version
~1 h sample dataset
GTX 4050
Windows 11 + Linux virtual environment
CUDA drivers 10.1

Additional context

No response

@AdrianPresno AdrianPresno added the bug Something isn't working label Jan 17, 2025
@eginhard
Contributor

Yes, the precomputation is done on CPU. You can adjust precompute_num_workers to run on multiple cores in parallel.
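For example, a minimal sketch based on the config from the original post (the value of 4 is illustrative; pick it based on how many CPU cores and how much RAM are available):

```python
from TTS.tts.configs.fastspeech2_config import Fastspeech2Config

# Sketch: same F0/energy precomputation as in the original script, but with
# parallel workers enabled. precompute_num_workers controls how many worker
# processes build the caches; 0 runs everything in the main process on a
# single CPU core, which is why it looks like only RAM is being used.
config = Fastspeech2Config(
    compute_f0=True,
    compute_energy=True,
    precompute_num_workers=4,  # e.g. one worker per physical CPU core
)
```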

@AdrianPresno
Author

There must be an issue, because when I set the parallel processing option, for example precompute_num_workers=4, a "killed" error appears and the precomputation does not start.

[screenshot]

If I set precompute_num_workers=1, it takes an extremely long time. I'm not sure what the problem is, but my dataset is 600 WAV files (each under 10 seconds) in LJSpeech format, and I don't think it's normal for it to take this long.

[screenshot]
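A "Killed" message with no Python traceback usually means the OS terminated the process, most often the Linux OOM killer when the parallel workers exhaust RAM. A hedged sketch for picking a worker count from available memory (psutil is an extra dependency, not part of the original script, and the per-worker budget is an assumed heuristic, not a number from the TTS codebase):

```python
import os
import psutil  # assumption: installed separately via `pip install psutil`

# Rough heuristic: cap the number of precompute workers by available RAM,
# assuming each worker may need a few GB while extracting F0/energy.
ram_per_worker_gb = 4  # assumed budget per worker
available_gb = psutil.virtual_memory().available / 1024**3
num_workers = max(1, min(os.cpu_count() or 1, int(available_gb // ram_per_worker_gb)))
print(f"Using precompute_num_workers={num_workers}")
```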

@eginhard
Contributor

Not sure, maybe an issue with multi-processing on Windows?
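One thing worth checking (a guess based on the script above, not a confirmed cause): on Windows, Python multiprocessing uses the spawn start method, which re-imports the main module in every worker process. If the training code is not behind an `if __name__ == "__main__":` guard, each DataLoader or precompute worker can try to re-run the whole script. A minimal sketch of the required structure:

```python
def main():
    # build the config, model and Trainer here and call trainer.fit(),
    # exactly as in the script from the original post
    ...

if __name__ == "__main__":
    # On platforms using the "spawn" start method (e.g. Windows), worker
    # processes re-import this module, so everything that starts training
    # must live behind this guard.
    main()
```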
