
[Bug] Pre-computing takes so long #4131

Open
AdrianPresno opened this issue Jan 17, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@AdrianPresno

Describe the bug

Hello,
I'm having some trouble trying to train FastSpeech2. I've checked Task Manager and it's only using RAM. I've read that precomputing may only need RAM and not the GPU, but that seems odd to me. The GPU is available and set up to use CUDA, but it doesn't seem to be taking advantage of it.
I'm using the code below.

I'm running it on a machine with a 4050 GPU and 40 GB of RAM. Thank you.

To Reproduce

import os
import torch
from trainer import Trainer, TrainerArgs
from TTS.config.shared_configs import BaseAudioConfig, BaseDatasetConfig
from TTS.tts.configs.fastspeech2_config import Fastspeech2Config
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.forward_tts import ForwardTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor
from TTS.utils.manage import ModelManager
import subprocess

def main():
    output_path = os.path.dirname(os.path.abspath(__file__))

    # Dataset configuration
    dataset_config = BaseDatasetConfig(
        formatter="ljspeech",
        meta_file_train="metadata.csv",
        path=os.path.join(output_path, "dataset"),
    )

    # Verify paths
    assert os.path.exists(dataset_config.path), f"Dataset path {dataset_config.path} does not exist."
    assert os.path.exists(os.path.join(dataset_config.path, "metadata.csv")), "metadata.csv not found in dataset path."

    # Audio configuration
    audio_config = BaseAudioConfig(
        sample_rate=22050,
        do_trim_silence=True,
        trim_db=60.0,
        signal_norm=False,
        mel_fmin=0.0,
        mel_fmax=8000,
        spec_gain=1.0,
        log_func="np.log",
        ref_level_db=20,
        preemphasis=0.0,
    )

    # Model configuration
    config = Fastspeech2Config(
        run_name="fastspeech2_ljspeech",
        audio=audio_config,
        batch_size=8,  # reduced batch size
        eval_batch_size=8,  # reduced evaluation batch size
        eval_split_size=0.01,  # use only 1% of the data for quick evaluation

        num_loader_workers=4,  # worker processes for training data loading
        num_eval_loader_workers=4,  # worker processes for evaluation data loading
        compute_input_seq_cache=True,
        compute_f0=True,

        f0_cache_path=os.path.join(output_path, "cache", "pitch_stats.npy"),
        compute_energy=True,
        energy_cache_path=os.path.join(output_path, "energy_cache"),
        run_eval=True,
        test_delay_epochs=-1,
        epochs=1000,
        text_cleaner="basic_cleaners",
        use_phonemes=False,

        phoneme_language="es",
        phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
        precompute_num_workers=0,  # disable parallel precomputation
        print_step=50,
        print_eval=False,
        mixed_precision=True,
        max_seq_len=500000,
        output_path=output_path,
        datasets=[dataset_config],
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Compute alignments if needed
    if not config.model_args.use_aligner:
        manager = ModelManager()
        model_path, config_path, _ = manager.download_model("tts_models/es/mai/tacotron2-DDC")
        print("Computing alignments...")
        subprocess.run(
            [
                "python", "TTS/bin/compute_attention_masks.py",
                "--model_path", model_path,
                "--config_path", config_path,
                "--dataset", "ljspeech",
                "--dataset_metafile", "metadata.csv",
                "--data_path", dataset_config.path,
                "--use_cuda", "true"
            ],
            check=True
        )

    # Initialize the audio processor
    print("Initializing audio processor...")
    ap = AudioProcessor.init_from_config(config)

    # Initialize the tokenizer
    print("Initializing tokenizer...")
    tokenizer, config = TTSTokenizer.init_from_config(config)

    # Load the data samples
    print("Loading data samples...")
    train_samples, eval_samples = load_tts_samples(
        dataset_config,
        eval_split=True,
        eval_split_max_size=config.eval_split_max_size,
        eval_split_size=config.eval_split_size,
    )
    print(f"Data loaded: {len(train_samples)} training samples, {len(eval_samples)} evaluation samples.")

    # Initialize the model
    print("Initializing model...")
    model = ForwardTTS(config, ap, tokenizer, speaker_manager=None).to(device)

    # Initialize the trainer and start training
    print("Initializing trainer...")
    trainer = Trainer(
        TrainerArgs(), config, output_path, model=model, train_samples=train_samples, eval_samples=eval_samples
    )
    print("Starting training...")
    trainer.fit()

if __name__ == "__main__":
    main()

Expected behavior

No response

Logs

Environment

Latest TTS version
~1 h sample dataset
GTX 4050
Windows 11 + Linux virtual environment
CUDA drivers 10.1

Additional context

No response

@AdrianPresno AdrianPresno added the bug Something isn't working label Jan 17, 2025
@eginhard
Contributor

Yes, the precomputation is done on CPU. You can adjust precompute_num_workers to run on multiple cores in parallel.
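For example, a minimal sketch based on the config from the original post (the value of 4 is illustrative; pick it based on how many CPU cores and how much RAM are available):

```python
from TTS.tts.configs.fastspeech2_config import Fastspeech2Config

# Sketch: same F0/energy precomputation as in the original script, but with
# parallel workers enabled. precompute_num_workers controls how many worker
# processes build the caches; 0 runs everything in the main process on a
# single CPU core, which is why it looks like only RAM is being used.
config = Fastspeech2Config(
    compute_f0=True,
    compute_energy=True,
    precompute_num_workers=4,  # e.g. one worker per physical CPU core
)
```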

@AdrianPresno
Author

There must be an issue, because when I set the parallel processing option, for example precompute_num_workers=4, a "killed" error appears and the precomputation does not start.

[screenshot]

If I set precompute_num_workers=1, it takes an extremely long time. I'm not sure what the problem is, but my dataset is 600 WAV files (each under 10 seconds) in LJSpeech format, and I don't think it's normal for it to take this long.

[screenshot]
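A "Killed" message with no Python traceback usually means the OS terminated the process, most often the Linux OOM killer when the parallel workers exhaust RAM. A hedged sketch for picking a worker count from available memory (psutil is an extra dependency, not part of the original script, and the per-worker budget is an assumed heuristic, not a number from the TTS codebase):

```python
import os
import psutil  # assumption: installed separately via `pip install psutil`

# Rough heuristic: cap the number of precompute workers by available RAM,
# assuming each worker may need a few GB while extracting F0/energy.
ram_per_worker_gb = 4  # assumed budget per worker
available_gb = psutil.virtual_memory().available / 1024**3
num_workers = max(1, min(os.cpu_count() or 1, int(available_gb // ram_per_worker_gb)))
print(f"Using precompute_num_workers={num_workers}")
```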

@eginhard
Contributor

Not sure, maybe an issue with multi-processing on Windows?
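One thing worth checking (a guess based on the script above, not a confirmed cause): on Windows, Python multiprocessing uses the spawn start method, which re-imports the main module in every worker process. If the training code is not behind an `if __name__ == "__main__":` guard, each DataLoader or precompute worker can try to re-run the whole script. A minimal sketch of the required structure:

```python
def main():
    # build the config, model and Trainer here and call trainer.fit(),
    # exactly as in the script from the original post
    ...

if __name__ == "__main__":
    # On platforms using the "spawn" start method (e.g. Windows), worker
    # processes re-import this module, so everything that starts training
    # must live behind this guard.
    main()
```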
