Why is data loaded in two stages at training start? #991
-
When training a model with a small dataset of N utterances, my data is loaded in two stages:
The logs for a dataset of 640 utterances are shown below for the model Tacotron2-DCA, but the same happens with other models:
I don't understand why the loading is done in two stages. What is a batch group, and what does size: 0 mean? I tried to read the scripts related to the datasets and to draw a flow chart, without success. I guess these are easy questions for the Coqui-TTS experts.
-
(answered in the call)
-
I might have answered the wrong thing. Now that I see the logs you posted, I guess you were confused about seeing the same logs with different values. The reason is that we load two data loaders, one for training and one for validation. That is why you see the same set of logs with different values.
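As an illustration, here is a minimal sketch of two loaders being built at startup; the dummy dataset and the batch sizes are assumptions for the example, not Coqui-TTS's actual classes:

```python
from torch.utils.data import DataLoader, Dataset

class DummyUtterances(Dataset):
    """Stand-in for a TTS dataset; each item would normally be a
    (text, audio-path) pair parsed from a metadata file."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        return idx

# 634 training samples and 6 validation samples, as in the 640-utterance
# example discussed below (N * 0.01 = 6 samples go to validation).
train_loader = DataLoader(DummyUtterances(634), batch_size=32, shuffle=True)
eval_loader = DataLoader(DummyUtterances(6), batch_size=6, shuffle=False)

# Both loaders are created at training start, which is why the same
# initialization logs show up twice with different sample counts.
```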
-
@erogol: thank you for your answer, which crossed paths with my own recent trials. I am pleased to recapitulate the dataloading workflow in detail hereafter, for my own needs and for interested users. Three dataset files are involved in the learning process: a training file, a validation file, and a test file.
The samples in the three lists must be different. The test list is easy: it is specified with a dedicated parameter in the configuration. If no validation file is specified in the configuration, the file specified for training (usually named metadata.csv) with N samples is split as follows: N * 0.01 samples go to validation and the remaining samples to training (see the split logic, around lines 20 and 21, in the dataset loading script).
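A minimal sketch of that split rule, with names of my own choosing rather than the exact Coqui-TTS function:

```python
import random

def split_dataset(items, seed=0):
    """Reserve roughly 1% of the samples for validation.

    `items` is the list of samples parsed from metadata.csv; this only
    mirrors the N * 0.01 behaviour described above, not necessarily the
    exact shuffling and capping done in the Coqui-TTS code.
    """
    eval_split_size = int(len(items) * 0.01)
    assert eval_split_size > 0, "not enough samples to split"
    random.Random(seed).shuffle(items)
    return items[:eval_split_size], items[eval_split_size:]  # (eval, train)
```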
This code explains why I observed, for my small dataset of 640 samples, a dataloader initialization with N - 6 = 634 samples at the start, followed by a dataloader initialization with 6 samples after the first training epoch. For the LJSpeech dataset with N = 13,100, the number of validation samples is calculated as 131 and the number of training samples as 12,969. This is confirmed by the logs at training start.
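The arithmetic behind those counts is easy to verify:

```python
for n in (640, 13100):
    n_eval = int(n * 0.01)   # 6 for N = 640, 131 for N = 13,100
    n_train = n - n_eval     # 634 for N = 640, 12,969 for N = 13,100
    print(n, n_eval, n_train)
```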
If both validation and training files are specified in the configuration, the numbers of samples in those files are used directly and no automatic split is performed. For example, the metadata can be shuffled and split into explicit training and validation files beforehand with a short script.
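A minimal sketch of such a pre-split in Python (the output file names metadata_train.csv and metadata_val.csv and the split size are my own conventions, not something Coqui-TTS requires):

```python
import random

def presplit_metadata(path="metadata.csv", n_val=131, seed=0):
    """Shuffle a metadata file and write explicit train/validation files.

    The configuration then points at the two output files directly, so
    the automatic N * 0.01 split is skipped.
    """
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    random.Random(seed).shuffle(lines)
    with open("metadata_val.csv", "w", encoding="utf-8") as f:
        f.writelines(lines[:n_val])
    with open("metadata_train.csv", "w", encoding="utf-8") as f:
        f.writelines(lines[n_val:])
```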
For me the dataloading process is now totally clear. Describing the working of the …