Add pseudo-labeling based semi-supervised training recipe
Showing 17 changed files with 6,979 additions and 0 deletions.
@@ -0,0 +1,118 @@
# Introduction

This is a pseudo-labeling based semi-supervised ASR recipe for the LibriSpeech dataset. The ASR model is a Zipformer Transducer. The labeled data is LibriSpeech train-clean-100. The unlabeled data can be either LibriSpeech "train-clean-360 + train-other-500" for conventional semi-supervised learning, or the TedLium3 training set for unsupervised domain adaptation.

## Description of the recipe

### Preparation of data

The data required by this recipe is the same as for the LibriSpeech and TedLium3 ASR recipes, and the LibriSpeech tokenizer is used to build the model. Therefore, we can reuse the `prepare.sh` scripts from those recipes.

### Supervised training for the seed ASR model

First, we perform supervised training on the LibriSpeech train-clean-100 subset to obtain the seed model for the subsequent pseudo-labeling based semi-supervised training.

```
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train_seed.py \
  --world-size 4 \
  --num-epochs 70 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp_seed \
  --max-duration 1000
```

For better performance of the seed model, we average the checkpoints as follows:

```
./zipformer/generate_averaged_model.py \
  --epoch 70 \
  --avg 30 \
  --exp-dir ./zipformer/exp_seed
```

The above command generates the final seed model `./zipformer/exp_seed/epoch-70-avg-30.pt`.

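To make the idea of checkpoint averaging concrete, here is a minimal sketch in plain Python. It assumes that `--epoch 70 --avg 30` means "take the element-wise mean of the parameters from the last 30 epoch checkpoints (epochs 41 through 70)". The `Checkpoint` type and both function names are hypothetical stand-ins for illustration; the actual `generate_averaged_model.py` may select or weight checkpoints differently.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
Checkpoint = Dict[str, List[float]]


def epochs_to_average(epoch: int, avg: int) -> List[int]:
    """Epochs covered by --epoch/--avg, assuming the last `avg` epochs are used."""
    if avg < 1 or avg > epoch:
        raise ValueError("avg must be between 1 and epoch")
    return list(range(epoch - avg + 1, epoch + 1))


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Element-wise mean of every parameter across the given checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name, first in checkpoints[0].items():
        sums = [0.0] * len(first)
        for ckpt in checkpoints:
            for i, value in enumerate(ckpt[name]):
                sums[i] += value
        averaged[name] = [s / n for s in sums]
    return averaged


if __name__ == "__main__":
    epochs = epochs_to_average(epoch=70, avg=30)
    print(epochs[0], epochs[-1], len(epochs))
    toy = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
    print(average_checkpoints(toy))
```

Averaging parameters over many late-training checkpoints tends to smooth out noise from individual updates, which is why the averaged model is used as the seed rather than the final epoch alone.
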
### Semi-supervised training for the final ASR model

Then, we perform semi-supervised training, using the seed model as the initialization.

- Conventional semi-supervised learning setting, where the unlabeled data is "train-clean-360 + train-other-500":

```
./zipformer/train_pl.py \
  --world-size 4 \
  --num-epochs 20 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp_pl_librispeech \
  --max-duration 1000 \
  --seed-model-path "zipformer/exp_seed/epoch-70-avg-30.pt" \
  --unlabeled-dataset "librispeech"
```

- Unsupervised domain adaptation setting, where the unlabeled data is the TedLium3 training set:

```
./zipformer/train_pl.py \
  --world-size 4 \
  --num-epochs 20 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp_pl_tedlium \
  --max-duration 1000 \
  --seed-model-path "zipformer/exp_seed/epoch-70-avg-30.pt" \
  --unlabeled-dataset "tedlium"
```

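The general pseudo-labeling idea behind this step can be sketched with a toy example. The code below is not the logic of `train_pl.py`: it swaps the ASR model for a 1-D nearest-centroid classifier so the loop stays short and runnable, and all function names are hypothetical. Whether the recipe regenerates, filters, or weights pseudo-labels during training is not stated here, so the sketch simply relabels the unlabeled data once per round.

```python
from typing import Dict, List, Tuple

LabeledData = List[Tuple[float, str]]


def fit_centroids(data: LabeledData) -> Dict[str, float]:
    """Toy 'training': the model is the mean feature value of each class."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model: Dict[str, float], x: float) -> str:
    """Toy 'decoding': pick the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


def pseudo_label_training(
    labeled: LabeledData, unlabeled: List[float], num_rounds: int = 1
) -> Dict[str, float]:
    # Step 1: supervised training on labeled data gives the seed model.
    model = fit_centroids(labeled)
    for _ in range(num_rounds):
        # Step 2: the current model labels the unlabeled data (pseudo-labels).
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        # Step 3: retrain on labeled plus pseudo-labeled data.
        model = fit_centroids(labeled + pseudo)
    return model


if __name__ == "__main__":
    labeled = [(0.0, "a"), (10.0, "b")]
    unlabeled = [1.0, 2.0, 8.0, 9.0]
    print(pseudo_label_training(labeled, unlabeled))
```

In the recipe, the role of `fit_centroids` on labeled data is played by the supervised seed training, and the unlabeled pool is chosen with `--unlabeled-dataset`.
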
### Decode

Finally, we decode with the trained ASR model to evaluate its performance.

- Evaluate on the LibriSpeech dataset:

```
./zipformer/decode.py \
  --epoch 20 \
  --avg 10 \
  --exp-dir ./zipformer/exp_pl_librispeech \
  --max-duration 600 \
  --decoding-method modified_beam_search \
  --beam-size 4 \
  --dataset "librispeech"
```

- Evaluate on the TedLium3 dataset:

```
./zipformer/decode.py \
  --epoch 20 \
  --avg 10 \
  --exp-dir ./zipformer/exp_pl_tedlium \
  --max-duration 600 \
  --decoding-method modified_beam_search \
  --beam-size 4 \
  --dataset "tedlium"
```

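Decoding results for ASR are usually summarized as word error rate (WER). Assuming that is the metric used in this recipe (the standard choice for these test sets), the following minimal sketch shows how WER in percent can be computed from reference and hypothesis transcripts. The function names are hypothetical, and the recipe's own scoring code may apply text normalization that is not reproduced here.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(refs, hyps):
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(round(wer(["the cat sat on the mat"], ["the cat sit on mat"]), 2))
```
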
## Results

- Conventional semi-supervised learning (LibriSpeech 100h/LibriSpeech 860h)

| Model                   | test-clean | test-other | comment             |
|-------------------------|------------|------------|---------------------|
| supervised seed model   | 5.45       | 13.7       | --epoch 70 --avg 30 |
| pseudo-labeling model   | 4.33       | 9.61       | --epoch 20 --avg 10 |

- Unsupervised domain adaptation (LibriSpeech 100h/TedLium3)

| Model                   | tedlium3 dev | tedlium3 test | comment             |
|-------------------------|--------------|---------------|---------------------|
| supervised seed model   | 18.29        | 18.16         | --epoch 70 --avg 30 |
| pseudo-labeling model   | 14.97        | 14.65         | --epoch 20 --avg 10 |

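To compare the two rows of each table, it can help to express the gain as a relative reduction, `(seed - pseudo_labeling) / seed`. The short sketch below computes it from the numbers in the tables above, assuming they are error rates in percent where lower is better. The helper name `relative_reduction` is introduced here for illustration only.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in percent when going from `baseline` to `improved`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# (seed model, pseudo-labeling model) pairs copied from the tables above.
RESULTS = {
    "test-clean": (5.45, 4.33),
    "test-other": (13.7, 9.61),
    "tedlium3 dev": (18.29, 14.97),
    "tedlium3 test": (18.16, 14.65),
}

if __name__ == "__main__":
    for name, (seed, pseudo) in RESULTS.items():
        print(f"{name}: {relative_reduction(seed, pseudo):.1f}% relative reduction")
```
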
## Pre-trained models and logs

You can find the pre-trained models, training logs, tensorboard logs, decoding logs and decoding results at <https://huggingface.co/zhu-han/icefall-pl-librispeech-zipformer-medium-2023-08-06>