Add pseudo-labeling based semi-supervised training recipe

k2-fsa · Aug 16, 2024 · 60bfbff · 60bfbff
1 parent 1730fce
commit 60bfbff

Showing 17 changed files with 6,979 additions and 0 deletions.
diff --git a/egs/librispeech/PL/README.md b/egs/librispeech/PL/README.md
@@ -0,0 +1,118 @@
+# Introduction
+
+This is a pseudo-labeling based semi-supervised ASR recipe for the LibriSpeech dataset. The ASR model is a Zipformer Transducer. The labeled data is LibriSpeech train-clean-100. The unlabeled data can be either LibriSpeech "train-clean-360 + train-other-500", for conventional semi-supervised learning, or the TedLium3 training set, for unsupervised domain adaptation.
+
+## Description of the recipe
+
+### Preparation of data
+
+The data required by this recipe is the same as for the LibriSpeech and TedLium3 ASR recipes, and the model is built with the LibriSpeech tokenizer. We can therefore reuse the `prepare.sh` scripts from those recipes.
+
+### Supervised training for the seed ASR model
+
+First, we perform supervised training on the LibriSpeech train-clean-100 subset to produce the seed model for the subsequent pseudo-labeling based semi-supervised training.
+
+```
+export CUDA_VISIBLE_DEVICES="0,1,2,3"
+./zipformer/train_seed.py \
+  --world-size 4 \
+  --num-epochs 70 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp_seed \
+  --max-duration 1000
+```
+
+For better performance of the seed model, we average the checkpoints as follows:
+
+```
+./zipformer/generate_averaged_model.py \
+    --epoch 70 \
+    --avg 30 \
+    --exp-dir ./zipformer/exp_seed
+```
+
+The above command generates the final seed model, `./zipformer/exp_seed/epoch-70-avg-30.pt`.
+
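The idea behind checkpoint averaging can be illustrated with a minimal sketch. It assumes a plain arithmetic mean over the parameters of several checkpoints; the exact scheme used by `generate_averaged_model.py` may differ, and the parameter names and values below are hypothetical toy data, not real model weights.

```python
from typing import Dict, List

# A "checkpoint" here is a toy mapping from parameter name to a flat list
# of floats. Real checkpoints hold tensors; this is only an illustration.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Return the element-wise arithmetic mean of several checkpoints.

    All checkpoints are assumed to share the same keys and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        # Group the i-th element of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


# Hypothetical stand-ins for three epoch checkpoints.
checkpoints = [
    {"encoder.weight": [1.0, 2.0], "decoder.bias": [0.0]},
    {"encoder.weight": [3.0, 4.0], "decoder.bias": [3.0]},
    {"encoder.weight": [5.0, 6.0], "decoder.bias": [6.0]},
]
avg = average_checkpoints(checkpoints)
print(avg)
```

Averaging tends to smooth out the noise of individual checkpoints, which is presumably why the averaged model is used as the seed.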
+### Semi-supervised training for the final ASR model
+
+Then, we perform semi-supervised training, using the seed model as the initialization.
+
+- Conventional semi-supervised learning setting, where the unlabeled data is "train-clean-360 + train-other-500":
+
+```
+./zipformer/train_pl.py \
+  --world-size 4 \
+  --num-epochs 20 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp_pl_librispeech \
+  --max-duration 1000 \
+  --seed-model-path "zipformer/exp_seed/epoch-70-avg-30.pt" \
+  --unlabeled-dataset "librispeech"
+```
+
+- Unsupervised domain adaptation setting, where the unlabeled data is the TedLium3 training set:
+
+```
+./zipformer/train_pl.py \
+  --world-size 4 \
+  --num-epochs 20 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp_pl_tedlium \
+  --max-duration 1000 \
+  --seed-model-path "zipformer/exp_seed/epoch-70-avg-30.pt" \
+  --unlabeled-dataset "tedlium"
+```
+
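The general pseudo-labeling idea behind both settings can be sketched as follows: the seed model transcribes the unlabeled audio, and its hypotheses are then treated as training targets alongside the real transcripts. This is a toy illustration under stated assumptions, not the logic of `train_pl.py`. Whether that script labels offline or on the fly, and whether it filters hypotheses, is not described here. `seed_transcribe`, the utterance IDs, and the transcripts are all hypothetical.

```python
from typing import Callable, List, Tuple

# An utterance is represented by an ID string standing in for its audio.
Example = Tuple[str, str]  # (utterance, transcript)


def pseudo_label(
    unlabeled: List[str], transcribe: Callable[[str], str]
) -> List[Example]:
    """Pair every unlabeled utterance with the model's own hypothesis."""
    return [(utt, transcribe(utt)) for utt in unlabeled]


def build_training_set(
    labeled: List[Example],
    unlabeled: List[str],
    transcribe: Callable[[str], str],
) -> List[Example]:
    """Combine ground-truth examples with pseudo-labeled ones."""
    return list(labeled) + pseudo_label(unlabeled, transcribe)


# Hypothetical seed model: a lookup table instead of a real ASR decoder.
_fake_hypotheses = {"utt-3": "hello world", "utt-4": "good morning"}


def seed_transcribe(utt: str) -> str:
    return _fake_hypotheses[utt]


labeled = [("utt-1", "the cat sat"), ("utt-2", "on the mat")]
unlabeled = ["utt-3", "utt-4"]
train_set = build_training_set(labeled, unlabeled, seed_transcribe)
for utt, text in train_set:
    print(utt, "->", text)
```

In the domain adaptation setting the only conceptual change is that the unlabeled utterances come from a different domain (TedLium3) than the labeled ones.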
+### Decode
+
+Finally, we decode with the trained ASR model to evaluate its performance.
+
+- Evaluate on the LibriSpeech dataset:
+
+```
+./zipformer/decode.py \
+    --epoch 20 \
+    --avg 10 \
+    --exp-dir ./zipformer/exp_pl_librispeech \
+    --max-duration 600 \
+    --decoding-method modified_beam_search \
+    --beam-size 4 \
+    --dataset "librispeech"
+```
+
+- Evaluate on the TedLium3 dataset:
+
+```
+./zipformer/decode.py \
+    --epoch 20 \
+    --avg 10 \
+    --exp-dir ./zipformer/exp_pl_tedlium \
+    --max-duration 600 \
+    --decoding-method modified_beam_search \
+    --beam-size 4 \
+    --dataset "tedlium"
+```
+
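ASR output is usually scored by word error rate (WER). As a minimal sketch, assuming the standard word-level edit distance definition, WER for a single utterance can be computed as below. The scoring code used by `decode.py` is not shown in this README, and a test-set WER would normally sum errors and reference lengths over all utterances. The example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.2f}%")
```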
+## Results
+
+- Conventional semi-supervised learning (LibriSpeech 100h labeled / LibriSpeech 860h unlabeled), WER (%):
+
+| Model                 | test-clean | test-other | comment             |
+|-----------------------|------------|------------|---------------------|
+| supervised seed model | 5.45       | 13.7       | --epoch 70 --avg 30 |
+| pseudo-labeling model | 4.33       | 9.61       | --epoch 20 --avg 10 |
+
+- Unsupervised domain adaptation (LibriSpeech 100h labeled / TedLium3 unlabeled), WER (%):
+
+| Model                 | tedlium3 dev | tedlium3 test | comment             |
+|-----------------------|--------------|---------------|---------------------|
+| supervised seed model | 18.29        | 18.16         | --epoch 70 --avg 30 |
+| pseudo-labeling model | 14.97        | 14.65         | --epoch 20 --avg 10 |
+
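To put the two tables in perspective, the relative error reduction of the pseudo-labeling model over the seed model can be computed directly from the reported numbers. The sketch below assumes the table entries are error rates in percent (lower is better); the helper name `relative_reduction` is only for illustration.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in percent, going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# (seed model, pseudo-labeling model) pairs copied from the tables above.
results = {
    "test-clean": (5.45, 4.33),
    "test-other": (13.7, 9.61),
    "tedlium3 dev": (18.29, 14.97),
    "tedlium3 test": (18.16, 14.65),
}

reductions = {
    name: relative_reduction(seed, pl) for name, (seed, pl) in results.items()
}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative reduction")
```

Across the four test sets the relative reduction works out to roughly 18 to 30 percent.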
+
+## Pre-trained models and logs
+
+You can find the pre-trained models, training logs, tensorboard logs, decoding logs and decoding results at <https://huggingface.co/zhu-han/icefall-pl-librispeech-zipformer-medium-2023-08-06>