PyTorch implementation of Tacotron2, a modern text-to-speech model

tabisheva/talking-machines

Tacotron2

PyTorch implementation of Tacotron2, a modern text-to-speech model based on the Tacotron 2 paper ("Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions").

Usage

To convert mel spectrograms to audio, we need NVIDIA's pretrained WaveGlow vocoder:

! git clone https://github.com/NVIDIA/waveglow.git

! pip install googledrivedownloader

from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(
    file_id='1rpK8CzAAirq9sWZhe9nlfvxMF1dRgFbF',
    dest_path='./waveglow_256channels_universal_v5.pt'
)

Then run ./run_docker.sh with the correct volume option.
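For orientation, the vocoder step can be sketched roughly as below. This is not necessarily how this repository calls WaveGlow. It follows the pattern in NVIDIA's own inference script, where the checkpoint is a dict with a "model" entry and the model exposes an infer method. The function names load_waveglow and mel_to_audio, the ./waveglow clone location, the sigma value, and the mel tensor shape (batch, n_mels, frames) are assumptions made for illustration. The weights_only argument requires a reasonably recent PyTorch.

```python
import sys
from pathlib import Path


def load_waveglow(checkpoint_path, repo_dir="./waveglow", device="cuda"):
    """Load the pretrained WaveGlow vocoder (hypothetical helper, not part of this repo)."""
    checkpoint_path = Path(checkpoint_path)
    if not checkpoint_path.is_file():
        raise FileNotFoundError(f"WaveGlow checkpoint not found: {checkpoint_path}")

    import torch  # imported lazily so the path check above works without PyTorch

    # The checkpoint pickles classes defined in the cloned repo,
    # so that directory has to be importable before unpickling.
    sys.path.insert(0, str(Path(repo_dir).resolve()))

    # weights_only=False lets recent PyTorch versions unpickle a full model object.
    checkpoint = torch.load(checkpoint_path, map_location=device, weights_only=False)
    waveglow = checkpoint["model"]
    return waveglow.to(device).eval()


def mel_to_audio(waveglow, mel, sigma=0.6):
    """Convert a mel spectrogram batch to waveforms; sigma=0.6 is an assumed setting."""
    import torch

    with torch.no_grad():
        return waveglow.infer(mel, sigma=sigma)


# Example (needs a GPU, the cloned repo, and the downloaded checkpoint):
# vocoder = load_waveglow("./waveglow_256channels_universal_v5.pt")
# audio = mel_to_audio(vocoder, mel)  # mel: tensor of shape (batch, n_mels, frames)
```

Depending on the PyTorch version, the pickled checkpoint may need small compatibility fixes before it loads. Treat this as a starting point, not a guaranteed recipe.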

Training

Download the LJSpeech dataset.
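After downloading, it can help to check that the dataset is laid out as expected. LJSpeech ships a pipe-separated metadata.csv (clip ID, transcription, normalized transcription) next to a wavs/ directory. The sketch below reads that file. The helper name read_ljspeech_metadata is hypothetical, and the data loader in this repository may well do this differently.

```python
import csv
from pathlib import Path


def read_ljspeech_metadata(dataset_dir):
    """Return a list of (wav_path, normalized_text) pairs from an LJSpeech directory."""
    dataset_dir = Path(dataset_dir)
    items = []
    with open(dataset_dir / "metadata.csv", encoding="utf-8", newline="") as f:
        # Fields are separated by "|" and are not quoted, so quote characters
        # in the transcriptions must be treated as ordinary text.
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            clip_id = row[0]
            # Fall back to the raw transcription if the normalized one is missing.
            text = row[2] if len(row) > 2 else row[1]
            items.append((dataset_dir / "wavs" / f"{clip_id}.wav", text))
    return items


# Example (the directory name is an assumption):
# pairs = read_ljspeech_metadata("./LJSpeech-1.1")
# print(len(pairs), pairs[0])
```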

Set your preferred settings in config.py, then run python train.py.

The following will be logged to wandb.ai:

  • Train and validation losses
  • Original text
  • Predicted and ground truth mel spectrograms
  • Predicted and ground truth audio
  • Probabilities of the last frame over the audio
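The last item presumably refers to Tacotron2's per-frame stop prediction. In the paper, the decoder outputs a scalar per frame that is passed through a sigmoid to give the probability that the utterance has ended, and generation stops once it exceeds 0.5. A minimal sketch of that rule follows. The function names and the example logits are illustrative and not taken from this repository.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def first_stop_frame(stop_logits, threshold=0.5):
    """Return the index of the first frame whose stop probability exceeds threshold.

    Returns None if no frame crosses the threshold, in which case a decoder
    would normally fall back to a maximum number of steps.
    """
    for index, logit in enumerate(stop_logits):
        if sigmoid(logit) > threshold:
            return index
    return None


# Illustrative logits: the model becomes confident the utterance ended at frame 2.
print(first_stop_frame([-4.0, -2.0, 0.3, 5.0]))
```

Plotting these probabilities against the frame index, as the wandb log does, makes it easy to spot a model that stops too early or never stops.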

Inference

python inference.py "Your text for speech synthesis"

The result will be logged to wandb.ai.

You can use my pretrained model:

gdd.download_file_from_google_drive(
    file_id='1gjOSUTyuFsdVOpPcLaEZjGHpgBEs_lTZ',
    dest_path='./tacotron.ptt'
)
