I want to use pre-trained VCTK multi-speaker model as a base for fine-tuned single speaker model #2606

Closed · Answered by oinuar
oinuar asked this question in General Q&A

I ended up using recipes/vctk/yourtts/train_yourtts.py with the following modifications (sketched in code after the list):

  • Use only my custom dataset.
  • Generate speaker embeddings to ../../dataset/speakers.pth for all my voice samples.
  • Set text_cleaner to "english_cleaners", since the custom dataset is in English.
  • Set use_weighted_sampler = False.
  • Restore the model from ~/.local/share/tts/tts_models--multilingual--multi-dataset--your_tts/model_file.pth.
  • Use batch_size = 20, because that's the maximum my hardware can handle.
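
Roughly, those edits land in the recipe like this. This is only a sketch of my setup: the formatter, dataset layout, and the variable names shown here are assumptions, so adapt them to your own data and to the rest of the original recipe.

```python
import os

from TTS.config.shared_configs import BaseDatasetConfig

# 1. Point the recipe at the custom dataset only (replaces the VCTK config).
my_dataset_config = BaseDatasetConfig(
    formatter="ljspeech",            # assumption: LJSpeech-style metadata.csv
    dataset_name="my_dataset",
    meta_file_train="metadata.csv",
    path="../../dataset/",
    language="en",
)
DATASETS_CONFIG_LIST = [my_dataset_config]

# 2. Speaker embeddings precomputed for every sample (see the recipe's
#    compute_embeddings step) are expected at this path.
D_VECTOR_FILES = ["../../dataset/speakers.pth"]

# 5. Pre-trained YourTTS checkpoint to restore from.
RESTORE_PATH = os.path.expanduser(
    "~/.local/share/tts/"
    "tts_models--multilingual--multi-dataset--your_tts/model_file.pth"
)

# 3, 4, 6. Inside the VitsConfig(...) call in the recipe, change:
#   text_cleaner="english_cleaners",   # dataset is in English
#   use_weighted_sampler=False,
#   batch_size=20,                     # whatever fits on your GPU
```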

This will train a VITS model that has only one speaker. You can then synthesize speech like this:

tts --text "Hello, this is the voice I generated. Pretty cool, huh?" --model_path=best_model.pth --config_path=config.json --…
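
The same thing through the Python API, assuming a TTS version where TTS.api.TTS accepts a local model_path and config_path, would look roughly like this (paths are placeholders for the files produced by the training run above):

```python
from TTS.api import TTS

# Load the fine-tuned single-speaker checkpoint and its config.
tts = TTS(model_path="best_model.pth", config_path="config.json")

# Depending on your config you may also need speaker= or language= arguments.
tts.tts_to_file(
    text="Hello, this is the voice I generated. Pretty cool, huh?",
    file_path="output.wav",
)
```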
