Based on the VITS paper: *Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech* (arXiv:2106.06103).
A VITS text-to-speech model for Sinhala (සිංහල), trained using Coqui TTS.
GitHub: pradeep-sanjaya/sinhala-tts
| Detail | Value |
|---|---|
| Model | VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) |
| Language | Sinhala (සිංහල) |
| Epochs | 300 |
| Final mel loss | ~18.92 |
| Dataset | Multi-speaker TTS Dataset Sinhala |
| GPU | NVIDIA A100-80GB (via Modal) |
| Training time | ~3.2 hours |
| Framework | Coqui TTS 0.27.5 |
```python
import numpy as np
import soundfile as sf
from huggingface_hub import hf_hub_download
from TTS.utils.synthesizer import Synthesizer

# Download the model config and checkpoint from the Hub
config_path = hf_hub_download(repo_id="ngpsanjaya/vits-sinhala", filename="config.json")
model_path = hf_hub_download(repo_id="ngpsanjaya/vits-sinhala", filename="model.pth")

synthesizer = Synthesizer(
    tts_checkpoint=model_path,
    tts_config_path=config_path,
    use_cuda=True,  # set to False to run on CPU
)

# Synthesize speech ("ආයුබෝවන්" = "Hello"); returns a list of float samples
wav = synthesizer.tts("ආයුබෝවන්")

# Save to a WAV file at the sample rate defined in the model config
sf.write("output.wav", np.array(wav), synthesizer.tts_config.audio.sample_rate)
```
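If `soundfile` is not available, the float waveform (values in [-1, 1]) can also be written as 16-bit PCM using only the standard-library `wave` module. A minimal sketch, assuming a 22050 Hz sample rate (in practice, read the actual rate from `synthesizer.tts_config.audio.sample_rate`); the helper name and the synthetic test tone are illustrative, not part of this repo:

```python
import wave
import numpy as np

def write_wav_pcm16(path, samples, sample_rate=22050):
    """Write a mono float waveform (values in [-1, 1]) as a 16-bit PCM WAV file."""
    # Scale floats to the int16 range, clipping to avoid overflow
    pcm = (np.clip(np.asarray(samples), -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)          # mono
        f.setsampwidth(2)          # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())

# Illustrative usage with a synthetic 440 Hz tone standing in for model output:
t = np.linspace(0, 1, 22050, endpoint=False)
write_wav_pcm16("tone.wav", 0.5 * np.sin(2 * np.pi * 440 * t))
```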
The full training pipeline supports Modal, Kaggle, Google Colab, and AWS SageMaker; see the GitHub repo for the training scripts and setup details.
License: MIT. Please check the dataset's license for data usage terms.