VITS Sinhala TTS

A VITS text-to-speech model for Sinhala (සිංහල), trained using Coqui TTS.

GitHub: pradeep-sanjaya/sinhala-tts

Training Details

| Detail | Value |
|---|---|
| Model | VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) |
| Language | Sinhala (සිංහල) |
| Epochs | 300 |
| Final mel loss | ~18.92 |
| Dataset | Multi-speaker TTS Dataset Sinhala |
| GPU | NVIDIA A100-80GB (via Modal) |
| Training time | ~3.2 hours |
| Framework | Coqui TTS 0.27.5 |

Usage

From Hugging Face

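import torch
from huggingface_hub import hf_hub_download
from TTS.utils.synthesizer import Synthesizer

# Download the model config and checkpoint from the Hugging Face Hub
config_path = hf_hub_download(repo_id="ngpsanjaya/vits-sinhala", filename="config.json")
model_path = hf_hub_download(repo_id="ngpsanjaya/vits-sinhala", filename="model.pth")

synthesizer = Synthesizer(
    tts_checkpoint=model_path,
    tts_config_path=config_path,
    use_cuda=torch.cuda.is_available(),  # fall back to CPU when no GPU is present
)

# Returns the synthesized waveform as a sequence of float samples
wav = synthesizer.tts("ආයුබෝවන්")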

Save to WAV

import numpy as np
import soundfile as sf

sf.write("output.wav", np.array(wav), synthesizer.tts_config.audio.sample_rate)
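
If soundfile is not installed, the samples can also be written with Python's standard-library wave module. The sketch below is one possible approach, not part of this repository: the helper name write_pcm16_wav is hypothetical, and it assumes the waveform is mono with float samples nominally in the range [-1.0, 1.0] (values outside that range are clipped).

```python
import sys
import wave
from array import array


def write_pcm16_wav(path, samples, sample_rate):
    """Write mono float samples as a 16-bit PCM WAV file.

    Assumption: samples are floats nominally in [-1.0, 1.0].
    """
    pcm = array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data must be little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)              # mono
        f.setsampwidth(2)              # 2 bytes = 16 bits per sample
        f.setframerate(int(sample_rate))
        f.writeframes(pcm.tobytes())
```

With the synthesizer from the previous section, this would be called as write_pcm16_wav("output.wav", wav, synthesizer.tts_config.audio.sample_rate).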

Training & Deployment

The full training pipeline supports Modal, Kaggle, Google Colab, and AWS SageMaker.

See the GitHub repo for:

  • Platform-specific configs and training scripts
  • Kaggle and Colab notebooks for free GPU training
  • Inference scripts (Modal and local)
  • Checkpoint resume support

License

MIT. The training dataset is licensed separately; check its license for data usage terms.
