VITS TTS for Indian Languages

This repository contains a VITS-based Text-to-Speech (TTS) model fine-tuned for Indian languages. It supports 13 Indian languages and 14 speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks.


Model Overview

The model ai4bharat/vits_rasa_13 is based on the VITS architecture and supports the following features:

  • Languages: 13 Indian languages (see Supported Languages).
  • Styles: 14 speaking styles and emotions, each selected by a style ID.
  • Speaker IDs: 20 predefined speaker profiles with male and female voices, each selected by a speaker ID.

Installation

pip install transformers torch soundfile

Usage

Here's a quick example to get started:

import soundfile as sf
import torch
from transformers import AutoModel, AutoTokenizer

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Punjabi: "Am I busy this weekend?"
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA

inputs = tokenizer(text=text, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)

# soundfile expects a NumPy array on the CPU, not a torch tensor.
waveform = outputs.waveform.squeeze().cpu().numpy()
sf.write("audio.wav", waveform, model.config.sampling_rate)
print(outputs.waveform.shape)
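
To compare how one sentence sounds across several styles, the same call can be wrapped in a small helper. The sketch below is illustrative only: `synthesize_styles` is a hypothetical name, not part of the model's API, and it assumes the model and tokenizer behave as in the quick example (a forward call taking `speaker_id` and `emotion_id`, and an output with a `waveform` tensor). It also assumes the chosen speaker supports every style requested, which this card does not state.

```python
def synthesize_styles(model, tokenizer, text, speaker_id, style_ids, device="cpu"):
    """Return {style_id: waveform} for one sentence and one speaker.

    Hypothetical helper: `model` and `tokenizer` are the objects loaded in
    the quick example, and `style_ids` is any iterable of style IDs.
    """
    # Tokenize once; the same input IDs are reused for every style.
    inputs = tokenizer(text=text, return_tensors="pt").to(device)
    clips = {}
    for style_id in style_ids:
        outputs = model(inputs["input_ids"], speaker_id=speaker_id, emotion_id=style_id)
        # Detach from the autograd graph and move to the CPU so the
        # result can be handed to soundfile as a NumPy array.
        clips[style_id] = outputs.waveform.squeeze().detach().cpu().numpy()
    return clips
```

Each returned array can then be written with `sf.write`, using `model.config.sampling_rate` as the sample rate, for example one file per style ID.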

Supported Languages

  • Assamese
  • Bengali
  • Bodo
  • Dogri
  • Kannada
  • Maithili
  • Malayalam
  • Marathi
  • Nepali
  • Punjabi
  • Sanskrit
  • Tamil
  • Telugu

Speaker-Style Identifier Overview

Speaker Name    Speaker ID
ASM_F           0
ASM_M           1
BEN_F           2
BEN_M           3
BRX_F           4
BRX_M           5
DOI_F           6
DOI_M           7
KAN_F           8
KAN_M           9
MAI_M           10
MAL_F           11
MAR_F           12
MAR_M           13
NEP_F           14
PAN_F           15
PAN_M           16
SAN_M           17
TAM_F           18
TEL_F           19

Style Name      Style ID
ALEXA           0
ANGER           1
BB              2
BOOK            3
CONV            4
DIGI            5
DISGUST         6
FEAR            7
HAPPY           8
NEWS            10
SAD             12
SURPRISE        14
UMANG           15
WIKI            16
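
Because the model takes numeric IDs, it can help to keep the names and IDs together in code. The following is a minimal sketch that copies the values from the tables above into two dictionaries; `SPEAKER_IDS`, `STYLE_IDS`, and `resolve_ids` are hypothetical names introduced for illustration and are not shipped with the model. Style IDs 9, 11, and 13 do not appear in the table, so they are left out here as well.

```python
# IDs copied from the speaker and style tables in this model card.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def resolve_ids(speaker_name, style_name):
    """Map a speaker name and a style name to (speaker_id, style_id).

    Names are matched case-insensitively; unknown names raise KeyError
    with the list of valid choices.
    """
    speaker_key = speaker_name.upper()
    style_key = style_name.upper()
    if speaker_key not in SPEAKER_IDS:
        raise KeyError(f"Unknown speaker {speaker_name!r}; choose from {sorted(SPEAKER_IDS)}")
    if style_key not in STYLE_IDS:
        raise KeyError(f"Unknown style {style_name!r}; choose from {sorted(STYLE_IDS)}")
    return SPEAKER_IDS[speaker_key], STYLE_IDS[style_key]
```

For instance, `resolve_ids("PAN_M", "ALEXA")` gives the pair used in the quick example, and the two values can be passed as `speaker_id` and `emotion_id`.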

Citation

If you use this model in your research, please cite:

@article{ai4bharat_vits_rasa_13,
  title={VITS TTS for Indian Languages},
  author={Ashwin Sankar},
  year={2024},
  publisher={Hugging Face}
}