Samskriti Svara (S-Svara-v1)

Samskriti Svara is a high-fidelity end-to-end Text-to-Speech (TTS) system developed by Shivam Kothekar. It is designed to produce natural-sounding speech by leveraging advanced neural architectures for direct waveform generation.

Technical Architecture

Samskriti Svara uses a conditional variational autoencoder trained with adversarial losses, building on the VITS architecture and on work from the Massively Multilingual Speech (MMS) project. This combination yields accurate phonetic alignment and clear, high-fidelity audio synthesis.
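The variational-inference side of this training setup can be summarized by the conditional evidence lower bound (ELBO) used in VITS-style models, where x is the target audio, c the text condition, and z the latent variable:

```latex
\log p_\theta(x \mid c) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid c)\right)
```

The first term is the reconstruction objective and the KL term keeps the posterior latents close to the text-conditioned prior; the adversarial losses are added on top of this objective.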

Key Components:

  1. Text (Prior) Encoder: Converts the input phoneme sequence into latent prior representations; during training, a separate posterior encoder derives latents from the target spectrogram.
  2. Stochastic Duration Predictor: Models the inherent rhythm and temporal variance of human speech, allowing for diverse prosody even with identical text inputs.
  3. Normalizing Flow: A stack of invertible transformations that bridges the simple text-conditioned prior and the more expressive posterior latent space.
  4. HiFi-GAN Based Vocoder: Directly generates raw audio waveforms from latent representations, ensuring high-frequency clarity and minimizing artifacts.
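As an illustration of the invertible transformations used in the normalizing flow (a minimal sketch only; the actual model uses coupling layers whose scale and shift come from a neural network), an affine coupling layer in NumPy shows why such flows can be inverted exactly:

```python
import numpy as np

def coupling_forward(z, scale, shift):
    """Affine coupling: transform the second half of z conditioned on the first.

    In a real flow, `scale` and `shift` would be produced by a network reading
    z_a; here they are fixed arrays to keep the sketch self-contained.
    """
    z_a, z_b = np.split(z, 2)
    y_b = z_b * np.exp(scale) + shift  # elementwise affine map, trivially invertible
    return np.concatenate([z_a, y_b])

def coupling_inverse(y, scale, shift):
    """Exact inverse: recover z_b from y_b using the same scale and shift."""
    y_a, y_b = np.split(y, 2)
    z_b = (y_b - shift) * np.exp(-scale)
    return np.concatenate([y_a, z_b])

rng = np.random.default_rng(0)
z = rng.standard_normal(8)
scale = rng.standard_normal(4) * 0.1
shift = rng.standard_normal(4)

y = coupling_forward(z, scale, shift)
z_rec = coupling_inverse(y, scale, shift)
print(np.allclose(z, z_rec))  # prints True: reconstruction is exact up to float precision
```

Because each step is exactly invertible, the flow can map latents in either direction between the prior and posterior spaces without information loss.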

Implementation Details

  • Architecture Type: Variational Autoencoder (VAE)
  • Sampling Rate: 16,000 Hz
  • Precision: 32-bit Float
  • Input: UTF-8 Encoded Text
  • Output: Mono Waveform (WAV)
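The output format above (16,000 Hz mono WAV) implies a conversion step when saving audio: float32 samples in [-1, 1] are typically quantized to 16-bit PCM. A minimal sketch using only the Python standard library (the sine tone is a stand-in for a synthesized waveform):

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # matches the model's output rate

def write_mono_wav(samples, sample_rate=SAMPLE_RATE):
    """Quantize float samples in [-1, 1] to 16-bit PCM and return WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit PCM
        wav.setframerate(sample_rate)
        pcm = struct.pack(
            f"<{len(samples)}h",
            *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
        )
        wav.writeframes(pcm)
    return buf.getvalue()

# Stand-in for model output: 0.1 s of a 440 Hz tone
tone = [0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
        for t in range(SAMPLE_RATE // 10)]
wav_bytes = write_mono_wav(tone)
```

In practice the same function could be applied to the model's waveform tensor after converting it to a Python list or array.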

Usage Instructions

Inference via Transformers

from transformers import VitsModel, AutoTokenizer
import torch

# Load the model and its tokenizer from the Hugging Face Hub
model = VitsModel.from_pretrained("Shivam6566/Samskriti-Svara")
tokenizer = AutoTokenizer.from_pretrained("Shivam6566/Samskriti-Svara")

text = "Samskriti Svara is now synthesizing this sentence."
inputs = tokenizer(text, return_tensors="pt")

# Run inference; the result is a float32 waveform tensor at 16,000 Hz
with torch.no_grad():
    output = model(**inputs).waveform

Limitations & Ethics

This model is intended for research and creative applications. Users should employ the synthesized audio responsibly and avoid generating misleading or deceptive content. Because the weights derive from research-oriented checkpoints, the model is released under a non-commercial license.

Model Details

  • Format: Safetensors
  • Model size: 36.3M parameters
  • Tensor type: F32