Samskriti Svara (S-Svara-v1)
Samskriti Svara is a high-fidelity, end-to-end Text-to-Speech (TTS) synthesis system developed by Shivam Kothekar. It aims to produce natural-sounding speech by using neural architectures that generate the audio waveform directly.
Technical Architecture
Samskriti Svara uses a generative architecture that combines variational inference with adversarial training. It builds on the structure of the VITS framework and on work from the Massively Multilingual Speech (MMS) project. This design supports detailed phonetic alignment and clear audio synthesis.
Key Components:
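- Text Encoder: Processes the input text or phoneme sequence to produce latent representations. (In the VITS design, the Posterior Encoder operates on spectrograms of the target speech and is used only during training.)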
- Stochastic Duration Predictor: Models the inherent rhythm and temporal variance of human speech, allowing for diverse prosody even with identical text inputs.
- Flow-based Decoder: Uses a series of invertible transformations to map latent variables to a mel-spectrogram-like space.
- HiFi-GAN Based Vocoder: Directly generates raw audio waveforms from latent representations, ensuring high-frequency clarity and minimizing artifacts.
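The role of the Stochastic Duration Predictor can be illustrated with a toy sketch. The code below is not the model's learned predictor (which in VITS is flow-based); it is a simplified stand-in that adds Gaussian noise to per-token log-durations and then repeats each token for the resulting number of frames. The function names, the `noise_scale` value, and the log-duration values are all made up for illustration.

```python
import math
import random

def sample_durations(mean_log_durations, noise_scale=0.8, speaking_rate=1.0, seed=None):
    """Toy stand-in for a stochastic duration predictor (hypothetical helper).

    Adds Gaussian noise to per-token log-durations, then converts them to
    whole frame counts. This only illustrates why identical text can yield
    different timing between runs.
    """
    rng = random.Random(seed)
    length_scale = 1.0 / speaking_rate  # faster speech -> fewer frames
    durations = []
    for log_d in mean_log_durations:
        noisy = log_d + noise_scale * rng.gauss(0.0, 1.0)
        durations.append(math.ceil(math.exp(noisy) * length_scale))
    return durations

def expand_tokens(tokens, durations):
    """Repeat each token for its frame count (length regulation)."""
    frames = []
    for token, d in zip(tokens, durations):
        frames.extend([token] * d)
    return frames

tokens = ["s", "v", "a", "r", "a"]
mean_log_durations = [1.0, 0.5, 1.5, 0.5, 1.2]  # made-up values

run_a = sample_durations(mean_log_durations, seed=1)     # one random timing
run_b = sample_durations(mean_log_durations, seed=2)     # another random timing
fixed = sample_durations(mean_log_durations, noise_scale=0.0)  # no randomness
frames = expand_tokens(tokens, fixed)
```

With `noise_scale=0.0` the durations are deterministic. With a positive noise scale, each seed gives its own timing for the same text, which is the source of the prosodic variety described above.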
Implementation Details
- Architecture Type: Variational Autoencoder (VAE)
- Sampling Rate: 16,000 Hz
- Precision: 32-bit Float
- Input: UTF-8 Encoded Text
- Output: Mono Waveform (WAV)
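As a quick worked example of what these figures imply, the sketch below computes the playback length and raw in-memory size of a mono clip from its sample count. The helper name `clip_stats` and the sample count are assumptions chosen for illustration; only the 16,000 Hz rate and 32-bit float precision come from the list above.

```python
SAMPLE_RATE_HZ = 16_000   # from the list above
BYTES_PER_SAMPLE = 4      # 32-bit float

def clip_stats(num_samples):
    """Return (duration in seconds, raw size in bytes) for a mono clip."""
    duration_s = num_samples / SAMPLE_RATE_HZ
    size_bytes = num_samples * BYTES_PER_SAMPLE
    return duration_s, size_bytes

duration_s, size_bytes = clip_stats(48_000)
print(f"{duration_s:.2f} s, {size_bytes} bytes")  # 3.00 s, 192000 bytes
```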
Usage Instructions
Inference via Transformers
from transformers import VitsModel, AutoTokenizer
import torch

# Load the pretrained model and its tokenizer from the Hugging Face Hub
model = VitsModel.from_pretrained("Shivam6566/Samskriti-Svara")
tokenizer = AutoTokenizer.from_pretrained("Shivam6566/Samskriti-Svara")

text = "Samskriti Svara is now synthesizing this sentence."
inputs = tokenizer(text, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    output = model(**inputs).waveform
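To listen to the result, the waveform needs to be written to a file. The following is a minimal sketch using only the Python standard library. It assumes the samples are floats in the range [-1.0, 1.0] and converts them to 16-bit PCM, since the standard `wave` module does not write float WAV files. The helper name `save_wav_pcm16`, the output filename, and the demo samples are hypothetical.

```python
import struct
import wave

def save_wav_pcm16(samples, path, sample_rate=16_000):
    """Write float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))

# Placeholder samples; with the snippet above, one would pass
# output.squeeze().tolist() instead.
demo_samples = [0.0, 0.5, -0.5, 1.0, -1.0]
save_wav_pcm16(demo_samples, "svara_demo.wav")
```

If keeping the 32-bit float precision matters, a library that supports float WAV output would be a better fit than this conversion.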
Limitations & Ethics
This model is intended for research and creative applications. Users should use the synthesized audio responsibly and avoid generating misleading content. Because the weights have research-oriented origins, this model is released under a Non-Commercial license.