GigaSpeech Series Collection Evolving, Large-Scale, and Multi-domain ASR Corpus • 4 items • Updated 5 days ago
SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing Paper • 2601.09385 • Published 11 days ago
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training Paper • 2601.03065 • Published 18 days ago
SPEAR: A Unified SSL Framework for Learning Speech and Audio Representations Paper • 2510.25955 • Published Oct 29, 2025
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation Paper • 2510.14664 • Published Oct 16, 2025
Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration Paper • 2509.19928 • Published Sep 24, 2025
StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling Paper • 2506.12570 • Published Jun 14, 2025
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR Paper • 2409.08797 • Published Sep 13, 2024
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling Paper • 2505.19669 • Published May 26, 2025
VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining Paper • 2505.21527 • Published May 23, 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting Paper • 2504.12867 • Published Apr 17, 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis Paper • 2504.10352 • Published Apr 14, 2025