LiveKit Turn Detector - Arabic
Fine-tuned Arabic End-of-Utterance (EOU) detection model for LiveKit voice agents.
Model Details
- Base Model: livekit/turn-detector (Qwen2-0.5B)
- Fine-tuning Method: LoRA (rank=32, alpha=64)
- Dataset: 57,475 Arabic EOU samples
- Languages: Arabic (ar, ar-SA, ar-EG, Gulf dialects)
- Training: 3 epochs on T4 GPU (~20-30 minutes)
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
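To give a sense of scale for the LoRA settings above, the sketch below computes the adapter scaling factor (alpha / rank) and a rough count of trainable adapter weights. The layer dimensions (hidden size 896, MLP size 4864, 24 layers, 128-wide key/value projections) are assumptions taken from the public Qwen2-0.5B configuration rather than from this model card, and the helper name is hypothetical.

```python
# Rough LoRA sizing sketch. Dimensions are ASSUMED from the public
# Qwen2-0.5B config; they are not stated in this model card.
RANK = 32
ALPHA = 64

HIDDEN = 896      # assumed hidden size
KV_DIM = 128      # assumed key/value projection width
MLP = 4864        # assumed intermediate (MLP) size
NUM_LAYERS = 24   # assumed number of transformer layers

# (in_features, out_features) for each targeted projection
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_param_count(rank: int = RANK) -> int:
    """Count adapter weights: each module adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


scaling = ALPHA / RANK  # LoRA scales the low-rank update by alpha / rank
print(f"scaling factor: {scaling}")
print(f"approx. trainable adapter weights: {lora_param_count():,}")
```

Under these assumed dimensions the adapters come to roughly 17.6M weights, a few percent of the ~500M base parameters.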
Use Cases
- Real-time Arabic voice agents with LiveKit
- Turn-taking detection in Arabic conversations
- End-of-utterance detection for Gulf Arabic dialects
- Multilingual voice assistants with Arabic support
Usage
Basic Usage with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load model
model = AutoModelForCausalLM.from_pretrained("Moustafa3092/livekit-turn-detector-arabic")
tokenizer = AutoTokenizer.from_pretrained("Moustafa3092/livekit-turn-detector-arabic", trust_remote_code=True)
# Predict EOU
def predict_eou(text: str) -> float:
    formatted = f"<|im_start|>user\n{text}"
    inputs = tokenizer(formatted, return_tensors="pt", add_special_tokens=False)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1, :]
    probs = torch.softmax(logits, dim=-1)
    eou_prob = probs[tokenizer.convert_tokens_to_ids("<|im_end|>")]
    return eou_prob.item()
# Test
print(predict_eou("شكرا جزيلا"))  # Should be high (complete)
print(predict_eou("اممممم"))  # Should be low (incomplete)
With LiveKit Voice Agents
Export to ONNX:
# Convert model to ONNX format for production
# See deployment documentation
Use in LiveKit Agent:
from livekit.agents import WorkerOptions, cli
from livekit.plugins import turn_detector
# Configure with your fine-tuned model
turn_detector.configure(model_path="path/to/model.onnx")
Training Data
Dataset Composition (57,475 samples)
Complete Utterances (EOU): 20,194 (35.1%)
- CSV data: 19,432 samples
- Edge case closures: 762 samples
- Examples: "شكرا" (thank you), "تمام" (perfect), "مع السلامة" (goodbye)
Incomplete Utterances (non-EOU): 37,281 (64.9%)
- Generated variants: 36,610 samples
- Edge case hesitations: 671 samples
- Examples: "اممممم" (ummm...), "يعني" (you know...), "بس" (but...)
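The counts in this section can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the figures quoted above and recomputes the two percentages; the variable and function names are illustrative.

```python
# Counts copied from the Dataset Composition figures above.
TOTAL = 57_475
complete = {"csv": 19_432, "edge_case_closures": 762}
incomplete = {"generated_variants": 36_610, "edge_case_hesitations": 671}

n_complete = sum(complete.values())
n_incomplete = sum(incomplete.values())


def share(count: int, total: int = TOTAL) -> float:
    """Percentage of the dataset, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The two classes should account for every sample.
assert n_complete + n_incomplete == TOTAL

# Edge cases are the hand-written closures plus hesitations.
edge_cases = complete["edge_case_closures"] + incomplete["edge_case_hesitations"]

print(f"complete:   {n_complete} ({share(n_complete)}%)")
print(f"incomplete: {n_incomplete} ({share(n_incomplete)}%)")
print(f"edge cases: {edge_cases}")
```

The subtotals agree with the stated totals, and the split works out to a little under two incomplete samples for every complete one.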
Edge Cases (1,433 samples)
Hesitations (Non-EOU) - Speaker thinking, will continue:
- اممممم, يعني, خلاص بس, طيب و
Closures (EOU) - Complete short responses:
- شكرا, تمام, نعم, لا, مع السلامة
Performance
- Accuracy: >90% on Arabic edge cases
- Training Time: ~20-30 minutes (T4 GPU)
- Memory: ~6 GB VRAM during training
- Inference: Real-time compatible
- Model Size: 494 MB (full model)
Model Architecture
- Architecture: Qwen2-0.5B (Causal LM)
- Parameters: ~500M total
- Fine-tuning: LoRA adapters, merged into the base weights
- Context Length: Supports conversation context
- Output: EOU probability via <|im_end|> token
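To make the last point concrete, here is a dependency-free sketch of reading an EOU probability off next-token logits: apply a softmax, take the entry for the end-of-turn token, and compare it with a threshold. The toy logits, the token index standing in for <|im_end|>, and the 0.5 threshold are all hypothetical; a real deployment would use the model's logits and tune the threshold on held-out data.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def eou_probability(logits: Sequence[float], eou_token_id: int) -> float:
    """Probability mass the model puts on the end-of-turn token."""
    return softmax(logits)[eou_token_id]


def is_end_of_turn(prob: float, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule; the threshold is not from the model card."""
    return prob >= threshold


# Toy 4-token vocabulary; index 3 plays the role of <|im_end|>.
toy_logits = [0.1, -1.2, 0.3, 2.5]
prob = eou_probability(toy_logits, eou_token_id=3)
print(f"EOU probability: {prob:.3f}, end of turn: {is_end_of_turn(prob)}")
```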
Limitations
- Optimized for Modern Standard Arabic and Gulf dialects
- May need additional fine-tuning for other Arabic dialects
- Requires sufficient context for accurate predictions
- Best performance on conversational Arabic
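Because predictions depend on context, one option is to serialize earlier turns in the same ChatML-style markup that the usage example applies to a single utterance: each earlier turn is closed with <|im_end|>, and the latest user turn is left open so the model can score whether <|im_end|> comes next. This is a sketch only; the helper name, the turn limit, and the assumption that this fine-tuned model benefits from multi-turn input are not confirmed by the model card.

```python
from typing import List, Tuple


def build_eou_prompt(turns: List[Tuple[str, str]], max_turns: int = 4) -> str:
    """Format (role, text) turns; the final turn is left without <|im_end|>."""
    if not turns:
        raise ValueError("need at least one turn")
    recent = turns[-max_turns:]  # keep only the most recent turns
    parts = []
    for role, text in recent[:-1]:
        # Earlier turns are complete, so they are closed explicitly.
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>\n")
    last_role, last_text = recent[-1]
    # The last turn stays open: the model predicts what follows it.
    parts.append(f"<|im_start|>{last_role}\n{last_text}")
    return "".join(parts)


prompt = build_eou_prompt([
    ("assistant", "مرحبا"),
    ("user", "شكرا"),
])
print(prompt)
```

With a single user turn the helper produces the same string as the formatting in the usage example, so it can be passed to the tokenizer in the same way.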
Citation
If you use this model, please cite:
@misc{livekit-turn-detector-arabic,
author = {Moustafa3092},
title = {LiveKit Turn Detector - Arabic},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/Moustafa3092/livekit-turn-detector-arabic}
}
License
Apache 2.0
Acknowledgements
- Based on LiveKit Turn Detector
- Built with Transformers and PEFT
- Fine-tuned for Arabic language support
Developed by: Moustafa3092
Model Type: Turn Detection / End-of-Utterance
Language: Arabic (ar)
Base Model: livekit/turn-detector (v0.4.1-intl)