LiveKit Turn Detector - Arabic

Fine-tuned Arabic End-of-Utterance (EOU) detection model for LiveKit voice agents.

Model Details

  • Base Model: livekit/turn-detector (Qwen2-0.5B)
  • Fine-tuning Method: LoRA (rank=32, alpha=64)
  • Dataset: 57,475 Arabic EOU samples
  • Languages: Arabic (ar, ar-SA, ar-EG, Gulf dialects)
  • Training: 3 epochs on T4 GPU (~20-30 minutes)
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
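
To make the LoRA settings above concrete, the sketch below estimates how many trainable parameters the adapters add, along with the alpha / rank scaling applied to them. The layer dimensions are those of the published Qwen2-0.5B configuration (hidden size 896, MLP size 4864, 24 layers, 2 key/value heads of width 64). They are an assumption here, since the LiveKit base checkpoint may differ from stock Qwen2-0.5B.

```python
# Rough LoRA adapter size for the settings listed above.
# ASSUMPTION: stock Qwen2-0.5B dimensions; the actual base checkpoint may differ.
RANK = 32
ALPHA = 64
NUM_LAYERS = 24
HIDDEN = 896
MLP = 4864
KV_DIM = 2 * 64  # 2 key/value heads x head width 64

# (input features, output features) of each targeted projection
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to each frozen weight."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update B @ A

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling (alpha / rank): {scaling}")
```

Under these assumptions the adapters come to roughly 17.6M parameters, a few percent of the base model, and the low-rank update is scaled by 2.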

Use Cases

  • Real-time Arabic voice agents with LiveKit
  • Turn-taking detection in Arabic conversations
  • End-of-utterance detection for Gulf Arabic dialects
  • Multilingual voice assistants with Arabic support

Usage

Basic Usage with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

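MODEL_ID = "Moustafa3092/livekit-turn-detector-arabic"

# Load the merged model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()  # inference only: disable dropout

# ID of the token whose probability is read as the EOU score
EOU_TOKEN_ID = tokenizer.convert_tokens_to_ids("<|im_end|>")

def predict_eou(text: str) -> float:
    """Return the probability that `text` is a complete user turn."""
    # Open a user turn without closing it, so the model predicts what follows
    formatted = f"<|im_start|>user\n{text}"
    inputs = tokenizer(formatted, return_tensors="pt", add_special_tokens=False)

    with torch.no_grad():
        # Next-token distribution at the last position
        logits = model(**inputs).logits[0, -1, :]
        probs = torch.softmax(logits, dim=-1)

    return probs[EOU_TOKEN_ID].item()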

# Test
print(predict_eou("شكرا جزيلا"))  # Should be high (complete)
print(predict_eou("اممممم"))      # Should be low (incomplete)

With LiveKit Voice Agents

  1. Export to ONNX:

    # Convert model to ONNX format for production
    # See deployment documentation
    
  2. Use in LiveKit Agent:

    from livekit.agents import WorkerOptions, cli
    from livekit.plugins import turn_detector
    
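    # Configure with your fine-tuned model.
    # NOTE: illustrative call; confirm against your installed plugin version
    # how a custom ONNX model path is supplied before relying on it.
    turn_detector.configure(model_path="path/to/model.onnx")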
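
For intuition about how an agent might consume the detector's score, the sketch below maps an EOU probability to a wait time before the agent replies: a confident end of turn gets a short delay, anything else a longer one. The function name, the 0.5 threshold, and both delay values are hypothetical illustrations, not LiveKit's API or defaults.

```python
def choose_reply_delay(
    eou_prob: float,
    threshold: float = 0.5,    # hypothetical cut-off, tune on your own data
    short_delay: float = 0.5,  # seconds to wait when the turn looks complete
    long_delay: float = 3.0,   # seconds to wait when the speaker may continue
) -> float:
    """Pick how long to wait after silence before the agent responds."""
    if not 0.0 <= eou_prob <= 1.0:
        raise ValueError("eou_prob must be a probability in [0, 1]")
    return short_delay if eou_prob >= threshold else long_delay

print(choose_reply_delay(0.92))  # likely complete: short wait
print(choose_reply_delay(0.08))  # likely a hesitation: longer wait
```

Keeping the threshold as a parameter makes it easy to trade interruptions against response latency once real score distributions are available.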
    

Training Data

Dataset Composition (57,475 samples)

Complete Utterances (EOU): 20,194 (35.1%)

  • CSV data: 19,432 samples
  • Edge case closures: 762 samples
  • Examples: "شكرا" (thank you), "تمام" (perfect), "مع السلامة" (goodbye)

Incomplete Utterances (non-EOU): 37,281 (64.9%)

  • Generated variants: 36,610 samples
  • Edge case hesitations: 671 samples
  • Examples: "اممممم" (ummm...), "يعني" (you know...), "بس" (but...)

Edge Cases (1,433 samples)

Hesitations (non-EOU) - Speaker thinking, will continue:

  • اممممم, يعني, خلاص بس, طيب و

Closures (EOU) - Complete short responses:

  • شكرا, تمام, نعم, لا, مع السلامة
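
The counts above can be cross-checked with a few lines of arithmetic. The sketch also derives inverse-frequency class weights, one common way to compensate for the roughly 35/65 imbalance. Whether any re-weighting was used during training is not stated, so treat that part as an illustration only.

```python
# Sample counts copied from the composition listed above
complete = {"csv": 19_432, "edge_closures": 762}
incomplete = {"generated_variants": 36_610, "edge_hesitations": 671}

n_eou = sum(complete.values())
n_non_eou = sum(incomplete.values())
total = n_eou + n_non_eou
edge_cases = complete["edge_closures"] + incomplete["edge_hesitations"]

eou_share = round(100 * n_eou / total, 1)
non_eou_share = round(100 * n_non_eou / total, 1)

# Inverse-frequency weights (illustrative): total / (num_classes * class_count)
weights = {
    "eou": total / (2 * n_eou),
    "non_eou": total / (2 * n_non_eou),
}

print(total, edge_cases)
print(eou_share, non_eou_share)
print({name: round(w, 3) for name, w in weights.items()})
```

The totals and percentages reproduce the figures in this section; the minority EOU class receives the larger weight.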

Performance

  • Accuracy: >90% on Arabic edge cases
  • Training Time: ~20-30 minutes (T4 GPU)
  • Memory: ~6 GB VRAM during training
  • Inference: Real-time compatible
  • Model Size: 494 MB (full model)
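
How the accuracy figure was measured is not described here. As a minimal sketch of one way to check it on your own labelled utterances, the helper below thresholds a scoring function's output and compares it with gold labels. `eou_accuracy`, the 0.5 threshold, and the stand-in scorer are hypothetical; in practice you would pass a real scorer such as `predict_eou` from the usage section.

```python
from typing import Callable, Iterable, Tuple

def eou_accuracy(
    samples: Iterable[Tuple[str, bool]],
    score: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> float:
    """Fraction of (text, is_complete) pairs classified correctly."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one labelled sample")
    correct = sum((score(text) >= threshold) == is_complete
                  for text, is_complete in samples)
    return correct / len(samples)

# Stand-in scorer so the sketch runs without downloading the model:
# it treats a trailing run of the letter meem as a hesitation.
def toy_score(text: str) -> float:
    return 0.1 if text.endswith("ممم") else 0.9

labelled = [
    ("شكرا", True),     # closure
    ("تمام", True),     # closure
    ("اممممم", False),  # hesitation
]
print(eou_accuracy(labelled, toy_score))
```

Reporting accuracy separately for closures and hesitations is worthwhile, since the two error types affect a voice agent differently (cutting the speaker off versus responding late).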

Model Architecture

  • Architecture: Qwen2-0.5B (Causal LM)
  • Parameters: ~500M total
  • Fine-tuned: LoRA adapters merged
  • Context: Accepts preceding conversation turns as input
  • Output: EOU probability, read as the next-token probability of <|im_end|>
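
Extending the single-utterance prompt from the usage section to a multi-turn conversation can be sketched as follows: earlier turns are closed with <|im_end|>, while the final user turn is left open so that the model's next-token probability for <|im_end|> serves as the EOU score. The helper name and the plain ChatML role layout are assumptions modelled on that usage snippet; the base model's own chat template may format turns differently.

```python
from typing import List, Tuple

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"

def build_eou_prompt(turns: List[Tuple[str, str]]) -> str:
    """Format (role, text) turns, leaving the final user turn unterminated."""
    if not turns or turns[-1][0] != "user":
        raise ValueError("the last turn must come from the user")
    parts = []
    for role, text in turns[:-1]:
        # Completed earlier turns are closed with the end-of-turn token
        parts.append(f"{IM_START}{role}\n{text}{IM_END}\n")
    last_role, last_text = turns[-1]
    # The open final turn is what the model is asked to continue
    parts.append(f"{IM_START}{last_role}\n{last_text}")
    return "".join(parts)

prompt = build_eou_prompt([
    ("assistant", "كيف أقدر أساعدك؟"),  # illustrative agent turn
    ("user", "شكرا"),
])
print(prompt)
```

The resulting string can be tokenized with add_special_tokens=False and scored in the same way as the single-utterance prompt.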

Limitations

  • Optimized for Modern Standard Arabic and Gulf dialects
  • May need additional fine-tuning for other Arabic dialects
  • Requires sufficient context for accurate predictions
  • Best performance on conversational Arabic

Citation

If you use this model, please cite:

@misc{livekit-turn-detector-arabic,
  author = {Moustafa3092},
  title = {LiveKit Turn Detector - Arabic},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Moustafa3092/livekit-turn-detector-arabic}
}

License

Apache 2.0

Acknowledgements


  • Developed by: Moustafa3092
  • Model Type: Turn Detection / End-of-Utterance
  • Language: Arabic (ar)
  • Base Model: livekit/turn-detector (v0.4.1-intl)
