RAS1981/qwen3-0.6b-turn-detection-v2

This model is a fine-tuned version of doodod/Turn-Detector-Qwen3-0.6B, which is itself based on Qwen3-0.6B. It performs conversational turn detection: deciding whether a speaker has finished their turn (Complete) or is still speaking or pausing (Incomplete/Wait).

Model Details

  • Base Model: doodod/Turn-Detector-Qwen3-0.6B
  • Task: Turn Detection (Binary Classification via Next-Token Prediction)
  • Language: Russian (primary evaluation context), English (inherited capabilities)
  • Architecture: Qwen3-0.6B (Transformer)

Intended Use

This model is intended to be used in real-time conversational systems (like voice bots) to decide when the bot should start speaking. It predicts the probability of the <|im_end|> token (EOS) at the end of a given text segment.

  • Input: User's spoken text (ASR transcript).
  • Output: Probability of turn completion.
  • Threshold: 0.5 (EOS Probability > 0.5 indicates "Turn Finished").
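
The decision rule described above reduces to a one-line threshold check. The sketch below uses the 0.5 threshold from this card's evaluation; `classify_turn` is an illustrative name, not part of the model's API:

```python
THRESHOLD = 0.5  # EOS probability above this means "turn finished"

def classify_turn(eos_prob: float) -> str:
    """Map the model's <|im_end|> probability to a turn-taking decision."""
    return "END" if eos_prob > THRESHOLD else "WAIT"
```

In a voice bot, "END" would trigger the bot's response, while "WAIT" keeps the microphone open.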

Evaluation Results

The model was evaluated on a dataset of 75 samples categorized into three groups:

  • G1 (FINISHED): Completed sentences/turns (Expected: END).
  • G2 (UNFINISHED): Incomplete sentences/turns (Expected: WAIT).
  • G3 (PAUSE): Pauses or fillers (Expected: WAIT).
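
The group-to-expected-decision mapping, together with the per-group sample counts reported further down in this card, can be expressed as a small sanity check:

```python
# Expected decision for each evaluation group (from this card).
EXPECTED = {
    "G1 (FINISHED)": "END",
    "G2 (UNFINISHED)": "WAIT",
    "G3 (PAUSE)": "WAIT",
}

# Sample counts per group, as reported in the per-group results table.
COUNTS = {"G1 (FINISHED)": 23, "G2 (UNFINISHED)": 42, "G3 (PAUSE)": 10}

assert sum(COUNTS.values()) == 75  # matches the total evaluation set
```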

Summary Metrics

  • Total Samples: 75
  • Correct Predictions: 45 (60.0%)
  • Failures: 30 (40.0%)
  • Threshold: 0.5
Metric          Count  Percentage  Description
True Negative      22       29.3%  Correctly identified an incomplete turn (WAIT)
False Positive     30       40.0%  Incomplete turn classified as finished (interruption)
False Negative      0        0.0%  Finished turn classified as incomplete (latency)
True Positive      23       30.7%  Correctly identified a finished turn (END)
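
The summary numbers can be cross-checked directly from the confusion-matrix counts. The precision figure for the END class is not stated in the card and is derived here as a quick sketch:

```python
tn, fp, fn, tp = 22, 30, 0, 23  # counts from the table above
total = tn + fp + fn + tp

accuracy = (tp + tn) / total        # 45/75 = 0.60
recall_end = tp / (tp + fn)         # 23/23 = 1.00 (never misses an end)
precision_end = tp / (tp + fp)      # 23/53 ≈ 0.43 (many interruptions)
```

The zero false-negative count is what makes END recall perfect, while the 30 false positives drag END precision down to roughly 0.43.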

Performance by Group

Group            Total  Correct  Incorrect  Accuracy  Precision  Recall  F1
G1 (Finished)       23       23          0    100.0%       1.00    1.00  1.00
G2 (Unfinished)     42       21         21     50.0%       0.00    0.00  0.00
G3 (Pause)          10        1          9     10.0%       0.00    0.00  0.00

Note: The model has perfect recall for finished turns (it never misses an end), but it suffers from a high False Positive rate on unfinished turns (G2) and pauses (G3), meaning it is "trigger-happy" and prone to interrupting users.

Analysis of Failures

The model primarily fails by predicting "Complete" (EOS > 0.5) when the turn is actually incomplete. This suggests it relies on surface cues such as syntactic closure rather than on semantic completeness.

Common Failure Patterns

  1. Trailing Conjunctions/Prepositions:

    • Text: "...для начала очень важно, чтобы там находилось это." ("...to begin with, it is very important that this be located there.") (EOS: 0.92) - The model ignores context suggesting more is coming.
    • Text: "...чтобы вы понимали, мне надо там." ("...so that you understand, I need there.") (EOS: 0.99) - The utterance ends abruptly, yet the model treats it as complete.
  2. Semantic Continuation:

    • Text: "Давайте сперва начнем с того, что когда ремонт закончится..." ("Let's first start with the fact that when the renovation is finished...") (EOS: 0.97) - Structurally resembles a finished clause, but is semantically incomplete.
    • Text: "Вы меня про условия работы не говорили. Вы бесплатно можете нам помочь? Или у вас какие-то другие условия? Для начала мне надо понимать, что у вас..." ("You did not tell me about the terms of the work. Can you help us for free? Or do you have other terms? To begin with, I need to understand what you...") (EOS: 0.80) - Clearly cut off, yet the EOS probability is high.
  3. Fillers Interpreted as Ends:

    • Text: "Ну, обычно." ("Well, usually.") (EOS: 0.95) - Short filler phrases are often misclassified as complete turns.
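
These failure cases can be replayed against the 0.5 threshold using the EOS probabilities reported above, as a minimal sketch confirming that each one crosses the decision boundary:

```python
THRESHOLD = 0.5

# (text, expected decision, EOS probability reported in this card)
failures = [
    ("...чтобы вы понимали, мне надо там.", "WAIT", 0.99),
    ("Давайте сперва начнем с того, что когда ремонт закончится...", "WAIT", 0.97),
    ("Ну, обычно.", "WAIT", 0.95),
]

for text, expected, eos in failures:
    predicted = "END" if eos > THRESHOLD else "WAIT"
    assert predicted != expected  # every case here is a confirmed failure
```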

Recommendations for V3

To improve performance, future fine-tuning should focus on:

  1. Hard Negatives: Include more training examples of incomplete sentences that look complete (e.g., end in high-frequency words but are syntactically cut off).
  2. Pause/Filler Handling: Specifically train on "G3" type data (fillers, hesitations) labeled as incomplete.
  3. Context Window: Ensure the model considers the entire context, not just the last few tokens.
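
Point 1 could be approached by truncating complete training sentences mid-way to manufacture hard negatives. This is a sketch under the assumption that training pairs take the form (text, label); `make_hard_negative` is a hypothetical helper, not part of this repository:

```python
import random

def make_hard_negative(sentence: str, rng: random.Random) -> tuple[str, str]:
    """Cut a complete sentence mid-way so it reads as finished-looking
    but is syntactically incomplete; label it WAIT."""
    words = sentence.rstrip(".!?").split()
    cut = rng.randint(2, max(2, len(words) - 1))
    return " ".join(words[:cut]), "WAIT"
```

Applying this to sentences ending in high-frequency words would yield exactly the "looks complete but is cut off" examples the V2 model mishandles.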

How to Use (Inference)

from unsloth import FastLanguageModel
import torch

model_name = "RAS1981/qwen3-0.6b-turn-detection-v2"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    load_in_4bit=True,
    max_seq_length=2048,
)
FastLanguageModel.for_inference(model)  # enable inference mode

EOS_ID = 151645  # <|im_end|> in the Qwen3 vocabulary

def get_turn_probability(text: str) -> float:
    """Return the probability that <|im_end|> is the next token, i.e.
    the model's confidence that the user's turn is finished."""
    messages = [
        # System prompt (Russian): "You determine the end of the user's
        # utterance by its meaning."
        {"role": "system", "content": "Ты определяешь конец реплики пользователя по смыслу."},
        {"role": "user", "content": text},
    ]
    # Important: disable thinking and render without a generation prompt.
    # The template closes the user turn with "<|im_end|>\n"; strip that
    # suffix so the model is asked to predict whether <|im_end|> comes next.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False, enable_thinking=False
    )
    prompt = prompt.rstrip("\n")
    if prompt.endswith("<|im_end|>"):
        prompt = prompt[: -len("<|im_end|>")]

    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

    with torch.no_grad():
        logits = model(**inputs).logits[:, -1, :]  # next-token distribution
        probs = torch.softmax(logits, dim=-1)
        eos_prob = probs[0, EOS_ID].item()

    return eos_prob

text = "Алло, здравствуйте"  # "Hello, good day" -- an ambiguous phone opener
print(f"Turn Probability: {get_turn_probability(text):.4f}")
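
Given the interruption-prone behavior documented above, one practical mitigation in a real-time loop is to require several consecutive above-threshold readings before ending the turn. This is a debouncing sketch; `should_end_turn` and the `window` parameter are illustrative, not part of this model:

```python
def should_end_turn(eos_probs, window: int = 2, threshold: float = 0.5) -> bool:
    """Return True only after `window` consecutive EOS probabilities
    exceed the threshold, smoothing over one-off false positives."""
    streak = 0
    for p in eos_probs:
        streak = streak + 1 if p > threshold else 0
        if streak >= window:
            return True
    return False
```

A single spiked reading between low ones no longer triggers an interruption, at the cost of one extra polling interval of latency before the bot replies.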