# RAS1981/qwen3-0.6b-turn-detection-v2

This model is a fine-tuned version of doodod/Turn-Detector-Qwen3-0.6B, which is itself based on Qwen3-0.6B. It is designed for conversational turn detection: determining whether a speaker has finished their turn (Complete) or is still speaking or pausing (Incomplete/Wait).
## Model Details
- Base Model: doodod/Turn-Detector-Qwen3-0.6B
- Task: Turn Detection (Binary Classification via Next-Token Prediction)
- Language: Russian (primary evaluation context), English (inherited capabilities)
- Architecture: Qwen3-0.6B (Transformer)
## Intended Use

This model is intended for real-time conversational systems (such as voice bots), to decide when the bot should start speaking. It predicts the probability of the `<|im_end|>` (EOS) token at the end of a given text segment.
- Input: User's spoken text (ASR transcript).
- Output: Probability of turn completion.
- Threshold: 0.5 (EOS Probability > 0.5 indicates "Turn Finished").
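The decision rule above can be sketched as a one-line helper (illustrative; the 0.5 default matches the threshold used in the evaluation below):

```python
def is_turn_finished(eos_prob: float, threshold: float = 0.5) -> bool:
    """Map the model's EOS probability to a binary turn decision."""
    return eos_prob > threshold
```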
## Evaluation Results
The model was evaluated on a dataset of 75 samples categorized into three groups:
- G1 (FINISHED): Completed sentences/turns (Expected: END).
- G2 (UNFINISHED): Incomplete sentences/turns (Expected: WAIT).
- G3 (PAUSE): Pauses or fillers (Expected: WAIT).
### Summary Metrics
- Total Samples: 75
- Correct Predictions: 45 (60.0%)
- Failures: 30 (40.0%)
- Threshold: 0.5
| Metric | Count | Percentage | Description |
|---|---|---|---|
| True Negative | 22 | 29.3% | Correctly identified incomplete turn (WAIT) |
| False Positive | 30 | 40.0% | Incorrectly identified incomplete turn as finished (Interruption) |
| False Negative | 0 | 0.0% | Incorrectly identified finished turn as incomplete (Latency) |
| True Positive | 23 | 30.7% | Correctly identified finished turn (END) |
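The summary numbers can be cross-checked directly from the confusion-matrix counts above (precision is not stated on the card; it follows from the same counts):

```python
# Recompute the summary metrics from the confusion-matrix counts.
tp, fp, fn, tn = 23, 30, 0, 22

total = tp + fp + fn + tn            # 75 samples
accuracy = (tp + tn) / total         # 45/75 = 0.60
precision = tp / (tp + fp)           # 23/53, roughly 0.43
recall = tp / (tp + fn)              # 23/23 = 1.00, never misses an end
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```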
### Performance by Group
| Group | Total | Correct | Incorrect | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| G1 (Finished) | 23 | 23 | 0 | 100.0% | 1.00 | 1.00 | 1.00 |
| G2 (Unfinished) | 42 | 21 | 21 | 50.0% | 0.00 | 0.00 | 0.00 |
| G3 (Pause) | 10 | 1 | 9 | 10.0% | 0.00 | 0.00 | 0.00 |
Note: The model has perfect recall for finished turns (it never misses an end), but it suffers from a high False Positive rate on unfinished turns (G2) and pauses (G3), meaning it is "trigger-happy" and prone to interrupting users.
## Analysis of Failures
The model primarily fails by predicting "Complete" (EOS > 0.5) when the turn is actually incomplete. This suggests the model relies heavily on syntactic closure or pauses rather than semantic incompleteness.
### Common Failure Patterns
Trailing Conjunctions/Prepositions:
- Text: "...для начала очень важно, чтобы там находилось это." ("...to begin with, it is very important that this is located there.") (EOS: 0.92) - The model ignores context suggesting more is coming.
- Text: "...чтобы вы понимали, мне надо там." ("...so that you understand, I need there.") (EOS: 0.99) - Ends abruptly, yet the model treats it as complete.
Semantic Continuation:
- Text: "Давайте сперва начнем с того, что когда ремонт закончится..." ("Let's first start with the fact that when the renovation is finished...") (EOS: 0.97) - Structurally resembles a finished clause, but is semantically incomplete.
- Text: "Вы меня про условия работы не говорили. Вы бесплатно можете нам помочь? Или у вас какие-то другие условия? Для начала мне надо понимать, что у вас..." ("You didn't tell me about the working terms. Can you help us for free? Or do you have other terms? First I need to understand what you...") (EOS: 0.80) - Clearly cut off, but high EOS probability.
Fillers interpreted as Ends:
- Text: "Ну, обычно." ("Well, usually.") (EOS: 0.95) - Short phrases are often misclassified as complete turns.
## Recommendations for V3
To improve performance, future fine-tuning should focus on:
- Hard Negatives: Include more training examples of incomplete sentences that look complete (e.g., end in high-frequency words but are syntactically cut off).
- Pause/Filler Handling: Specifically train on "G3" type data (fillers, hesitations) labeled as incomplete.
- Context Window: Ensure the model considers the entire context, not just the last few tokens.
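A minimal sketch of how such hard-negative training data could be assembled. The JSONL layout and field names (`text`, `label`) are illustrative assumptions, not the actual format used to train this model:

```python
import json

# Hypothetical training pairs: trailing clauses and G3-style fillers labeled
# WAIT, plus a genuine completion labeled END for contrast.
examples = [
    {"text": "Мне нужно, чтобы вы", "label": "WAIT"},   # trails off mid-clause
    {"text": "Ну, обычно.", "label": "WAIT"},           # filler, not a real end
    {"text": "Хорошо, договорились.", "label": "END"},  # genuine completion
]

with open("v3_hard_negatives.jsonl", "w", encoding="utf-8") as f:
    for row in examples:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```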
## How to Use (Inference)
```python
from unsloth import FastLanguageModel
import torch

model_name = "RAS1981/qwen3-0.6b-turn-detection-v2"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    load_in_4bit=True,
    max_seq_length=2048,
)
FastLanguageModel.for_inference(model)

EOS_ID = 151645  # <|im_end|>

def get_turn_probability(text):
    messages = [
        # System prompt: "You detect the end of the user's utterance by meaning."
        {"role": "system", "content": "Ты определяешь конец реплики пользователя по смыслу."},
        {"role": "user", "content": text},
    ]
    # Important: disable thinking and strip the trailing EOS before prediction
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False, enable_thinking=False
    )
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
    # Strip the auto-added EOS if present (depends on tokenizer behavior)
    if inputs["input_ids"][0, -1] == EOS_ID:
        inputs["input_ids"] = inputs["input_ids"][:, :-1]
        inputs["attention_mask"] = inputs["attention_mask"][:, :-1]
    with torch.no_grad():
        logits = model(**inputs).logits[:, -1, :]
    probs = torch.softmax(logits, dim=-1)
    return probs[0, EOS_ID].item()

text = "Алло, здравствуйте"  # "Hello, good day"
print(f"Turn Probability: {get_turn_probability(text):.4f}")
```
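Given the high False Positive rate on G2/G3 measured above, a production voice bot may not want to act on a single high-probability frame. A minimal debounce sketch (the frame count and threshold are illustrative choices, not part of the model):

```python
from collections import deque

class TurnDebouncer:
    """Fire only after `n` consecutive probabilities exceed the threshold,
    reducing interruptions from single spurious high-EOS frames."""

    def __init__(self, n: int = 3, threshold: float = 0.5):
        self.threshold = threshold
        self.n = n
        self.window = deque(maxlen=n)

    def update(self, eos_prob: float) -> bool:
        self.window.append(eos_prob > self.threshold)
        return len(self.window) == self.n and all(self.window)
```

Feed each `get_turn_probability` result into `update()`; the bot speaks only when it returns `True`.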