# RAS1981/qwen3-0.6b-turn-detection-v2

This model is a fine-tuned version of doodod/Turn-Detector-Qwen3-0.6B, which is itself based on Qwen3-0.6B. It is designed for conversational turn detection: determining whether a speaker has finished their turn (Complete) or is still speaking or pausing (Incomplete/Wait).
## Model Details
- Base Model: doodod/Turn-Detector-Qwen3-0.6B
- Task: Turn Detection (Binary Classification via Next-Token Prediction)
- Language: Russian (primary evaluation context), English (inherited capabilities)
- Architecture: Qwen3-0.6B (Transformer)
## Intended Use

This model is intended for real-time conversational systems (such as voice bots), to decide when the bot should start speaking. It predicts the probability of the `<|im_end|>` (EOS) token at the end of a given text segment.
- Input: User's spoken text (ASR transcript).
- Output: Probability of turn completion.
- Threshold: 0.5 (EOS Probability > 0.5 indicates "Turn Finished").
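The decision rule above can be sketched as a one-line helper (illustrative; the 0.5 default matches the threshold used in the evaluation below):

```python
def is_turn_finished(eos_prob: float, threshold: float = 0.5) -> bool:
    """Map the model's EOS probability to a binary turn decision."""
    return eos_prob > threshold
```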
## Evaluation Results
The model was evaluated on a dataset of 75 samples categorized into three groups:
- G1 (FINISHED): Completed sentences/turns (Expected: END).
- G2 (UNFINISHED): Incomplete sentences/turns (Expected: WAIT).
- G3 (PAUSE): Pauses or fillers (Expected: WAIT).
### Summary Metrics
- Total Samples: 75
- Correct Predictions: 45 (60.0%)
- Failures: 30 (40.0%)
- Threshold: 0.5
| Metric | Count | Percentage | Description |
|---|---|---|---|
| True Negative | 22 | 29.3% | Correctly identified incomplete turn (WAIT) |
| False Positive | 30 | 40.0% | Incorrectly identified incomplete turn as finished (Interruption) |
| False Negative | 0 | 0.0% | Incorrectly identified finished turn as incomplete (Latency) |
| True Positive | 23 | 30.7% | Correctly identified finished turn (END) |
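The summary numbers can be cross-checked directly from the confusion-matrix counts above (precision is not stated on the card; it follows from the same counts):

```python
# Recompute the summary metrics from the confusion-matrix counts.
tp, fp, fn, tn = 23, 30, 0, 22

total = tp + fp + fn + tn            # 75 samples
accuracy = (tp + tn) / total         # 45/75 = 0.60
precision = tp / (tp + fp)           # 23/53, roughly 0.43
recall = tp / (tp + fn)              # 23/23 = 1.00, never misses an end
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```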
### Performance by Group
| Group | Total | Correct | Incorrect | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| G1 (Finished) | 23 | 23 | 0 | 100.0% | 1.00 | 1.00 | 1.00 |
| G2 (Unfinished) | 42 | 21 | 21 | 50.0% | 0.00 | 0.00 | 0.00 |
| G3 (Pause) | 10 | 1 | 9 | 10.0% | 0.00 | 0.00 | 0.00 |
Note: The model has perfect recall for finished turns (it never misses an end), but it suffers from a high False Positive rate on unfinished turns (G2) and pauses (G3), meaning it is "trigger-happy" and prone to interrupting users.
## Analysis of Failures
The model primarily fails by predicting "Complete" (EOS > 0.5) when the turn is actually incomplete. This suggests the model relies heavily on syntactic closure or pauses rather than semantic incompleteness.
### Common Failure Patterns
Trailing Conjunctions/Prepositions:
- Text: "...для начала очень важно, чтобы там находилось это." ("...to begin with, it is very important that this is located there.") (EOS: 0.92) - The model ignores context suggesting more is coming.
- Text: "...чтобы вы понимали, мне надо там." ("...so that you understand, I need there.") (EOS: 0.99) - Ends abruptly, yet the model treats it as complete.
Semantic Continuation:
- Text: "Давайте сперва начнем с того, что когда ремонт закончится..." ("Let's first start with the fact that when the renovation is finished...") (EOS: 0.97) - Structurally resembles a finished clause, but is semantically incomplete.
- Text: "Вы меня про условия работы не говорили. Вы бесплатно можете нам помочь? Или у вас какие-то другие условия? Для начала мне надо понимать, что у вас..." ("You didn't tell me about the working terms. Can you help us for free? Or do you have other terms? First I need to understand what you...") (EOS: 0.80) - Clearly cut off, but high EOS probability.
Fillers interpreted as Ends:
- Text: "Ну, обычно." ("Well, usually.") (EOS: 0.95) - Short phrases are often misclassified as complete turns.
## Recommendations for V3
To improve performance, future fine-tuning should focus on:
- Hard Negatives: Include more training examples of incomplete sentences that look complete (e.g., end in high-frequency words but are syntactically cut off).
- Pause/Filler Handling: Specifically train on "G3" type data (fillers, hesitations) labeled as incomplete.
- Context Window: Ensure the model considers the entire context, not just the last few tokens.
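A minimal sketch of how such hard-negative training data could be assembled. The JSONL layout and field names (`text`, `label`) are illustrative assumptions, not the actual format used to train this model:

```python
import json

# Hypothetical training pairs: trailing clauses and G3-style fillers labeled
# WAIT, plus a genuine completion labeled END for contrast.
examples = [
    {"text": "Мне нужно, чтобы вы", "label": "WAIT"},   # trails off mid-clause
    {"text": "Ну, обычно.", "label": "WAIT"},           # filler, not a real end
    {"text": "Хорошо, договорились.", "label": "END"},  # genuine completion
]

with open("v3_hard_negatives.jsonl", "w", encoding="utf-8") as f:
    for row in examples:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```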
## How to Use (Inference)
```python
from unsloth import FastLanguageModel
import torch

model_name = "RAS1981/qwen3-0.6b-turn-detection-v2"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    load_in_4bit=True,
    max_seq_length=2048,
)
FastLanguageModel.for_inference(model)

EOS_ID = 151645  # <|im_end|>

def get_turn_probability(text):
    messages = [
        # System prompt: "You detect the end of the user's utterance by meaning."
        {"role": "system", "content": "Ты определяешь конец реплики пользователя по смыслу."},
        {"role": "user", "content": text},
    ]
    # Important: disable thinking and strip the trailing EOS before prediction
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False, enable_thinking=False
    )
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
    # Strip the auto-added EOS if present (depends on tokenizer behavior)
    if inputs["input_ids"][0, -1] == EOS_ID:
        inputs["input_ids"] = inputs["input_ids"][:, :-1]
        inputs["attention_mask"] = inputs["attention_mask"][:, :-1]
    with torch.no_grad():
        logits = model(**inputs).logits[:, -1, :]
    probs = torch.softmax(logits, dim=-1)
    return probs[0, EOS_ID].item()

text = "Алло, здравствуйте"  # "Hello, good day"
print(f"Turn Probability: {get_turn_probability(text):.4f}")
```
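Given the high False Positive rate on G2/G3 measured above, a production voice bot may not want to act on a single high-probability frame. A minimal debounce sketch (the frame count and threshold are illustrative choices, not part of the model):

```python
from collections import deque

class TurnDebouncer:
    """Fire only after `n` consecutive probabilities exceed the threshold,
    reducing interruptions from single spurious high-EOS frames."""

    def __init__(self, n: int = 3, threshold: float = 0.5):
        self.threshold = threshold
        self.n = n
        self.window = deque(maxlen=n)

    def update(self, eos_prob: float) -> bool:
        self.window.append(eos_prob > self.threshold)
        return len(self.window) == self.n and all(self.window)
```

Feed each `get_turn_probability` result into `update()`; the bot speaks only when it returns `True`.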