---
license: apache-2.0
language:
- multilingual
datasets:
- agentlans/chat-annotated
base_model:
- ibm-granite/granite-embedding-107m-multilingual
pipeline_tag: text-classification
---
# granite-embedding-107m-multilingual-chat-difficulty

A fine-tuned model that estimates the difficulty of multilingual, multi-turn human–AI conversations based on reasoning complexity.

- **Input:** A condensed conversation in the format `<|user|>prompt<|assistant|>reply...`
- **Output:** A normalized difficulty score (lower scores indicate easier conversations)

Based on [ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual).
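
The condensed input can be built from a structured turn list with a small helper. A minimal sketch — the `<|user|>`/`<|assistant|>` tags come from the template above; the helper itself is illustrative, not part of the released code:

```python
def condense(turns):
    """Flatten [(role, text), ...] into the condensed chat format.

    Roles are assumed to alternate user/assistant; only the two tag
    names shown in the card's input template are used.
    """
    tags = {"user": "<|user|>", "assistant": "<|assistant|>"}
    return "".join(tags[role] + text for role, text in turns)

chat = [
    ("user", "What is 2 + 2?"),
    ("assistant", "4."),
]
print(condense(chat))  # <|user|>What is 2 + 2?<|assistant|>4.
```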

**Evaluation results:**
- **Loss:** 0.5663
- **MSE:** 0.5663
- **Tokens processed:** 51,173,120

## Model description

This model maps multi-turn chat logs to a continuous difficulty representation, enabling comparison across languages and reasoning styles.
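
A minimal scoring sketch, assuming the checkpoint loads with a standard single-logit sequence-classification head and that the repo id matches the card title under the author's namespace (both are assumptions, not confirmed by the card):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed repo id, inferred from the card title.
model_id = "agentlans/granite-embedding-107m-multilingual-chat-difficulty"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "<|user|>What is 2 + 2?<|assistant|>4."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    # Single regression logit: the normalized difficulty score.
    score = model(**inputs).logits.squeeze().item()
print(f"difficulty: {score:.3f}")  # lower = easier
```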

Use cases include:
- Categorizing multilingual chat transcripts by reasoning depth.
- Supporting dataset curation or curriculum design.
- Serving as a difficulty scoring component in evaluation pipelines.

## Intended uses and limitations

**Use cases**
- Estimating reasoning difficulty in multilingual conversations.
- Comparing dialogue complexity across datasets.
- Benchmarking conversational reasoning.

**Limitations**
- Not suitable for assessing factual accuracy, coherence, or sentiment.
- May not generalize well to highly domain-specific data.
- Produces relative difficulty scores, not absolute intelligence measures.
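
Because the scores are only meaningful relative to one another, downstream pipelines often rank-normalize them within a corpus before thresholding. An illustrative sketch in plain Python (not part of the model):

```python
def percentile_ranks(scores):
    """Map raw difficulty scores to [0, 1] percentile ranks within a corpus.

    Useful because the model's outputs are relative: a score is only
    interpretable against other scores from the same pipeline.
    Ties are broken by position in the input list.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    denom = max(len(scores) - 1, 1)
    for pos, i in enumerate(order):
        ranks[i] = pos / denom
    return ranks

print(percentile_ranks([0.2, 1.5, -0.3, 0.9]))
# [0.3333333333333333, 1.0, 0.0, 0.6666666666666666]
```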

## Training procedure

### Hyperparameters

| Parameter | Value |
|------------|-------|
| learning_rate | 5e-5 |
| train_batch_size | 8 |
| eval_batch_size | 8 |
| seed | 42 |
| optimizer | AdamW (fused), betas=(0.9, 0.999), epsilon=1e-8 |
| lr_scheduler_type | linear |
| num_epochs | 5.0 |

### Results

| Metric | Value |
|---------|-------|
| Training loss | 0.5663 |
| MSE | 0.5663 |
| Tokens processed | 51,173,120 |

## Framework versions

- **Transformers:** 5.0.0.dev0
- **PyTorch:** 2.9.1+cu128
- **Datasets:** 4.4.1
- **Tokenizers:** 0.22.1

## See also

[agentlans/bge-small-en-v1.5-prompt-difficulty](https://huggingface.co/agentlans/bge-small-en-v1.5-prompt-difficulty) for single-turn English conversations and prompts.