---
license: apache-2.0
datasets:
- Aquiles-ai/Athenea-VL
base_model:
- huihui-ai/Huihui-Qwen3-VL-4B-Thinking-abliterated
pipeline_tag: image-text-to-text
tags:
- code
- math
- uncensored
- merge
---
<h1 align="center">Athenea-4B-VL-Thinking</h1>

**Athenea-4B-VL-Thinking** is a fine-tuned version of [huihui-ai/Huihui-Qwen3-VL-4B-Thinking-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-4B-Thinking-abliterated), specialized in **multimodal reasoning, scientific problem-solving, and visual analysis**.
Trained on high-quality data that pairs visual content with explicit reasoning traces delimited by `<think>` and `</think>` tags, the model is designed to reason step by step through vision-language tasks, complex scientific problems, competitive programming, geometry, and diagram analysis.
> ⚠️ **Important Note:** This model is built on an *abliterated (uncensored)* base, so its outputs are unfiltered and unrestricted. Users are fully responsible for any use of the model and any content it produces. It is intended exclusively for research and experimentation.
## 🎯 Model Description
Athenea-4B-VL-Thinking extends the structured reasoning capabilities of Huihui-Qwen3-VL into scientific and multimodal domains, with a focus on logical problem-solving, visual analysis, and the interpretation of complex diagrams.
Key features:
* **Step-by-step visual reasoning** within `<think>` blocks
* **Specialization in scientific and analytical tasks** (Chemistry, Physics, Geometry, Graph Algorithms)
* **Uncensored output generation** for complete reasoning visibility
* **Enhanced logical consistency** through focused fine-tuning
* **Compatible with open inference frameworks** (Transformers, vLLM, etc.)
The model was fine-tuned using the [Aquiles-ai/Athenea-VL](https://huggingface.co/datasets/Aquiles-ai/Athenea-VL) dataset, which includes 20,913 high-quality examples with diverse visual content, structured reasoning chains, and natural language explanations across multiple scientific domains.
> Note: Fine-tuning was performed using **Kronos**, Aquiles-ai's proprietary enterprise fine-tuning system.
## 💻 Usage
### Installation
```bash
uv pip install transformers torch accelerate qwen-vl-utils
```
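The inference snippet below requests `attn_implementation="flash_attention_2"`, which depends on the optional `flash-attn` package; it is not pulled in by the command above. A hedged install (compilation assumes a CUDA toolchain; `--no-build-isolation` is what the flash-attn project recommends):
```bash
# Optional: only needed for attn_implementation="flash_attention_2"
uv pip install flash-attn --no-build-isolation
```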
### Basic Inference
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch

model_id = "Aquiles-ai/Athenea-4B-VL-Thinking"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",  # Requires flash-attn
    device_map="auto",
    trust_remote_code=True,
    dtype=torch.bfloat16,
)

# Without flash-attn:
# model = Qwen3VLForConditionalGeneration.from_pretrained(
#     model_id,
#     device_map="auto",
#     trust_remote_code=True,
#     dtype="auto",
# )

processor = AutoProcessor.from_pretrained(model_id)

image_path = "multimodal_problem.jpg"
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "A mass $m_1$ of 9.1 kg is positioned on a frictionless plane inclined at an angle of $50°$. It is tethered by a rope that passes over a frictionless pulley to a second, hanging mass $m_2$ of 7.8 kg, as depicted in the diagram below. Your task is to calculate the acceleration of this two-mass system and the tension within the connecting rope."},
        ],
    }
]

# Prepare inputs for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate; the large token budget leaves room for long <think> traces
generated_ids = model.generate(**inputs, max_new_tokens=40960)

# Strip the prompt tokens, keeping only the newly generated ones
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
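Because the model emits its reasoning between `<think>` and `</think>` before the final answer, it is often useful to separate the two. A minimal sketch that reuses `output_text` from the snippet above (it assumes at most one closing `</think>` tag; depending on the chat template, the opening `<think>` may already be part of the prompt, so the split is done on the closing tag):
```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a generation into (reasoning, answer) on the closing </think> tag.

    If no closing tag is present, the whole string is treated as the answer.
    """
    if "</think>" in text:
        reasoning, answer = text.split("</think>", 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()

reasoning, answer = split_reasoning(output_text[0])
print("Reasoning trace:\n", reasoning)
print("\nFinal answer:\n", answer)
```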
### Production Deployment with vLLM
**Start server:**
```bash
vllm serve Aquiles-ai/Athenea-4B-VL-Thinking \
--host 0.0.0.0 \
--port 8000 \
--api-key dummyapikey \
--mm-encoder-tp-mode data \
--limit-mm-per-prompt '{"image":2,"video":0}' \
--chat-template chat_template.jinja \
--max-model-len 16384 \
--gpu-memory-utilization 0.90 \
--reasoning-parser qwen3
```
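The `--reasoning-parser qwen3` flag tells vLLM to split the `<think>` trace from the final answer and expose it as a separate `reasoning_content` field, which the client examples below rely on. Once the server reports it is ready, a quick sanity check (assuming the host, port, and API key from the command above):
```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummyapikey")
# Should include "Aquiles-ai/Athenea-4B-VL-Thinking"
print([m.id for m in client.models.list()])
```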
**Request to server from OpenAI client:**
```python
from openai import OpenAI
import base64

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummyapikey")

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

image_base64 = encode_image("multimodal_problem.jpg")

response = client.chat.completions.create(
    model="Aquiles-ai/Athenea-4B-VL-Thinking",
    messages=[
        {
            "role": "system",
            "content": "IMPORTANT: Always wrap your thinking process between <think> and </think> tags.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"},
                },
            ],
        },
    ],
    max_tokens=2048,
    extra_body={
        "add_generation_prompt": True,
        "enable_thinking": True,
    },
    stream=True,
)

for chunk in response:
    delta = chunk.choices[0].delta
    # With --reasoning-parser enabled, the <think> trace streams separately
    # as reasoning_content; getattr guards against versions without the field
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
```
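For non-streaming requests the split is the same: assuming a vLLM version whose reasoning parser attaches the trace to the message, the reasoning lands in `message.reasoning_content` and the answer in `message.content`. A minimal sketch reusing `client` and `image_base64` from above:
```python
# Non-streaming variant of the request above
response = client.chat.completions.create(
    model="Aquiles-ai/Athenea-4B-VL-Thinking",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"},
                },
            ],
        },
    ],
    max_tokens=2048,
)

message = response.choices[0].message
# reasoning_content is an extra field added by vLLM's reasoning parser
print("Reasoning:", getattr(message, "reasoning_content", None))
print("Answer:", message.content)
```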
**vLLM benefits:** 20–30x faster inference under concurrent load, an OpenAI-compatible API, continuous batching, asynchronous scheduling, and multi-image support.
## 🔬 Model Capabilities
Athenea-4B-VL-Thinking excels at:
- **Scientific reasoning**: Physics, chemistry, and mathematics problems with diagrams
- **Competitive programming**: Analysis of visual data structures and algorithms
- **Advanced geometry**: Interpretation of complex geometric figures
- **Graph algorithms**: Understanding and analysis of graph representations
- **General multimodal analysis**: Combining visual and textual information
## 📊 Training Dataset
The model was trained on the [Aquiles-ai/Athenea-VL](https://huggingface.co/datasets/Aquiles-ai/Athenea-VL) dataset, which contains:
- **20,913 high-quality examples** with structured reasoning traces
- **Diverse scientific domains**: Chemistry, Physics, Competitive Programming, Geometry, Graph Algorithms
- **Chain-of-Thought reasoning**: All examples include explicit thought processes in `<think>` tags
- **Balanced distribution**: Randomly mixed data to prevent training biases
## Contact
- **More about [Aquiles-ai](https://aquiles-ai.vercel.app).**
- **Aquiles-ai on [GitHub](https://github.com/Aquiles-ai).**
- **Our collections on [HuggingFace](https://huggingface.co/Aquiles-ai/collections).**
### Aquiles-playground
Work is still underway to make this model compatible with Aquiles-playground.
<p align="center">
Made with ❤️ by <a href="https://github.com/Aquiles-ai">Aquiles-ai</a>
</p>