Fredtt3 committed
Commit c1f00e7 · verified · Parent: 280c9ff

Update README.md

Files changed (1)
  1. README.md +187 -1
README.md CHANGED
@@ -14,4 +14,190 @@ tags:

  <h1 align="center">Athenea-4B-VL-Thinking</h1>

- ![image](athenea_vl.png)
+ ![image](athenea_vl.png)
+
+ **Athenea-4B-VL-Thinking** is a fine-tuned version of [huihui-ai/Huihui-Qwen3-VL-4B-Thinking-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-4B-Thinking-abliterated), specialized in **multimodal reasoning, scientific problem-solving, and visual analysis**.
+
+ Trained on high-quality data that pairs visual content with explicit reasoning traces wrapped in `<think>` and `</think>` tags, the model is designed to perform detailed step-by-step reasoning on vision-language tasks, complex scientific problems, competitive programming, geometry, and diagram analysis.
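For reference, the expected response structure looks roughly like this (an illustrative sketch, not actual model output):

```text
<think>
Step-by-step analysis of the image and the question: identify the relevant
quantities, set up the equations, compute intermediate results, self-check...
</think>
The final answer, stated concisely.
```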
+
+ > ⚠️ **Important Note:** This model uses an *abliterated (uncensored)* base version, providing total expressive freedom and unrestricted output generation. Users are fully responsible for any use or content produced by the model. It is intended exclusively for research and experimentation purposes.
+
+ ## 🎯 Model Description
+
+ Athenea-4B-VL-Thinking extends the structured reasoning capabilities of Huihui-Qwen3-VL toward scientific and multimodal domains, demonstrating strong performance in logical problem-solving, visual analysis, and the interpretation of complex diagrams.
+
+ Key features:
+
+ * **Step-by-step visual reasoning** within `<think>` blocks
+ * **Specialization in scientific and analytical tasks** (Chemistry, Physics, Geometry, Graph Algorithms)
+ * **Uncensored output generation** for complete reasoning visibility
+ * **Enhanced logical consistency** through focused fine-tuning
+ * **Compatible with open inference frameworks** (Transformers, vLLM, etc.)
+
+ The model was fine-tuned using the [Aquiles-ai/Athenea-VL](https://huggingface.co/datasets/Aquiles-ai/Athenea-VL) dataset, which includes 20,913 high-quality examples with diverse visual content, structured reasoning chains, and natural language explanations across multiple scientific domains.
+
+ > Note: Fine-tuning was performed using **Kronos**, Aquiles-ai's proprietary enterprise fine-tuning system.
+
+ ## 💻 Usage
+
+ ### Installation
+
+ ```bash
+ uv pip install transformers torch accelerate qwen-vl-utils
+ ```
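Note that `flash-attn` is not installed by the command above. If you want the `flash_attention_2` path used in the example below, it typically has to be installed separately (optional; the model also runs with the default attention implementation):

```bash
# Optional: only needed for attn_implementation="flash_attention_2".
# flash-attn builds against the already-installed torch, hence --no-build-isolation.
uv pip install flash-attn --no-build-isolation
```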
+
+ ### Basic Inference
+
+ ```python
+ from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
+ import torch
+
+ model_id = "Aquiles-ai/Athenea-4B-VL-Thinking"
+
+ model = Qwen3VLForConditionalGeneration.from_pretrained(
+     model_id,
+     attn_implementation="flash_attention_2",  # Requires flash-attn
+     device_map="auto",
+     trust_remote_code=True,
+     dtype=torch.bfloat16,
+ )
+
+ # Without flash-attn:
+ # model = Qwen3VLForConditionalGeneration.from_pretrained(
+ #     model_id,
+ #     device_map="auto",
+ #     trust_remote_code=True,
+ #     dtype="auto",
+ # )
+
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ image_path = "multimodal_problem.jpg"
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image_path},
+             {"type": "text", "text": "A mass $m_1$ of 9.1 kg is positioned on a frictionless plane inclined at an angle of $50°$. It is tethered by a rope that passes over a frictionless pulley to a second, hanging mass $m_2$ of 7.8 kg, as depicted in the diagram below. Your task is to calculate the acceleration of this two-mass system and the tension within the connecting rope."},
+         ],
+     }
+ ]
+
+ # Preparation for inference
+ inputs = processor.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_dict=True,
+     return_tensors="pt"
+ ).to(model.device)
+
+ # Inference: Output generation
+ generated_ids = model.generate(**inputs, max_new_tokens=40960)
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )
+ print(output_text)
+ ```
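Because the reasoning is emitted inline between `<think>` and `</think>`, you may want to separate it from the final answer before displaying it. A minimal post-processing sketch, assuming the decoded text contains at most one `</think>` tag:

```python
# Split the decoded output into the reasoning trace and the final answer.
raw = output_text[0]
if "</think>" in raw:
    reasoning, answer = raw.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "").strip()
    answer = answer.strip()
else:
    reasoning, answer = "", raw.strip()

print("Reasoning:\n", reasoning)
print("\nAnswer:\n", answer)
```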
+
+ ### Production Deployment with vLLM
+
+ **Start server:**
+
+ ```bash
+ vllm serve Aquiles-ai/Athenea-4B-VL-Thinking \
+     --host 0.0.0.0 \
+     --port 8000 \
+     --api-key dummyapikey \
+     --mm-encoder-tp-mode data \
+     --limit-mm-per-prompt '{"image":2,"video":0}' \
+     --chat-template chat_template.jinja \
+     --max-model-len=16384 \
+     --gpu-memory-utilization=0.90 \
+     --reasoning-parser qwen3
+ ```
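Once the server is running, you can sanity-check that it is reachable before wiring up a client (this assumes the host, port, and API key from the command above):

```bash
# List the served models via the OpenAI-compatible endpoint.
curl http://0.0.0.0:8000/v1/models \
    -H "Authorization: Bearer dummyapikey"
```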
+
+ **Request to the server from the OpenAI client:**
+
+ ```python
+ from openai import OpenAI
+ import base64
+
+ client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummyapikey")
+
+ def encode_image(image_path):
+     with open(image_path, "rb") as image_file:
+         return base64.b64encode(image_file.read()).decode("utf-8")
+
+ image_base64 = encode_image("multimodal_problem.jpg")
+
+ response = client.chat.completions.create(
+     model="Aquiles-ai/Athenea-4B-VL-Thinking",
+     messages=[
+         {"role": "system", "content": "IMPORTANT: Always wrap your thinking process between <think> and </think> tags."},
+         {
+             "role": "user",
+             "content": [
+                 {"type": "text", "text": "What's in this image?"},
+                 {
+                     "type": "image_url",
+                     "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"},
+                 },
+             ],
+         },
+     ],
+     max_tokens=2048,
+     extra_body={
+         "add_generation_prompt": True,
+         "enable_thinking": True,
+     },
+     stream=True,
+ )
+
+ for chunk in response:
+     if chunk.choices[0].delta.content:
+         print(chunk.choices[0].delta.content, end="", flush=True)
+ ```
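With `--reasoning-parser qwen3`, vLLM typically removes the `<think>` block from `content` and streams it separately as `reasoning_content` on each delta. A variant of the loop above that prints both (a sketch; whether the field is populated depends on your vLLM version):

```python
# Variant of the streaming loop: print the parsed reasoning and the answer separately.
# Assumes a vLLM version that populates `reasoning_content` on streamed deltas
# when a reasoning parser is configured on the server.
for chunk in response:
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)  # step-by-step reasoning
    elif delta.content:
        print(delta.content, end="", flush=True)  # final answer
```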
+
+ **vLLM Benefits:** 20-30x faster inference, OpenAI-compatible API, continuous batching, asynchronous scheduling, and support for multiple images.
+
+ ## 🔬 Model Capabilities
+
+ Athenea-4B-VL-Thinking excels at:
+
+ - **Scientific reasoning**: Physics, chemistry, and mathematics problems with diagrams
+ - **Competitive programming**: Analysis of visual data structures and algorithms
+ - **Advanced geometry**: Interpretation of complex geometric figures
+ - **Graph algorithms**: Understanding and analysis of graph representations
+ - **General multimodal analysis**: Combining visual and textual information
+
+ ## 📊 Training Dataset
+
+ The model was trained on the [Aquiles-ai/Athenea-VL](https://huggingface.co/datasets/Aquiles-ai/Athenea-VL) dataset, which contains:
+
+ - **20,913 high-quality examples** with structured reasoning traces
+ - **Diverse scientific domains**: Chemistry, Physics, Competitive Programming, Geometry, Graph Algorithms
+ - **Chain-of-Thought reasoning**: All examples include explicit thought processes in `<think>` tags
+ - **Balanced distribution**: Randomly mixed data to prevent training biases
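To inspect the training data yourself, the dataset can be loaded with the `datasets` library (a minimal sketch; the split and column names are whatever the dataset card defines):

```python
from datasets import load_dataset

# Stream the dataset so the images are not all downloaded up front.
ds = load_dataset("Aquiles-ai/Athenea-VL", split="train", streaming=True)

# Peek at the first example; the available fields depend on the dataset card.
first = next(iter(ds))
print(list(first.keys()))
```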
+
+ ## Contact
+
+ - **More about [Aquiles-ai](https://aquiles.vercel.app).**
+
+ - **Aquiles-ai on [GitHub](https://github.com/Aquiles-ai).**
+
+ - **Our collections on [HuggingFace](https://huggingface.co/Aquiles-ai/collections).**
+
+ ### Aquiles-playground
+
+ Work is still underway to make this model compatible with Aquiles-playground.
+
+ <p align="center">
+ Made with ❤️ by <a href="https://github.com/Aquiles-ai">Aquiles-ai</a>
+ </p>