---
license: apache-2.0
tags:
- vision-language
- multimodal
- function-calling
- visual-agents
- qwen3-vl
- zen
language:
- en
- multilingual
base_model:
- Qwen/Qwen3-VL-8B-Instruct
library_name: transformers
pipeline_tag: image-text-to-text
---
# Zen VL 8B Agent
Zen VL 8B Agent is a vision-language model with function calling (9B params).
## Model Details
- **Architecture**: Qwen3-VL
- **Parameters**: 8B
- **Context Window**: 256K tokens (expandable to 1M)
- **License**: Apache 2.0
- **Training**: Fine-tuned with Zen identity and function calling
## Capabilities
- **Visual Understanding**: Image analysis, video comprehension, spatial reasoning
- **OCR**: Text extraction in 32 languages
- **Multimodal Reasoning**: STEM, math, code generation
- **Function Calling**: Tool use with visual context
- **Visual Agents**: GUI interaction, parameter extraction
## Usage
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from PIL import Image
# Load model
model = Qwen3VLForConditionalGeneration.from_pretrained(
"zenlm/zen-vl-8b-agent",
device_map="auto"
)
processor = AutoProcessor.from_pretrained("zenlm/zen-vl-8b-agent")
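# Load the image and build a chat message that includes an image slot,
# so the chat template emits the placeholder tokens the processor expects
image = Image.open("example.jpg")
prompt = "What's in this image?"
messages = [{
    "role": "user",
    "content": [{"type": "image"}, {"type": "text", "text": prompt}],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
# Generate, then decode only the newly generated tokens (not the echoed prompt)
outputs = model.generate(**inputs, max_new_tokens=256)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(new_tokens, skip_special_tokens=True)
print(response)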
```
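The function-calling capability listed above can be sketched on the client side as follows. This is a minimal illustration, not the model's documented interface: the `click` tool, the `CLICK_TOOL` schema, and the `parse_tool_calls` helper are hypothetical, and the `<tool_call>…</tool_call>` JSON wrapper is an assumption based on the format Qwen-family chat templates commonly use. Check the model's own chat template before relying on it.

```python
import json
import re

# Hypothetical tool definition in the JSON-schema style that chat
# templates commonly accept. The "click" tool is illustrative only.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click a point on the screenshot.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer", "description": "Pixel column"},
                "y": {"type": "integer", "description": "Pixel row"},
            },
            "required": ["x", "y"],
        },
    },
}

# ASSUMPTION: the model wraps each call as <tool_call>{...JSON...}</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of {"name", "arguments"} dicts found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of raising
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


# Hand-written sample standing in for real model output.
sample_output = (
    "I will press the button.\n"
    '<tool_call>\n{"name": "click", "arguments": {"x": 120, "y": 48}}\n</tool_call>'
)
calls = parse_tool_calls(sample_output)
print(calls)
```

To expose the tool to the model, one would typically pass `tools=[CLICK_TOOL]` to `processor.apply_chat_template(...)`, assuming this model's chat template accepts a `tools` argument. The parser deliberately skips malformed blocks so that a single bad call does not abort an agent loop.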
## Links
- **Website**: [zenlm.org](https://zenlm.org)
- **GitHub**: [zenlm/zen-vl](https://github.com/zenlm/zen-vl)
- **Paper**: Coming soon
- **Model Family**: [zenlm](https://huggingface.co/zenlm)
## Citation
```bibtex
@misc{zenvl2025,
title={Zen VL: Vision-Language Models with Integrated Function Calling},
author={Hanzo AI Team},
year={2025},
publisher={Zen Language Models},
url={https://github.com/zenlm/zen-vl}
}
```
## License
Apache 2.0
---
Created by [Hanzo AI](https://hanzo.ai) for the Zen model family.