---
license: apache-2.0
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
base_model: Qwen/Qwen3-14B
library_name: transformers
inference: false
---

# Aqui-open0-2: SOTA 21B Open Weights Reasoning Model

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6747320df82ae35f0327cdd3/BkAb5VeULO1c4PZcMiWrm.png)

Aqui-open0-2 is a state-of-the-art 21 billion parameter open weights reasoning model from Aqui Solutions, creators of [AquiGPT](https://aquigpt.com.br). Built on Qwen3 14B and extended with additional layers, it delivers coding and reasoning performance that rivals much larger models while remaining accessible to the open-source community.

## Key Features

- **Extended Architecture**: 21B parameters with layers added to the Qwen3 14B base
- **SOTA Performance**: Competitive with larger proprietary and open models
- **8-bit Precision**: Optimized for efficiency without sacrificing quality
- **40K Context Window**: Expandable to 128K using YARN scaling
- **Strong Reasoning**: Approaches the performance of the closed Aqui-v2-0 models
- **Open Weights**: Fully open under the Apache 2.0 license

## Performance Benchmarks

Aqui-open0-2 demonstrates strong performance across multiple challenging benchmarks:

| Benchmark | Aqui-open0-2 (21B) | gpt-oss (21.5B) | Qwen3 (30.5B) | Solar Pro 2 (30.9B) | EXAONE 4.0 (32B) | GLM-4.5 Air (110B) | Aqui-v2-0 tiny |
|-----------|--------------------|-----------------|---------------|---------------------|------------------|--------------------|----------------|
| **MMLU-Pro** | 79.8% | 73.6% | 77.7% | 80.5% | **81.8%** | _81.5%_ | 75.4% |
| **GPQA Diamond** | 66.1% | 61.7% | 61.6% | 68.7% | **73.9%** | _73.3%_ | 64.3% |
| **Humanity's Last Exam** | **10.6%** | 8.5% | 9.8% | 7.0% | _10.5%_ | 6.8% | 5.6% |
| **LiveCodeBench** | 69.1% | _72.1%_ | 66.0% | 61.6% | **74.7%** | 68.4% | 51.9% |
| **AIME 2025** | 71.9% | 61.7% | 72.3% | 61.3% | **80.0%** | 63.0% | _75.0%_ |
| **IFBench** | _50.4%_ | **60.5%** | 41.5% | 37.1% | 36.3% | 44.0% | 39.2% |
| **AA-Index** | **51.8%** | 49.0% | 42.3% | 43.3% | _50.7%_ | 49.5% | 46.8% |

*Bold: best performance; italics: second best*

## Model Specifications

- **Parameters**: 21 billion
- **Base Model**: Qwen3 14B with extended layers
- **Context Window**: 40,000 tokens (expandable to 128K with YARN)
- **Precision**: 8-bit optimized
- **Architecture**: Extended Qwen transformer
- **Languages**: 24 languages with strong multilingual support
- **Knowledge Cutoff**: October 2024

## Hardware Requirements

### Minimum Requirements

- **GPU**: RTX 3090 (24GB VRAM) or RTX 4090
- **Mac**: 32GB unified memory (Apple Silicon)
- **RAM**: 32GB system memory
- **Storage**: 25GB available space

### Recommended Setup

- **GPU**: RTX 4090 or A100 (40GB)
- **CPU**: Modern multi-core processor
- **RAM**: 64GB+ for optimal performance
- **Storage**: NVMe SSD for faster loading

## Installation & Usage

### Quick Start with Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/open0-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Generate response
prompt = "Write a Python function to implement binary search with detailed comments."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # move inputs to the model's device
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # cap generated tokens; max_length would also count the prompt
    temperature=0.7,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
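For chat-style prompting, the tokenizer should expose the chat template inherited from the Qwen3 base model. The following is a minimal sketch under that assumption; verify the template in the repository's `tokenizer_config.json` before relying on it:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "aquigpt/open0-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Assumption: the tokenizer ships a Qwen3-style chat template.
messages = [
    {"role": "user", "content": "Explain the time complexity of binary search."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model replies
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Using the chat template matters for instruction-tuned checkpoints: a plain-string prompt bypasses the role formatting the model was trained on and can degrade response quality.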
### Using with vLLM

```python
from vllm import LLM, SamplingParams

# Initialize model
llm = LLM(
    model="aquigpt/open0-2",
    tensor_parallel_size=1,
    trust_remote_code=True
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Generate
prompts = ["Explain quantum computing in simple terms."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```

### Context Extension with YARN

```python
from transformers import AutoModelForCausalLM
import torch

# Enable YARN scaling for longer contexts
model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    rope_scaling={
        "type": "yarn",
        "factor": 3.2,  # 3.2 x 40K native context ≈ 128K tokens
    }
)
```

## Use Cases

### Advanced Reasoning & Mathematics
- Complex mathematical problem solving (AIME 2025: 71.9%)
- Scientific reasoning and analysis
- Multi-step logical reasoning
- Academic research assistance

### Code Generation & Programming
- Algorithm implementation and optimization
- Code review and debugging
- Technical documentation
- Live coding challenges (LiveCodeBench: 69.1%)

### Professional Applications
- Research and analysis
- Technical writing
- Multilingual communication
- Educational tutoring with detailed explanations

## Quantization Options

Available quantization formats for different hardware setups:

- **BF16**: ~42GB VRAM (full precision)
- **FP16**: ~42GB VRAM (recommended)
- **INT8**: ~21GB VRAM (efficient)
- **INT4**: ~11GB VRAM (consumer hardware)

## Fine-tuning Support

Aqui-open0-2 supports various fine-tuning approaches:

- **LoRA/QLoRA**: Parameter-efficient fine-tuning
- **Full Fine-tuning**: Complete model adaptation
- **Custom Tokenizer**: Domain-specific vocabulary
- **Multi-task Learning**: Specialized task combinations

## Comparison with Closed Models

Aqui-open0-2 approaches the performance of our proprietary models:

- **Aqui-v2-0 tiny**: Matches or exceeds it on most benchmarks
- **Aqui-v2-0**: Competitive performance at a fraction of the size
- **Cost Efficiency**: Open weights eliminate API costs
- **Customization**: Full model access for specialized needs

## Limitations

- Knowledge cutoff at October 2024
- May occasionally produce hallucinations
- Requires significant computational resources for optimal performance
- 8-bit precision may affect some edge cases
- Extending the context with YARN reduces inference efficiency

## License

This model is released under the [Apache 2.0 License](LICENSE), enabling both research and commercial use without restrictions.

## Ethical Considerations

Aqui-open0-2 is designed for beneficial applications. Users should:

- Implement appropriate safety measures for production use
- Consider bias mitigation in sensitive applications
- Follow responsible AI practices
- Respect applicable laws and regulations

## Support & Community

- **Repository**: [Hugging Face Model Page](https://huggingface.co/aquigpt/open0-2)
- **Discussions**: Join the community discussions on Hugging Face

## Acknowledgments

- **Qwen Team**: for Qwen3 14B, the base model
- **DeepSeek Team**: for DeepSeek-R1, which was used to generate the synthetic training dataset
- **Hugging Face**: for hosting the model weights

---

*Copyright 2025 Aqui Solutions. All rights reserved.*