---
license: apache-2.0
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
base_model: Qwen/Qwen3-14B
library_name: transformers
inference: false
---

# Aqui-open0-2: SOTA 21B Open Weights Reasoning Model

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6747320df82ae35f0327cdd3/BkAb5VeULO1c4PZcMiWrm.png)

Aqui-open0-2 is a state-of-the-art 21 billion parameter open weights reasoning model from Aqui Solutions, creators of [AquiGPT](https://aquigpt.com.br). Built on Qwen3 14B and extended with additional layers, it delivers coding and reasoning performance that rivals much larger models while remaining accessible to the open-source community.

## Key Features

- **Extended Architecture**: 21B parameters with layers added to the Qwen3 14B base
- **SOTA Performance**: Competitive with larger proprietary and open models
- **8-bit Precision**: Optimized for efficiency without sacrificing quality
- **40K Context Window**: Expandable to 128K using YARN scaling
- **Strong Reasoning**: Approaches the performance of the closed Aqui-v2-0 models
- **Open Weights**: Fully open under the Apache 2.0 license

## Performance Benchmarks

Aqui-open0-2 demonstrates strong performance across multiple challenging benchmarks:

| Benchmark | Aqui-open0-2 (21B) | gpt-oss (21.5B) | Qwen3 (30.5B) | Solar Pro 2 (30.9B) | EXAONE 4.0 (32B) | GLM-4.5 Air (110B) | Aqui-v2-0 tiny |
|-----------|--------------------|-----------------|---------------|---------------------|------------------|--------------------|----------------|
| **MMLU-Pro** | 79.8% | 73.6% | 77.7% | 80.5% | **81.8%** | _81.5%_ | 75.4% |
| **GPQA Diamond** | 66.1% | 61.7% | 61.6% | 68.7% | **73.9%** | _73.3%_ | 64.3% |
| **Humanity's Last Exam** | **10.6%** | 8.5% | 9.8% | 7.0% | _10.5%_ | 6.8% | 5.6% |
| **LiveCodeBench** | 69.1% | _72.1%_ | 66.0% | 61.6% | **74.7%** | 68.4% | 51.9% |
| **AIME 2025** | 71.9% | 61.7% | 72.3% | 61.3% | **80.0%** | 63.0% | _75.0%_ |
| **IFBench** | _50.4%_ | **60.5%** | 41.5% | 37.1% | 36.3% | 44.0% | 39.2% |
| **AA-Index** | **51.8%** | 49.0% | 42.3% | 43.3% | _50.7%_ | 49.5% | 46.8% |

*Bold: best performance; italics: second best*

## Model Specifications

- **Parameters**: 21 billion
- **Base Model**: Qwen3 14B with extended layers
- **Context Window**: 40,000 tokens (expandable to 128K with YARN)
- **Precision**: 8-bit optimized
- **Architecture**: Extended Qwen transformer
- **Languages**: 24 languages with strong multilingual support
- **Knowledge Cutoff**: October 2024

## Hardware Requirements

### Minimum Requirements

- **GPU**: RTX 3090 (24GB VRAM) or RTX 4090
- **Mac**: 32GB unified memory (Apple Silicon)
- **RAM**: 32GB system memory
- **Storage**: 25GB available space

### Recommended Setup

- **GPU**: RTX 4090 or A100 (40GB)
- **CPU**: Modern multi-core processor
- **RAM**: 64GB+ for optimal performance
- **Storage**: NVMe SSD for faster loading

## Installation & Usage

### Quick Start with Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/open0-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Generate response
prompt = "Write a Python function to implement binary search with detailed comments."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # move inputs to the model's device
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # cap generated tokens; max_length would also count the prompt
    temperature=0.7,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
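For chat-style prompting, the tokenizer should expose the chat template inherited from the Qwen3 base model. The following is a minimal sketch under that assumption; verify the template in the repository's `tokenizer_config.json` before relying on it:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "aquigpt/open0-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Assumption: the tokenizer ships a Qwen3-style chat template.
messages = [
    {"role": "user", "content": "Explain the time complexity of binary search."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model replies
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Using the chat template matters for instruction-tuned checkpoints: a plain-string prompt bypasses the role formatting the model was trained on and can degrade response quality.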
### Using with vLLM

```python
from vllm import LLM, SamplingParams

# Initialize model
llm = LLM(
    model="aquigpt/open0-2",
    tensor_parallel_size=1,
    trust_remote_code=True
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Generate
prompts = ["Explain quantum computing in simple terms."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```

### Context Extension with YARN

```python
from transformers import AutoModelForCausalLM
import torch

# Enable YARN scaling for longer contexts
model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/open0-2",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    rope_scaling={
        "type": "yarn",
        "factor": 3.2,  # 3.2 x 40K native context ≈ 128K tokens
    }
)
```

## Use Cases

### Advanced Reasoning & Mathematics
- Complex mathematical problem solving (AIME 2025: 71.9%)
- Scientific reasoning and analysis
- Multi-step logical reasoning
- Academic research assistance

### Code Generation & Programming
- Algorithm implementation and optimization
- Code review and debugging
- Technical documentation
- Live coding challenges (LiveCodeBench: 69.1%)

### Professional Applications
- Research and analysis
- Technical writing
- Multilingual communication
- Educational tutoring with detailed explanations

## Quantization Options

Available quantization formats for different hardware setups:

- **BF16**: ~42GB VRAM (full precision)
- **FP16**: ~42GB VRAM (recommended)
- **INT8**: ~21GB VRAM (efficient)
- **INT4**: ~11GB VRAM (consumer hardware)

## Fine-tuning Support

Aqui-open0-2 supports various fine-tuning approaches:

- **LoRA/QLoRA**: Parameter-efficient fine-tuning
- **Full Fine-tuning**: Complete model adaptation
- **Custom Tokenizer**: Domain-specific vocabulary
- **Multi-task Learning**: Specialized task combinations

## Comparison with Closed Models

Aqui-open0-2 approaches the performance of our proprietary models:

- **Aqui-v2-0 tiny**: Matches or exceeds it on most benchmarks
- **Aqui-v2-0**: Competitive performance at a fraction of the size
- **Cost Efficiency**: Open weights eliminate API costs
- **Customization**: Full model access for specialized needs

## Limitations

- Knowledge cutoff at October 2024
- May occasionally produce hallucinations
- Requires significant computational resources for optimal performance
- 8-bit precision may affect some edge cases
- Extending the context with YARN reduces inference efficiency

## License

This model is released under the [Apache 2.0 License](LICENSE), enabling both research and commercial use without restrictions.

## Ethical Considerations

Aqui-open0-2 is designed for beneficial applications. Users should:

- Implement appropriate safety measures for production use
- Consider bias mitigation in sensitive applications
- Follow responsible AI practices
- Respect applicable laws and regulations

## Support & Community

- **Repository**: [Hugging Face Model Page](https://huggingface.co/aquigpt/open0-2)
- **Discussions**: Join the community discussions on Hugging Face

## Acknowledgments

- **Qwen Team**: for Qwen3 14B, the base model
- **DeepSeek Team**: for DeepSeek-R1, which was used to generate the synthetic training dataset
- **Hugging Face**: for hosting the model weights

---

*Copyright 2025 Aqui Solutions. All rights reserved.*