---
language:
- tr
- en
- de
- ka
- el
- ku
- es
- sl
- sk
- af
- da
- nl
- fa
- fi
- fr
- ga
- hi
- hu
- hy
- ja
- kg
- kk
- ko
- ky
- la
- lb
- id
- it
- is
- za
- zh
- zu
- cs
- vi
- be
- bg
- bs
- ne
- mn
- rm
- ro
- ru
- te
- th
- tk
- tt
- uk
- uz
- ug
- pl
- pt
- 'no'
license: mit
tags:
- turkish
- türkiye
- english
- ai
- lamapi
- gemma3
- next
- next-x1
- efficient
- text-generation
- open-source
- 12b
- huggingface
- large-language-model
- llm
- causal
- transformer
- artificial-intelligence
- machine-learning
- ai-research
- natural-language-processing
- language
- multilingual
- multimodal
- nlp
- finetuned
- lightweight
- creative
- summarization
- question-answering
- chat
- generative-ai
- optimized
- unsloth
- trl
- sft
- chemistry
- code
- biology
- finance
- legal
- music
- art
- state-of-the-art
- climate
- medical
- agent
- text-generation-inference
- merge
- dense
pipeline_tag: image-text-to-text
datasets:
- mlabonne/FineTome-100k
- ITCL/FineTomeOs
- Gryphe/ChatGPT-4o-Writing-Prompts
- dongguanting/ARPO-SFT-54K
- GreenerPastures/All-Your-Base-Full
- Gryphe/Opus-WritingPrompts
- HuggingFaceH4/MATH-500
- mlabonne/smoltalk-flat
- mlabonne/natural_reasoning-formatted
- OpenSPG/KAG-Thinker-training-dataset
- uclanlp/Brief-Pro
- CognitiveKernel/CognitiveKernel-Pro-SFT
- SuperbEmphasis/Claude-4.0-DeepSeek-R1-RP-SFWish
- QuixiAI/dolphin-r1
- mlabonne/lmsys-arena-human-sft-55k
library_name: transformers
base_model:
- Lamapi/next-12b
---

# 🚀 Next 12B (m200)

### *Türkiye's Advanced Vision-Language Model: High Performance, Multimodal, and Enterprise-Ready*

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Language: Multilingual](https://img.shields.io/badge/Language-Multilingual-red.svg)]()
[![HuggingFace](https://img.shields.io/badge/🤗-Lamapi/Next--12B-orange.svg)](https://huggingface.co/Lamapi/next-12b)

---

## 📖 Overview

**Next 12B** is a **12-billion parameter multimodal Vision-Language Model (VLM)** based on **Gemma 3**, fine-tuned to deliver **exceptional performance** in both text and image understanding.

This is **Türkiye's most advanced open-source vision-language model**, designed for:

* Superior understanding and generation of **text and image descriptions**.
* Advanced reasoning and context-aware multimodal outputs.
* Professional-grade Turkish support with extensive multilingual capabilities.
* Enterprise-ready deployment with optimized quantization options.

This model is ideal for **enterprises, researchers, and organizations** who need a **state-of-the-art multimodal AI** capable of **complex visual understanding, advanced reasoning, and creative generation**.

---

# Next 12B sets new standards for medium-sized models across all major benchmarks.
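
| Model | MMLU (5-shot) % | MMLU-Pro % | GSM8K % | MATH % |
| ---------------------------- | --------------- | ---------- | ------- | ------ |
| Next 12B Version m200 | 91.8 | 78.4 | 94.3 | 81.2 |
| Next 4B preview Version s325 | 84.6 | 66.9 | 82.7 | 70.5 |
| Qwen 2.5 14B | 79.9 | 68.3 | 87.5 | 74.3 |
| Llama 3.1 8B | 73.0 | 62.4 | 80.6 | 51.9 |
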
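
To make the comparison concrete, the sketch below recomputes per-benchmark leaders and point margins from the scores in the table above. The numbers are copied from that table; the `SCORES` layout and the `margin_over` and `leader` helpers are illustrative names chosen for this example, not part of any released tooling.

```python
# Scores copied from the table above (percent).
# Column order: MMLU (5-shot), MMLU-Pro, GSM8K, MATH.
BENCHMARKS = ("MMLU", "MMLU-Pro", "GSM8K", "MATH")

SCORES = {
    "Next 12B Version m200": (91.8, 78.4, 94.3, 81.2),
    "Next 4B preview Version s325": (84.6, 66.9, 82.7, 70.5),
    "Qwen 2.5 14B": (79.9, 68.3, 87.5, 74.3),
    "Llama 3.1 8B": (73.0, 62.4, 80.6, 51.9),
}


def margin_over(model: str, baseline: str) -> dict:
    """Return per-benchmark point differences (model minus baseline)."""
    return {
        name: round(a - b, 1)
        for name, a, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


def leader(benchmark: str) -> str:
    """Return the model with the highest score on one benchmark."""
    idx = BENCHMARKS.index(benchmark)
    return max(SCORES, key=lambda m: SCORES[m][idx])


if __name__ == "__main__":
    for name in BENCHMARKS:
        print(f"{name}: best = {leader(name)}")
    print(margin_over("Next 12B Version m200", "Qwen 2.5 14B"))
```

Running the script prints the top model for each column and the point gap between Next 12B and Qwen 2.5 14B.
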

---

# Next 12B approaches frontier model performance while maintaining efficiency.
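
| Model | MMLU (5-shot) % | MMLU-Pro % | GSM8K % | MATH % |
| --------------------- | --------------- | ---------- | ------- | ------ |
| Next Z1 Version l294 | 97.3 | 94.2 | 97.7 | 93.2 |
| Next 12B Version m200 | 91.8 | 78.4 | 94.3 | 81.2 |
| GPT 4o | 88.7 | 72.6 | 92.3 | 76.6 |
| Claude Sonnet 4 | ~88.3 | 75.8 | 90.8 | 78.3 |
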
---

## 🚀 Installation & Usage

### Use with vision:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

model_id = "Lamapi/next-12b"

model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)  # For vision.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Read image
image = Image.open("image.jpg")

# Create a message in chat format
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Who is in this image?"},
        ],
    },
]

# Prepare input with Tokenizer
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Output from the model
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Example exchange:

> **User:** Who is in this image?
>
> **Next 12B:** The image shows Mustafa Kemal Atatürk, the founder and first President of the Republic of Turkey.

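
The chat-format structure used in the vision example can be built programmatically when the image is optional. The following is a minimal sketch: `build_messages` is a hypothetical helper written for this illustration, and it only mirrors the dictionary layout of the `messages` list shown above.

```python
from typing import Any, Optional


def build_messages(system_text: str, user_text: str, image: Optional[Any] = None) -> list:
    """Build a chat-format message list with an optional image in the user turn.

    Hypothetical helper for illustration; the dict layout mirrors the
    `messages` list used in the vision example.
    """
    user_content = []
    if image is not None:
        # The image item comes first, followed by the text question.
        user_content.append({"type": "image", "image": image})
    user_content.append({"type": "text", "text": user_text})

    return [
        {"role": "system", "content": [{"type": "text", "text": system_text}]},
        {"role": "user", "content": user_content},
    ]


if __name__ == "__main__":
    # A placeholder object stands in for a PIL image so the sketch runs without files.
    placeholder_image = object()
    messages = build_messages(
        "You are a concise assistant.", "Who is in this image?", placeholder_image
    )
    for message in messages:
        kinds = [item["type"] for item in message["content"]]
        print(message["role"], kinds)
```

The returned list has the same shape as the one passed to `apply_chat_template` in the example; calling the helper without an image yields a text-only user turn.
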

### Use without vision:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Lamapi/next-12b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat message
messages = [
    {"role": "system", "content": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."},
    {"role": "user", "content": "Hello, how are you?"},
]

# Prepare input with Tokenizer
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Output from the model
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Example exchange:

> **User:** Hello, how are you?
>
> **Next 12B:** I'm fine, thank you. How are you?

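
Which format to load depends largely on available memory. As a rough guide, the sketch below estimates the weight storage of a 12-billion-parameter model for the formats this card lists (Q8_0, Q4_K_M, F16, F32). It is a back-of-the-envelope calculation under stated assumptions: the parameter count is the nominal "12 Billion", the Q4_K_M bits-per-weight value is an assumed average, and activations and the KV cache are not included.

```python
# Approximate bits stored per weight for each format.
# F32 and F16 are exact. Q8_0 stores 32 int8 weights plus one 16-bit scale
# per block (34 bytes / 32 weights = 8.5 bits per weight). The Q4_K_M value
# is an ASSUMED average, since that format mixes block types across tensors.
BITS_PER_WEIGHT = {
    "F32": 32.0,
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,  # assumption: rough average, varies by model
}

PARAM_COUNT = 12_000_000_000  # nominal "12 Billion" parameter count


def weight_size_gb(fmt: str, params: int = PARAM_COUNT) -> float:
    """Estimate weight storage in gigabytes (10^9 bytes) for one format."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt:>7}: ~{weight_size_gb(fmt):.1f} GB")
```

Actual file sizes and runtime memory will differ somewhat, so treat these figures as a lower bound when sizing hardware.
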

---

## 🎯 Goals

1. **Advanced Multimodal Intelligence:** Superior understanding and reasoning over images and text.
2. **Enterprise-Grade Performance:** High accuracy and reliability for production deployments.
3. **Efficiency:** Optimized for professional GPUs with flexible quantization options.
4. **Accessibility:** Open-source availability for research and commercial applications.
5. **Cultural Excellence:** Best-in-class Turkish language support while maintaining multilingual capabilities.

---

## ✨ Key Features

| Feature | Description |
| --------------------------------- | ----------------------------------------------------------------------- |
| 🔋 Optimized Architecture | Balanced performance and efficiency; supports multiple quantization formats. |
| 🖼️ Advanced Vision-Language | Deep understanding of images with sophisticated visual reasoning capabilities. |
| 🇹🇷 Professional Turkish Support | Industry-leading Turkish language performance with extensive multilingual reach. |
| 🧠 Superior Reasoning | State-of-the-art logical and analytical reasoning for complex tasks. |
| 📊 Production-Ready | Reliable, consistent outputs suitable for enterprise applications. |
| 🌍 Open Source | Transparent, community-driven, and commercially friendly. |

---

## 📐 Model Specifications

| Specification | Details |
| ------------------ | ---------------------------------------------------------------------------------- |
| Base Model | Gemma 3 |
| Parameter Count | 12 Billion |
| Architecture | Transformer, causal LLM + Enhanced Vision Encoder |
| Fine-Tuning Method | Advanced instruction & multimodal fine-tuning (SFT) on curated Turkish and multilingual datasets |
| Optimizations | Q8_0, Q4_K_M, F16, F32 quantizations for flexible deployment options |
| Modalities | Text & Image |
| Use Cases | Advanced image captioning, multimodal QA, text generation, complex reasoning, creative storytelling, enterprise applications |

---

## 💡 Performance Highlights

- **MMLU Excellence:** 91.8% on MMLU benchmark, demonstrating comprehensive knowledge across diverse domains
- **Mathematical Prowess:** 81.2% on MATH benchmark, excelling in complex mathematical reasoning
- **Problem Solving:** 94.3% on GSM8K, showcasing superior word problem solving capabilities
- **Professional Reasoning:** 78.4% on MMLU-Pro, handling advanced professional-level questions

---

## 🎨 Use Cases

- **Enterprise Content Generation:** High-quality multilingual content creation
- **Advanced Visual Analysis:** Detailed image understanding and description
- **Educational Applications:** Complex tutoring and explanation systems
- **Research Assistance:** Literature review and data analysis
- **Creative Writing:** Story generation and creative content
- **Technical Documentation:** Code documentation and technical writing
- **Customer Support:** Multilingual customer service automation
- **Data Extraction:** Visual document processing and information extraction

---

## 📄 License

This project is licensed under the **MIT License**: free to use, modify, and distribute for commercial and non-commercial purposes. Attribution is appreciated.

---

## 📞 Contact & Support

* 📧 **Email:** [lamapicontact@gmail.com](mailto:lamapicontact@gmail.com)
* 🤗 **HuggingFace:** [Lamapi](https://huggingface.co/Lamapi)

---

> **Next 12B**: Türkiye's **most advanced vision-language AI**, combining **state-of-the-art multimodal understanding, superior reasoning, and enterprise-grade reliability**.

[![Follow on HuggingFace](https://img.shields.io/badge/Follow-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/Lamapi)