---
language:
- fa
- en
- ar
- multilingual
license: apache-2.0
tags:
- nlp
- text-generation
- translation
- sentiment-analysis
- question-answering
- persian
- mixture-of-experts
- moe
library_name: transformers
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-30B-A3B-Instruct-2507
---
# Model Card for Zagros-1.0-Quick
## Model Details
- **Model Name**: Zagros-1.0-Quick
- **Model Owner**: Darsadilab
- **Model URL**: [https://huggingface.co/darsadilab/zagros-1.0-quick](https://huggingface.co/darsadilab/zagros-1.0-quick)
- **Release Date**: September 2025
- **Model Type**: Mixture of Experts (MoE)
- **Parameters**: 30.5 billion
- **Tensor Type**: BF16
- **Languages**: Multilingual, with a specialization in Persian; also supports English, Arabic, and other languages
- **License**: Apache 2.0
- **Version**: 1.0
- **Authors**: Mohammadmoein Pisoude, Aydin Babazadeh
- **Contributors**: Aylin Bahari (Testing and Performance Optimization)
## Model Description
Zagros-1.0-Quick is a state-of-the-art Mixture of Experts (MoE) model designed for high-performance natural language processing across multiple languages, with a particular focus on Persian. Its 30.5-billion-parameter architecture delivers robust performance across diverse use cases. The model has been pre-trained and fine-tuned on large, diverse datasets for versatility and accuracy in tasks such as text generation, translation, sentiment analysis, and more.
### Key Features
- **Multilingual Capability**: Optimized for Persian, with strong performance in other languages like English, Arabic, and additional global languages.
- **Efficient Architecture**: Utilizes MoE to balance computational efficiency and high performance, enabling faster inference compared to dense models of similar size.
- **Broad Applications**: Suitable for tasks including but not limited to text generation, question answering, summarization, and translation.
- **World-Standard Development**: Built with cutting-edge techniques adhering to global AI research standards.
## Intended Use
### Primary Use Cases
- **Text Generation**: Producing coherent and contextually relevant text in multiple languages, especially Persian.
- **Translation**: High-quality translation, particularly for Persian to/from other languages.
- **Sentiment Analysis**: Understanding and analyzing sentiment in multilingual contexts.
- **Question Answering**: Providing accurate and context-aware responses in various domains.
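To make these use cases concrete, the sketch below builds chat-style prompts for translation and sentiment analysis. It is a minimal illustration only: the prompt wording and Persian example sentences are our own placeholders, not taken from the model's documentation or training data, and the loading code mirrors the Usage Instructions section further down.

```python
# Illustrative prompts for the primary use cases (wording is an example, not prescribed by the model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "darsadilab/zagros-1.0-quick"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

def ask(prompt: str, max_new_tokens: int = 256) -> str:
    """Wrap a single user prompt in the chat template and return the decoded completion."""
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

# Translation: "A new AI library was released."
print(ask("Translate the following Persian sentence to English: «کتابخانه هوش مصنوعی جدیدی منتشر شد.»"))
# Sentiment analysis: "The product quality was excellent."
print(ask("Classify the sentiment of this Persian review as positive, negative, or neutral: «کیفیت محصول عالی بود.»"))
```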
### Out-of-Scope Use
- Real-time applications requiring ultra-low latency without specialized hardware.
- Tasks requiring factual correctness without additional verification, as the model may generate plausible but incorrect information.
- Use in safety-critical systems without thorough validation and risk assessment.
## Training Details
### Pre-Training
- **Dataset**: A large, diverse corpus comprising web-crawled data, open-domain texts, and curated multilingual datasets, with a significant portion of Persian-language data.
- **Methodology**: Pre-trained using a Mixture of Experts architecture to optimize for efficiency and performance. Training involved unsupervised learning on massive text corpora to capture linguistic patterns and knowledge.
- **Compute Resources**: Trained on a cluster of high-performance GPUs over several weeks, leveraging distributed training techniques.
### Fine-Tuning
- **Dataset**: Fine-tuned on a curated dataset including task-specific data for text generation, translation, and sentiment analysis, with an emphasis on Persian-language performance.
- **Methodology**: Supervised fine-tuning and reinforcement learning from human feedback (RLHF) to align the model with user expectations and improve task-specific performance.
- **Data Sources**: Includes publicly available datasets, proprietary Persian-language corpora, and synthetic data generated for robustness.
### Hyperparameters
- **Learning Rate**: 2e-5 (decayed during training)
- **Batch Size**: 2048 (effective, distributed across GPUs)
- **Optimizer**: AdamW
- **Training Steps**: Approximately 1 million steps for pre-training, followed by 50,000 steps for fine-tuning
- **MoE Configuration**: 8 experts per layer, with top-2 expert routing
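To illustrate what "8 experts per layer, with top-2 expert routing" means, here is a minimal, self-contained PyTorch sketch of a top-2 routed MoE feed-forward layer. It is not the Zagros implementation; the hidden size, expert width, and lack of load-balancing loss are placeholder assumptions for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy top-2 routing over 8 experts (illustrative only; sizes are placeholder assumptions).
NUM_EXPERTS, TOP_K, HIDDEN = 8, 2, 512

class ToyMoELayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.router = nn.Linear(HIDDEN, NUM_EXPERTS)  # routing logits per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(HIDDEN, 4 * HIDDEN), nn.GELU(), nn.Linear(4 * HIDDEN, HIDDEN))
            for _ in range(NUM_EXPERTS)
        ])

    def forward(self, x):                                   # x: (tokens, HIDDEN)
        weights, idx = self.router(x).topk(TOP_K, dim=-1)   # keep the 2 highest-scoring experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize the selected experts' scores
        out = torch.zeros_like(x)
        for k in range(TOP_K):
            for e in range(NUM_EXPERTS):
                mask = idx[:, k] == e                        # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, HIDDEN)
print(ToyMoELayer()(tokens).shape)  # torch.Size([4, 512])
```

Only the two selected experts run for each token, which is why an MoE model of this size can be cheaper at inference time than a dense model with the same parameter count.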
## Evaluation
### Performance Metrics
- **Perplexity**: Achieves competitive perplexity on multilingual benchmarks, particularly strong on Persian-language datasets.
- **Task-Specific Metrics**:
  - **Translation (BLEU)**: 35.2 on the Persian–English WMT dataset.
  - **Text Generation (ROUGE)**: ROUGE-L of 0.68 on Persian summarization tasks.
  - **Sentiment Analysis (F1)**: 0.89 F1-score on Persian sentiment datasets.
- **Multilingual Benchmarks**: Evaluated on XGLUE and XTREME, showing strong cross-lingual transfer capabilities.
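The scores above come from the authors' internal evaluation; the harness itself is not published. As a hedged sketch, the same metric families (BLEU, ROUGE-L, F1) can be reproduced on your own test set with the Hugging Face `evaluate` library. The example strings and labels below are placeholders, not the benchmark data.

```python
# Hedged sketch: computing the same metric families on a user-supplied test set.
import evaluate

bleu = evaluate.load("sacrebleu")   # BLEU for translation
rouge = evaluate.load("rouge")      # ROUGE-L for summarization
f1 = evaluate.load("f1")            # F1 for sentiment classification

print(bleu.compute(predictions=["the cat sat on the mat"],
                   references=[["the cat sat on the mat"]])["score"])
print(rouge.compute(predictions=["the cat sat"],
                    references=["the cat sat on the mat"])["rougeL"])
print(f1.compute(predictions=[1, 0, 1], references=[1, 0, 0])["f1"])
```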
### Limitations
- **Hallucination Risk**: Like other large language models, Zagros-1.0-Quick may generate plausible but factually incorrect outputs.
- **Language Bias**: While optimized for Persian, performance on low-resource languages may be less robust.
- **Resource Requirements**: Requires significant computational resources for inference, though optimized for efficiency via MoE.
## Ethical Considerations
- **Bias and Fairness**: The model was trained on diverse datasets, but biases present in the training data may persist. Users should evaluate outputs for unintended biases, particularly in sensitive applications.
- **Environmental Impact**: Training large models like Zagros-1.0-Quick consumes significant energy. Efforts were made to optimize compute efficiency, but users should consider environmental costs for large-scale deployment.
- **Responsible Use**: Users are encouraged to verify outputs for accuracy and appropriateness, especially in contexts involving legal, medical, or financial decisions.
## Usage Instructions
### Installation
To use Zagros-1.0-Quick with the specific version of the Transformers library from ZagrosLLMModel, install it using:
```bash
pip install git+https://github.com/ZagrosLLMModel/transformers.git@main
```
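After installation, a quick import check (a suggested verification step, not part of the original instructions) confirms the fork is on your Python path:

```python
# Verify that the ZagrosLLMModel Transformers fork imported correctly.
import transformers
print(transformers.__version__)
```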
### Inference
- **Hardware Requirements**: A GPU with at least 64 GB of VRAM is recommended for efficient inference; CPU inference is possible but considerably slower. For reduced-memory loading, see the quantization sketch after the example below.
- **Software Dependencies**: Compatible with PyTorch and the Transformers fork from the ZagrosLLMModel repository (see Installation above).
- **Example Code**:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "darsadilab/zagros-1.0-quick"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
# (Persian prompt; in English: "Design a professional website using HTML as a single
#  file, with the CSS/JS included inside that same HTML.")
prompt = "یک وبسایت حرفه ای با استفاده از html طراحی کن که تک کد باشد و شامل css/js داخل همین html باشد."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)

# Decode only the newly generated tokens (everything after the prompt)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
```
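If 64 GB of VRAM is not available, one option is 4-bit quantized loading via `bitsandbytes`. This is a hedged sketch, not something the model card specifies: it assumes the ZagrosLLMModel Transformers fork supports `BitsAndBytesConfig` for this architecture, and quantization may reduce output quality.

```python
# Hedged sketch: 4-bit quantized loading to reduce VRAM usage (assumption, not an official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "darsadilab/zagros-1.0-quick"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in BF16 to match the published tensor type
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)
```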
### Deployment
- Available for download via Hugging Face Hub.
- Currently not deployed by any inference provider. To request provider support, contact Hugging Face or your preferred provider.
## Contact Information
- **Organization**: Darsadilab
- **Contact**: Use the Community tab on the model page
- **Hugging Face Profile**: [https://huggingface.co/darsadilab](https://huggingface.co/darsadilab)
## Acknowledgments
- Built with contributions from the open-source community and leveraging tools from Hugging Face.
- Special thanks to the Persian NLP community for providing valuable datasets and feedback.
## Citation
If you use Zagros-1.0-Quick in your research or application, please cite:
```bibtex
@misc{darsadilab2025zagros,
title={Zagros-1.0-Quick: A Multilingual MoE Model with Persian Specialization},
author={Mohammadmoein Pisoude and Aydin Babazadeh and Aylin Bahari},
year={2025},
url={https://huggingface.co/darsadilab/zagros-1.0-quick}
}
```