Kallamni-4B (kallamni-4b-v1)

A conversational Arabic language model fine-tuned for Emirati dialect (اللهجة الإماراتية).


Model Description

Kallamni-4B is a model fine-tuned to understand and generate natural spoken Emirati Arabic. It is designed to capture the vocabulary, phrasing, and emotional tone of everyday UAE dialect while avoiding Modern Standard Arabic (MSA) constructions.

This version builds on the previous releases (1.2B, 2.6B) and improves dialect fidelity, consistency, and conversational fluency.


System Prompt & Generation Style

For generating text (posts, dialogues), we use a system instruction that enforces Emirati dialect style:

You are an Emirati assistant who always speaks in authentic Emirati spoken Arabic.
Your responses must sound like daily UAE conversation, not MSA or foreign dialects.
Use words like “وايد”, “هيه”, “سرت”, “عقب”, “الربع”, “القعدة”, “نغير جو”.
Avoid MSA connectors like “ذلك”, “إنه”, “لقد”.
Respond casually, warmly, with cultural references (Ramadan, البحر، البر، العائلة).
Output must remain in Emirati dialect unless asked otherwise.

During generation, the parameters used are:

temperature = 0.7  
top_p = 0.8  
top_k = 20  
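
To make these three settings concrete, here is a small sketch of what temperature, top-k, and top-p each do to a next-token distribution. It is a simplified illustration in plain Python, not the code path the transformers library actually runs; the function name `filter_probs` and the toy logits are assumptions made for the example.

```python
import math

def filter_probs(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_index: probability} after temperature, top-k, then top-p."""
    # Temperature < 1 sharpens the distribution before softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens, then renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_total = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_total
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens; sampling would draw from this.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}

# Toy logits for a 4-token vocabulary (hypothetical values).
print(filter_probs([2.0, 1.0, 0.0, -1.0]))
```

With the values listed above, low-probability tokens are cut before sampling, which tends to keep replies on-dialect at the cost of some variety.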

Data & Training

  • Training Data: 58,000 synthetic Emirati conversation samples
  • Data Source: Generated via API (with assistance), then manually filtered for dialect accuracy
  • Training Framework: Fine-tuned using Unsloth
  • Instruction Tuning / Conversational Format: Via TRL
  • Tokenizer: Extended to include Emirati-specific tokens and preserve dialect word merges
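
The card does not describe the dataset schema, so the following is only a sketch of how one synthetic exchange might be wrapped in a chat-style record before conversational fine-tuning. The helper name `to_chat_sample`, the `"messages"` key, and the sample reply are assumptions for illustration, not the author's actual pipeline.

```python
def to_chat_sample(user_text: str, assistant_text: str, system_prompt: str = "") -> dict:
    """Wrap one user/assistant exchange as a chat-style training record."""
    messages = []
    if system_prompt:
        # The system turn is optional; include it only when a prompt is given.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text.strip()})
    messages.append({"role": "assistant", "content": assistant_text.strip()})
    return {"messages": messages}

# Made-up example pair; the reply is illustrative, not taken from the training data.
sample = to_chat_sample("ها، وين كنت البارحة؟", "سرت البحر ويا الربع")
print(sample["messages"][0]["role"], len(sample["messages"]))
```

Records in this role/content shape can be rendered with the tokenizer's chat template, which keeps training and inference formatting aligned.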

Evaluation & Comparisons

  • Human evaluators consistently rated generated text as > 90% authentic Emirati dialect
  • Compared to 1.2B and 2.6B versions, Kallamni-4B reduces fallback to MSA and yields more expressive, fluent dialect responses
  • Performs robustly on conversational benchmarks focused on dialect contexts
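
Fallback to MSA can be spot-checked with a rough word-level heuristic. The sketch below counts the dialect words and MSA connectors named in the system prompt; it is not the evaluation method used for this model, the word lists are illustrative rather than exhaustive, and the function name `marker_counts` is an assumption. It ignores multi-word phrases such as “نغير جو” and words with attached prefixes.

```python
import re

# Word lists copied from the system prompt; illustrative, not exhaustive.
DIALECT_WORDS = {"وايد", "هيه", "سرت", "عقب", "الربع", "القعدة"}
MSA_CONNECTORS = {"ذلك", "إنه", "لقد"}

# Runs of Arabic letters (hamza through yeh); punctuation and diacritics split words.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")

def marker_counts(text: str) -> dict:
    """Count exact-match dialect words and MSA connectors in a reply."""
    words = ARABIC_WORD.findall(text)
    return {
        "dialect": sum(w in DIALECT_WORDS for w in words),
        "msa": sum(w in MSA_CONNECTORS for w in words),
    }

print(marker_counts("سرت البحر وايد"))      # dialect-leaning reply
print(marker_counts("لقد ذهبت إلى البحر"))  # MSA-leaning reply
```

A nonzero `msa` count is only a hint that a reply deserves a closer look by a native speaker.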

Usage Example

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/kallamni-4b-v1"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in BF16 and place the model on the same device as the inputs.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

messages = [
    {"role": "system", "content": """You are an Emirati assistant who always speaks in **authentic Emirati spoken Arabic**.
Your responses must sound like daily UAE conversation, not MSA or foreign dialects.
Use words like “وايد”, “هيه”, “سرت”, “عقب”, “الربع”, “القعدة”, “نغير جو”.
Avoid MSA connectors like “ذلك”, “إنه”, “لقد”.
Respond casually, warmly, with cultural references (Ramadan, البحر، البر، العائلة).
Output must remain in Emirati dialect unless asked otherwise."""},
    # User turn, roughly: "Hey, where were you yesterday?"
    {"role": "user", "content": "ها، وين كنت البارحة؟"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, not the echoed prompt.
prompt_len = inputs["input_ids"].shape[-1]
reply = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(reply)

Contribution & Feedback

  • Submit issues or dialogue examples where the model slips out of dialect
  • Contribute real Emirati conversation pairs for refinement
  • Provide evaluation prompts and comparative results

License & Ethical Use

  • License: CC-BY-NC-4.0
  • The model does not collect personal user data
  • Use responsibly; avoid generating misinformation, impersonation, or harmful content
  • When publishing outputs publicly, disclose that the text was AI-generated


Model size: 4B params
Tensor type: BF16 (Safetensors)

Model tree for yasserrmd/kallamni-4b-v1

  • Base model: Qwen/Qwen3-4B-Base
  • Fine-tuned from the base: Qwen/Qwen3-4B
  • Fine-tuned from Qwen/Qwen3-4B: this model