sheikh-bangla-thinking
Bengali GPT-2 model with Interleaved Thinking capability. The model can generate reasoning steps before providing the final answer.
🧠 Thinking Tokens
| Token | ID | Purpose |
|---|---|---|
<thinking> |
5 | Start thinking block |
</thinking> |
6 | End thinking block |
<thought> |
7 | Individual thought |
<reason> |
8 | Reasoning step |
<step> |
9 | Step marker |
<final> |
10 | Final answer |
🏗️ Architecture
- Model Type: GPT-2 with Thinking Extension
- Parameters: 23,142,400
- Layers: 6
- Heads: 8
- Hidden Size: 512
- Vocab Size: 8,000 (+ 6 thinking tokens)
💡 What is Interleaved Thinking?
Interleaved Thinking is a capability where the model:
- Thinks before responding - Analyzes the problem first
- Shows reasoning steps - Displays intermediate thoughts
- Produces better output - Improves accuracy through deliberation
Example Format
<reason>
আমি এই প্রশ্নটি বিশ্লেষণ করছি...
</reason>
<step>
প্রথমে মূল পয়েন্টগুলো চিহ্নিত করছি
</step>
<final>
চূড়ান্ত উত্তর বা আউটপুট
</final>
📖 Usage
from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast
# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("OsamaBinLikhon/sheikh-bangla-thinking")
tokenizer = PreTrainedTokenizerFast.from_pretrained("OsamaBinLikhon/sheikh-bangla-tokenizer-thinking")
# Generate with thinking
prompt = "বাংলাদেশের রাজধানী কোথা?"
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(
inputs,
max_length=256,
do_sample=True,
temperature=0.7,
pad_token_id=tokenizer.pad_token_id,
)
result = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(result)
🔧 Preserved Thinking
In multi-turn conversations, the model can preserve thinking context:
# Conversation with preserved thinking
conversation = [
{"role": "user", "content": "কম্পিউটার কী?"},
{"role": "assistant", "content": "<thinking>...reasoning...</thinking>বৈদ্যুতিক যন্ত্র...</final>"},
{"role": "user", "content": "এর উদাহরণ দাও"},
# Model remembers previous context
]
🎯 Turn-level Thinking Control
Enable/disable thinking per request:
# With thinking (complex tasks)
outputs = model.generate(inputs, use_thinking=True)
# Without thinking (simple tasks - faster)
outputs = model.generate(inputs, use_thinking=False)
📊 Performance
| Metric | Value |
|---|---|
| Parameters | 23M |
| Thinking Tokens | 6 |
| Max Length | 256 |
🚀 Training
To train with thinking data:
python3 sheikh_thinking_finetune.py --train
📁 Files
config.json- Model configuration with thinking token IDstokenizer.json- BPE tokenizer with thinking tokenschat_template.jinja- Chat template supporting thinking blocksmodel.safetensors- Model weights
⚠️ Limitations
- Small model (23M parameters)
- Training data limited
- Thinking capability basic
- Requires fine-tuning for production use
📝 Citation
@misc{sheikh-bangla-thinking,
author = {Osama Bin Likhon},
title = {sheikh-bangla-thinking: Bengali GPT-2 with Interleaved Thinking},
url = {https://huggingface.co/OsamaBinLikhon/sheikh-bangla-thinking},
}
🤝 Related Models
- sheikh-bangla-110m - Base model without thinking
- sheikh-bangla-tokenizer-thinking - Tokenizer with thinking tokens
- Downloads last month
- -
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
🙋
Ask for provider support