sheikh-bangla-thinking

Bengali GPT-2 model with Interleaved Thinking capability. The model can generate reasoning steps before providing the final answer.

🧠 Thinking Tokens

Token ID Purpose
<thinking> 5 Start thinking block
</thinking> 6 End thinking block
<thought> 7 Individual thought
<reason> 8 Reasoning step
<step> 9 Step marker
<final> 10 Final answer

🏗️ Architecture

  • Model Type: GPT-2 with Thinking Extension
  • Parameters: 23,142,400
  • Layers: 6
  • Heads: 8
  • Hidden Size: 512
  • Vocab Size: 8,000 (+ 6 thinking tokens)

💡 What is Interleaved Thinking?

Interleaved Thinking is a capability where the model:

  1. Thinks before responding - Analyzes the problem first
  2. Shows reasoning steps - Displays intermediate thoughts
  3. Produces better output - Improves accuracy through deliberation

Example Format

<reason>
আমি এই প্রশ্নটি বিশ্লেষণ করছি...
</reason>
<step>
প্রথমে মূল পয়েন্টগুলো চিহ্নিত করছি
</step>
<final>
চূড়ান্ত উত্তর বা আউটপুট
</final>

📖 Usage

from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("OsamaBinLikhon/sheikh-bangla-thinking")
tokenizer = PreTrainedTokenizerFast.from_pretrained("OsamaBinLikhon/sheikh-bangla-tokenizer-thinking")

# Generate with thinking
prompt = "বাংলাদেশের রাজধানী কোথা?"
inputs = tokenizer.encode(prompt, return_tensors="pt")

outputs = model.generate(
    inputs,
    max_length=256,
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.pad_token_id,
)

result = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(result)

🔧 Preserved Thinking

In multi-turn conversations, the model can preserve thinking context:

# Conversation with preserved thinking
conversation = [
    {"role": "user", "content": "কম্পিউটার কী?"},
    {"role": "assistant", "content": "<thinking>...reasoning...</thinking>বৈদ্যুতিক যন্ত্র...</final>"},
    {"role": "user", "content": "এর উদাহরণ দাও"},
    # Model remembers previous context
]

🎯 Turn-level Thinking Control

Enable/disable thinking per request:

# With thinking (complex tasks)
outputs = model.generate(inputs, use_thinking=True)

# Without thinking (simple tasks - faster)
outputs = model.generate(inputs, use_thinking=False)

📊 Performance

Metric Value
Parameters 23M
Thinking Tokens 6
Max Length 256

🚀 Training

To train with thinking data:

python3 sheikh_thinking_finetune.py --train

📁 Files

  • config.json - Model configuration with thinking token IDs
  • tokenizer.json - BPE tokenizer with thinking tokens
  • chat_template.jinja - Chat template supporting thinking blocks
  • model.safetensors - Model weights

⚠️ Limitations

  • Small model (23M parameters)
  • Training data limited
  • Thinking capability basic
  • Requires fine-tuning for production use

📝 Citation

@misc{sheikh-bangla-thinking,
  author = {Osama Bin Likhon},
  title = {sheikh-bangla-thinking: Bengali GPT-2 with Interleaved Thinking},
  url = {https://huggingface.co/OsamaBinLikhon/sheikh-bangla-thinking},
}

🤝 Related Models

Downloads last month
-
Safetensors
Model size
23.1M params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Space using OsamaBinLikhon/sheikh-bangla-thinking 1