sheikh-bangla-thinking

Bengali GPT-2 model with Interleaved Thinking capability. The model can generate reasoning steps before providing the final answer.

🧠 Thinking Tokens

Token	ID	Purpose
`<thinking>`	5	Start thinking block
`</thinking>`	6	End thinking block
`<thought>`	7	Individual thought
`<reason>`	8	Reasoning step
`<step>`	9	Step marker
`<final>`	10	Final answer

🏗️ Architecture

Model Type: GPT-2 with Thinking Extension
Parameters: 23,142,400
Layers: 6
Heads: 8
Hidden Size: 512
Vocab Size: 8,000 (+ 6 thinking tokens)

💡 What is Interleaved Thinking?

Interleaved Thinking is a capability where the model:

Thinks before responding - Analyzes the problem first
Shows reasoning steps - Displays intermediate thoughts
Produces better output - Improves accuracy through deliberation

Example Format

<reason>
আমি এই প্রশ্নটি বিশ্লেষণ করছি...
</reason>
<step>
প্রথমে মূল পয়েন্টগুলো চিহ্নিত করছি
</step>
<final>
চূড়ান্ত উত্তর বা আউটপুট
</final>

📖 Usage

from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("OsamaBinLikhon/sheikh-bangla-thinking")
tokenizer = PreTrainedTokenizerFast.from_pretrained("OsamaBinLikhon/sheikh-bangla-tokenizer-thinking")

# Generate with thinking
prompt = "বাংলাদেশের রাজধানী কোথা?"
inputs = tokenizer.encode(prompt, return_tensors="pt")

outputs = model.generate(
    inputs,
    max_length=256,
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.pad_token_id,
)

result = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(result)

🔧 Preserved Thinking

In multi-turn conversations, the model can preserve thinking context:

# Conversation with preserved thinking
conversation = [
    {"role": "user", "content": "কম্পিউটার কী?"},
    {"role": "assistant", "content": "<thinking>...reasoning...</thinking>বৈদ্যুতিক যন্ত্র...</final>"},
    {"role": "user", "content": "এর উদাহরণ দাও"},
    # Model remembers previous context
]

🎯 Turn-level Thinking Control

Enable/disable thinking per request:

# With thinking (complex tasks)
outputs = model.generate(inputs, use_thinking=True)

# Without thinking (simple tasks - faster)
outputs = model.generate(inputs, use_thinking=False)

📊 Performance

Metric	Value
Parameters	23M
Thinking Tokens	6
Max Length	256

🚀 Training

To train with thinking data:

python3 sheikh_thinking_finetune.py --train

📁 Files

config.json - Model configuration with thinking token IDs
tokenizer.json - BPE tokenizer with thinking tokens
chat_template.jinja - Chat template supporting thinking blocks
model.safetensors - Model weights

⚠️ Limitations

Small model (23M parameters)
Training data limited
Thinking capability basic
Requires fine-tuning for production use

📝 Citation

@misc{sheikh-bangla-thinking,
  author = {Osama Bin Likhon},
  title = {sheikh-bangla-thinking: Bengali GPT-2 with Interleaved Thinking},
  url = {https://huggingface.co/OsamaBinLikhon/sheikh-bangla-thinking},
}

🤝 Related Models

sheikh-bangla-110m - Base model without thinking
sheikh-bangla-tokenizer-thinking - Tokenizer with thinking tokens

Downloads last month: -

Safetensors

Model size

23.1M params

Tensor type

F32

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

OsamaBinLikhon
/

sheikh-bangla-thinking