sheikh-bangla-110m

A Bengali GPT-2 language model trained from scratch with a custom BPE tokenizer.

Model Details

  • Model Type: GPT-2 (Causal Language Model)
  • Language: Bengali (বাংলা)
  • Training: Trained from scratch (no pretrained weights)
  • Parameters: 23,142,400 (~23M; the "110m" in the name overstates the actual size)
  • Architecture: 6 layers, 8 attention heads, 512 hidden size (see the configuration sketch below)
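
A quick way to sanity-check the stated parameter count is to rebuild the architecture from the numbers above. The sketch below is an assumed reconstruction, using GPT-2 defaults (including tied input/output embeddings) for everything the card does not list; it reproduces the count exactly:

from transformers import GPT2Config, GPT2LMHeadModel

# Values taken from this card; all other settings are GPT-2 defaults
config = GPT2Config(
    vocab_size=8000,   # BPE vocabulary size
    n_positions=256,   # max sequence length used in training
    n_embd=512,        # hidden size
    n_layer=6,
    n_head=8,
    bos_token_id=2,    # [CLS]
    eos_token_id=3,    # [SEP]
)
model = GPT2LMHeadModel(config)
print(sum(p.numel() for p in model.parameters()))  # 23142400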

Tokenizer

  • Type: BPE (Byte Pair Encoding)
  • Vocabulary Size: 8,000 tokens
  • Special Tokens (see the check after this list):
    • [PAD] - Padding token (ID: 0)
    • [UNK] - Unknown token (ID: 1)
    • [CLS] - Beginning of sentence (ID: 2)
    • [SEP] - End of sentence (ID: 3)
    • [MASK] - Mask token (ID: 4)
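
These IDs can be checked directly against the published tokenizer; a minimal sanity check:

from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("OsamaBinLikhon/sheikh-bangla-110m")
for token in ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]:
    print(token, tokenizer.convert_tokens_to_ids(token))
# Expected: [PAD] 0, [UNK] 1, [CLS] 2, [SEP] 3, [MASK] 4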

Usage

from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("OsamaBinLikhon/sheikh-bangla-110m")
tokenizer = PreTrainedTokenizerFast.from_pretrained("OsamaBinLikhon/sheikh-bangla-110m")

# Generate text
input_text = "বাংলা ভাষা হলো"  # "The Bengali language is"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    temperature=0.7,
    pad_token_id=0,  # [PAD]; avoids the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
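
With do_sample=True, temperature=0.7 sharpens the sampling distribution relative to plain sampling; for a model this small, adding top_k or top_p filtering in generate can further curb repetitive output.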

Training Details

  • Training Data: Bengali text corpus
  • Epochs: 20
  • Batch Size: 4
  • Learning Rate: 0.001
  • Max Length: 256 tokens
  • Optimizer: AdamW (a minimal loop is sketched below)
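
The training script itself is not part of the card; the following is a minimal sketch consistent with the hyperparameters above. The random-token dataset is a stand-in assumption for the unpublished Bengali corpus:

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import GPT2Config, GPT2LMHeadModel

# Same architecture as in Model Details
model = GPT2LMHeadModel(GPT2Config(
    vocab_size=8000, n_positions=256, n_embd=512, n_layer=6, n_head=8
))

# Stand-in corpus: random token IDs in place of the unpublished Bengali data,
# pre-chunked into fixed 256-token sequences
dummy_ids = torch.randint(0, 8000, (64, 256))
loader = DataLoader(TensorDataset(dummy_ids), batch_size=4, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model.train()
for epoch in range(20):
    for (input_ids,) in loader:
        # GPT2LMHeadModel shifts labels internally for next-token prediction
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch + 1}: loss {loss.item():.2f}")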

Performance

Training loss decreased steadily:

  • Initial Loss: ~8.08
  • Final Loss: ~6.69 (after 20 epochs)
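
Assuming these are mean cross-entropy losses in nats, perplexity is exp(loss): roughly exp(8.08) ≈ 3200 at the start and exp(6.69) ≈ 800 after 20 epochs, which is consistent with the limitations noted below.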

Limitations

  • This is a small model trained on limited data
  • May produce repetitive or nonsensical text for complex prompts
  • Not suitable for production use without fine-tuning
  • Training was done on CPU with limited computational resources

Citation

@misc{sheikh-bangla-110m,
  author = {Osama Bin Likhon},
  title = {sheikh-bangla-110m: Bengali GPT-2 Model Trained from Scratch},
  url = {https://huggingface.co/OsamaBinLikhon/sheikh-bangla-110m},
}

Acknowledgments

  • Hugging Face for the Transformers library
  • The tokenizers library for BPE implementation