# chakma_sagorsarker_bangla_bert
This model is a fine-tuned version of [sagorsarker/bangla-bert-base](https://huggingface.co/sagorsarker/bangla-bert-base) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 4.0853
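The base model is a BERT-style encoder, so the checkpoint can presumably be queried with the fill-mask pipeline. A minimal sketch, assuming the fine-tune kept the masked-language-modeling head (the card does not state the task), with a hypothetical example sentence:

```python
from transformers import pipeline

# Assumes the checkpoint was fine-tuned with a masked-language-modeling
# head; this card does not state the training objective explicitly.
fill_mask = pipeline("fill-mask", model="adity12345/chakma_sagorsarker_bangla_bert")

# [MASK] is the standard mask token for BERT tokenizers.
for prediction in fill_mask("আমি [MASK] ভালোবাসি।"):
    print(prediction["token_str"], prediction["score"])
```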
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hypothetical reconstruction of the run follows the list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 18
- mixed_precision_training: Native AMP
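Since the card omits the training script, the sketch below reconstructs the run from the hyperparameters above, assuming the standard `Trainer` masked-language-modeling setup. The dataset is a placeholder: the actual corpus is not documented here.

```python
# Hypothetical reconstruction of the training setup; the real dataset
# and preprocessing are not documented in this card.
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "sagorsarker/bangla-bert-base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Placeholder corpus, purely illustrative.
raw = Dataset.from_dict({"text": ["example sentence one", "example sentence two"]})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="chakma_sagorsarker_bangla_bert",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # effective train batch size: 2 x 4 = 8
    num_train_epochs=18,
    lr_scheduler_type="linear",
    optim="adamw_torch_fused",
    seed=42,
    fp16=True,                      # "Native AMP" mixed precision
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    eval_dataset=tokenized,         # tiny set reused for eval, illustrative only
    data_collator=DataCollatorForLanguageModeling(tokenizer),
)
trainer.train()
```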
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 6.0342 | 1.0 | 42 | 5.4936 |
| 5.2244 | 2.0 | 84 | 4.8459 |
| 4.8932 | 3.0 | 126 | 4.7841 |
| 4.5737 | 4.0 | 168 | 4.8512 |
| 4.4117 | 5.0 | 210 | 4.4402 |
| 4.2887 | 6.0 | 252 | 4.5596 |
| 4.1301 | 7.0 | 294 | 4.5135 |
| 4.0166 | 8.0 | 336 | 4.3542 |
| 3.948 | 9.0 | 378 | 4.3495 |
| 3.8748 | 10.0 | 420 | 4.1867 |
| 3.8486 | 11.0 | 462 | 4.1073 |
| 3.7527 | 12.0 | 504 | 4.1980 |
| 3.695 | 13.0 | 546 | 4.2188 |
| 3.6467 | 14.0 | 588 | 4.2136 |
| 3.632 | 15.0 | 630 | 3.9560 |
| 3.6019 | 16.0 | 672 | 3.9171 |
| 3.5766 | 17.0 | 714 | 4.1435 |
| 3.638 | 18.0 | 756 | 4.0853 |
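If the reported losses are mean cross-entropies in nats (the usual `Trainer` convention for language-model runs, assumed here), the final validation loss corresponds to a perplexity of roughly exp(4.0853) ≈ 59.5:

```python
import math

# Perplexity from the final validation loss, assuming the loss is
# mean cross-entropy in nats.
eval_loss = 4.0853
print(math.exp(eval_loss))  # ~59.5
```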
### Framework versions
- Transformers 4.56.1
- Pytorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0