# auto-g-nano
This is a minimal, decoder-only Transformer (nanoGPT-style) trained from scratch on the Tiny Shakespeare dataset.
## Model Details

- Architecture: Decoder-only Transformer
- Parameters: ~10.8M
- Vocabulary Size: 65 (character-level)
- Embedding Dimension: 384
- Attention Heads: 6
- Layers: 6
- Block Size (context length): 256
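The ~10.8M figure is consistent with the hyperparameters above. Here is a rough, hedged count assuming the standard nanoGPT layout (tied input/output embeddings, learned positional embeddings, 4x MLP expansion, biases on linear layers and LayerNorms); these layout details are an assumption, not stated in this card:

```python
# Back-of-the-envelope parameter count for the configuration above,
# assuming the standard nanoGPT layout (an assumption, not stated here).
vocab_size, n_embd, n_layer, block_size = 65, 384, 6, 256

tok_emb = vocab_size * n_embd      # token embedding (weight tied with lm_head)
pos_emb = block_size * n_embd      # learned positional embedding
ln = 2 * n_embd                    # one LayerNorm: weight + bias

attn = n_embd * 3 * n_embd + 3 * n_embd   # fused QKV projection
attn += n_embd * n_embd + n_embd          # attention output projection
mlp = n_embd * 4 * n_embd + 4 * n_embd    # MLP up-projection
mlp += 4 * n_embd * n_embd + n_embd       # MLP down-projection
block = 2 * ln + attn + mlp               # per-layer total

total = tok_emb + pos_emb + n_layer * block + ln  # + final LayerNorm
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # → 10,770,816 (~10.8M)
```

Under these assumptions the count lands at ~10.77M, matching the advertised ~10.8M.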
## How to Use

You can load the model with the `GPT` class from this repository:

```python
import torch
from model import GPT

model = GPT.from_pretrained("geoffsee/auto-g-nano")
model.eval()

# Generate text, starting from a single zero token as context
context = torch.zeros((1, 1), dtype=torch.long)
print(model.generate(context, max_new_tokens=100))
```

`generate` returns token IDs; decode them back to characters with the same character-level vocabulary used during training.
## Training Data

Trained from scratch on the Tiny Shakespeare dataset, a ~1 MB plain-text corpus of Shakespeare's plays, tokenized at the character level (65 unique characters).
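A vocabulary size of 65 suggests the usual character-level tokenization of Tiny Shakespeare: the vocab is simply the sorted set of unique characters in the corpus. A minimal sketch (the `text` string below is a stand-in excerpt; the real vocab is built from the full dataset file):

```python
# Character-level tokenizer sketch. `text` is a stand-in excerpt;
# in training, the vocab is built from the entire Tiny Shakespeare file.
text = "First Citizen:\nBefore we proceed any further, hear me speak.\n"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> token id
itos = {i: ch for ch, i in stoi.items()}       # token id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

print(decode(encode("hear me")))  # lossless round trip → "hear me"
```

On the full corpus this procedure yields exactly 65 tokens, matching the vocabulary size listed above.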