# HumanV (Transformers Integration) + Nilla-Story Checkpoint
This repository contains:
- HumanV: a lightweight, decoder-only Transformer architecture integrated into the 🤗 Transformers codebase.
- Nilla-Story: a small HumanV checkpoint trained for short story generation (TinyStories-style).
Goal: upstream the HumanV architecture into `huggingface/transformers` so it can be loaded with the standard `AutoModel*` classes (without `trust_remote_code=True`).
## Model: Nilla-Story

- Hub: `nebularesearchtrain/nilla-story`
- Tokenizer: GPT-2 tokenizer (`gpt2`), vocab size 50,257
- Context length: 1,024 (trained with sequence length 512)
## Quickstart (from the Hub)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nebularesearchtrain/nilla-story"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt")

out = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
If you are using a development version that still requires custom code on the Hub, load with:
```python
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```
## Architecture: HumanV
HumanV is a decoder-only Transformer inspired by modern LLaMA-style blocks:
- Causal self-attention with Rotary Position Embeddings (RoPE)
- RMSNorm
- SiLU / SwiGLU-style MLP
- Optional grouped-query attention via `num_key_value_heads` (which can be set equal to `num_attention_heads` for standard multi-head attention)
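
To make the block components above concrete, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU-style MLP. Class and argument names are illustrative and may not match the actual `modeling_humanv.py`:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Minimal RMSNorm sketch: scale by the inverse root-mean-square, computed in FP32."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        x32 = x.float()
        x32 = x32 * torch.rsqrt(x32.pow(2).mean(-1, keepdim=True) + self.eps)
        return (self.weight * x32).to(x.dtype)


class SwiGLUMLP(nn.Module):
    """SwiGLU-style MLP: down_proj(SiLU(gate_proj(x)) * up_proj(x))."""
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))
```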
## Precision policy (recommended)
For TPU-friendly stability and speed:
- BF16 for most matmul operations
- FP32 for numerically sensitive steps (attention softmax + attention mask add, RMSNorm, logits/loss)
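
A minimal sketch of what this policy looks like on the attention-score path (a hypothetical helper, not the actual HumanV code): the score matmul stays in BF16, the mask add and softmax run in FP32, and the result is cast back.

```python
import torch


def masked_softmax_fp32(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # `scores` comes from a BF16 Q·K^T matmul; upcast for the numerically
    # sensitive mask add + softmax, then cast back to the compute dtype.
    probs = torch.softmax(scores.float() + mask.float(), dim=-1)
    return probs.to(scores.dtype)
```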
## Training (Nilla-Story)
- Dataset: TinyStories (subset)
- Sequence length: 512
- Precision: BF16 (with FP32 softmax/norm/loss as described above)
- Hardware: Google TPU v5e-1
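
For reference, a hypothetical preprocessing sketch at sequence length 512. The dataset id, split, and truncation-based packing are assumptions and may differ from the actual training setup:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Assumed TinyStories dataset id on the Hub; only a small slice for illustration.
dataset = load_dataset("roneneldan/TinyStories", split="train[:1%]")

def tokenize(batch):
    # Truncate each story to the training sequence length of 512 tokens.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
```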
## Example generation (sample)

Prompts like `Once upon a time,` or `The little bird wanted to` produce short story continuations suitable for toy storytelling tasks.
## Contributing / Upstreaming to Transformers

This repository is prepared for an upstream PR to 🤗 Transformers. A typical PR includes:

- `src/transformers/models/humanv/` implementation (`configuration_*.py`, `modeling_*.py`)
- Auto-class registration (so `AutoModelForCausalLM` works)
- Unit tests in `tests/models/humanv/`
- Documentation page: `docs/source/en/model_doc/humanv.md`
Transformers recommends a modular approach for new model contributions, and CI may validate generated files when using modular modeling.
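
While the PR is still in flight and the model classes live outside the library, the public registration API can be used so that the Auto classes resolve HumanV. Module and class names below are hypothetical; in the upstream PR, registration instead goes into the static auto mappings under `src/transformers/models/auto/`:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical local modules holding the HumanV config and model classes.
from configuration_humanv import HumanVConfig
from modeling_humanv import HumanVForCausalLM

AutoConfig.register("humanv", HumanVConfig)
AutoModelForCausalLM.register(HumanVConfig, HumanVForCausalLM)
```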
## Limitations
- This is a small model trained on a limited dataset. It may repeat phrases, hallucinate details, or generate simplistic stories.
- Not intended for safety-critical use cases.
## License

- Code: Apache-2.0 (compatible with 🤗 Transformers)
## Citation
If you use this work, please cite the repository and the Hugging Face model page.