HumanV (Transformers Integration) + Nilla-Story Checkpoint

This repository contains:

  • HumanV: a lightweight, decoder-only Transformer architecture integrated into the 🤗 Transformers codebase.
  • Nilla-Story: a small HumanV checkpoint trained for short story generation (TinyStories-style).

Goal: upstream the HumanV architecture into huggingface/transformers so checkpoints can be loaded with the standard AutoModel* classes (without trust_remote_code=True).


Model: Nilla-Story

  • Hub: nebularesearchtrain/nilla-story
  • Parameters: ~19.4M (safetensors checkpoint stored in FP32)
  • Tokenizer: GPT-2 tokenizer (gpt2), vocab size 50,257
  • Context length: 1,024 (trained with sequence length 512)

Quickstart (from the Hub)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nebularesearchtrain/nilla-story"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt")

out = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))

If you are using a development version that still requires custom code on the Hub, load with:

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

Architecture: HumanV

HumanV is a decoder-only Transformer inspired by modern LLaMA-style blocks:

  • Causal self-attention with Rotary Position Embeddings (RoPE)
  • RMSNorm
  • SiLU / SwiGLU-style MLP (see the sketch after this list)
  • Optional grouped-query attention via num_key_value_heads (can be equal to num_attention_heads for standard MHA)
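
The MLP block follows the standard gated (SwiGLU-style) design. Below is a minimal PyTorch sketch of such a block with illustrative layer names; it is not the exact HumanV implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """LLaMA-style gated MLP: down_proj(silu(gate_proj(x)) * up_proj(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated branch multiplied elementwise with the linear "up" branch.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))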

Precision policy (recommended)

For TPU-friendly stability and speed (sketched in the example after this list):

  • BF16 for most matmul operations
  • FP32 for numerically sensitive steps (attention softmax + attention mask add, RMSNorm, logits/loss)
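
As an illustration of this policy (a sketch, not the exact HumanV code), the sensitive steps upcast to FP32 and cast back to the compute dtype; attention_mask is assumed to be an additive mask:

import math
import torch

def attention_probs(q, k, attention_mask, head_dim):
    # QK^T runs in the compute dtype (BF16); the mask add and softmax run in FP32.
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(head_dim)
    scores = scores.to(torch.float32) + attention_mask
    return torch.softmax(scores, dim=-1).to(q.dtype)

def rms_norm(x, weight, eps=1e-6):
    # Mean-square statistics in FP32, result cast back to the input dtype.
    x32 = x.to(torch.float32)
    x32 = x32 * torch.rsqrt(x32.pow(2).mean(-1, keepdim=True) + eps)
    return (weight * x32).to(x.dtype)

# Logits and the cross-entropy loss are likewise computed in FP32.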

Training (Nilla-Story)

  • Dataset: TinyStories (subset)
  • Sequence length: 512
  • Precision: BF16 (with FP32 softmax/norm/loss as described above)
  • Hardware: Google TPU v5e-1
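
A rough sketch of the data preparation, assuming the public roneneldan/TinyStories dataset on the Hub and GPT-2 tokenization with 512-token packing (the exact subset and training loop are not reproduced here):

from datasets import load_dataset
from transformers import AutoTokenizer

SEQ_LEN = 512
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Small slice for illustration; the subset actually used for Nilla-Story may differ.
raw = load_dataset("roneneldan/TinyStories", split="train[:1%]")

def tokenize(batch):
    return tokenizer([text + tokenizer.eos_token for text in batch["text"]])

def pack(batch):
    # Concatenate all token ids, then slice into fixed 512-token blocks.
    ids = [tok for example in batch["input_ids"] for tok in example]
    total = (len(ids) // SEQ_LEN) * SEQ_LEN
    blocks = [ids[i : i + SEQ_LEN] for i in range(0, total, SEQ_LEN)]
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
packed = tokenized.map(pack, batched=True, remove_columns=tokenized.column_names)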

Example generation (sample)

Prompts like:

  • Once upon a time,
  • The little bird wanted to

produce short story continuations suitable for toy storytelling tasks.
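
Equivalently, the text-generation pipeline can be used to sample from these prompts (the sampling settings below are illustrative):

from transformers import pipeline

# Add trust_remote_code=True here if the checkpoint still ships custom code.
generator = pipeline("text-generation", model="nebularesearchtrain/nilla-story")

for prompt in ["Once upon a time,", "The little bird wanted to"]:
    result = generator(prompt, max_new_tokens=80, do_sample=True, top_p=0.9)
    print(result[0]["generated_text"])
    print("-" * 40)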


Contributing / Upstreaming to Transformers

This repository is prepared for an upstream PR to 🤗 Transformers. A typical PR includes:

  • src/transformers/models/humanv/ implementation (configuration_*.py, modeling_*.py)
  • Auto-class registration (so AutoModelForCausalLM works)
  • Unit tests in tests/models/humanv/
  • Documentation page: docs/source/en/model_doc/humanv.md

Transformers recommends a modular approach for new model contributions, and CI may validate generated files when using modular modeling.
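
Until the architecture ships inside Transformers, a local implementation can be wired into the Auto classes at runtime. A minimal sketch, assuming hypothetical HumanVConfig / HumanVForCausalLM classes in a local humanv package (the upstream PR replaces this with the in-tree auto mappings):

from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical local module layout mirroring the planned
# src/transformers/models/humanv/ files.
from humanv.configuration_humanv import HumanVConfig
from humanv.modeling_humanv import HumanVForCausalLM

AutoConfig.register("humanv", HumanVConfig)
AutoModelForCausalLM.register(HumanVConfig, HumanVForCausalLM)

model = AutoModelForCausalLM.from_pretrained("nebularesearchtrain/nilla-story")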


Limitations

  • This is a small model trained on a limited dataset. It may repeat phrases, hallucinate details, or generate simplistic stories.
  • Not intended for safety-critical use cases.

License

  • Code: Apache-2.0 (compatible with 🤗 Transformers)

Citation

If you use this work, please cite the repository and the Hugging Face model page.
