HumanV (Transformers Integration) + Nilla-Story Checkpoint

This repository contains:

  • HumanV: a lightweight, decoder-only Transformer architecture integrated into the 🤗 Transformers codebase.
  • Nilla-Story: a small HumanV checkpoint trained for short story generation (TinyStories-style).

Goal: upstream the HumanV architecture into huggingface/transformers so it can be loaded with standard AutoModel* classes (without trust_remote_code=True).


Model: Nilla-Story

  • Hub: nebularesearchtrain/nilla-story
  • Parameters: ~19.4M
  • Tokenizer: GPT-2 tokenizer (gpt2), vocab size 50,257
  • Context length: 1,024 (trained with sequence length 512)

Quickstart (from the Hub)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nebularesearchtrain/nilla-story"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt")

out = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))

If you are using a development version that still requires custom code on the Hub, load with:

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

Architecture: HumanV

HumanV is a decoder-only Transformer inspired by modern LLaMA-style blocks:

  • Causal self-attention with Rotary Position Embeddings (RoPE)
  • RMSNorm
  • SiLU / SwiGLU-style MLP
  • Optional grouped-query attention via num_key_value_heads (can be equal to num_attention_heads for standard MHA)
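
As a rough sketch (not the actual HumanV code; all shapes and names here are illustrative), grouped-query attention simply shares each of the num_key_value_heads key/value heads across several query heads:

import torch
import torch.nn.functional as F

# Illustrative sizes only; these are not Nilla-Story's actual dimensions
batch, seq_len, head_dim = 2, 16, 64
num_attention_heads, num_key_value_heads = 8, 2  # equal values would give standard MHA

q = torch.randn(batch, num_attention_heads, seq_len, head_dim)
k = torch.randn(batch, num_key_value_heads, seq_len, head_dim)
v = torch.randn(batch, num_key_value_heads, seq_len, head_dim)

# Each K/V head serves num_attention_heads // num_key_value_heads query heads
groups = num_attention_heads // num_key_value_heads
k = k.repeat_interleave(groups, dim=1)
v = v.repeat_interleave(groups, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 16, 64])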

Precision policy (recommended)

For TPU-friendly stability and speed:

  • BF16 for most matmul operations
  • FP32 for numerically sensitive steps (attention softmax + attention mask add, RMSNorm, logits/loss)
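
A minimal PyTorch sketch of this policy (illustrative only, not the HumanV implementation): activations stay in BF16, while RMSNorm and the masked attention softmax are computed in FP32 and cast back:

import torch

def rmsnorm_fp32(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize in FP32 for numerical stability, then return to the input dtype (e.g. BF16)
    x_fp32 = x.float()
    x_fp32 = x_fp32 * torch.rsqrt(x_fp32.pow(2).mean(-1, keepdim=True) + eps)
    return (weight.float() * x_fp32).to(x.dtype)

def masked_softmax_fp32(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Add the attention mask and apply softmax in FP32, then cast back for the value matmul
    probs = torch.softmax(scores.float() + mask.float(), dim=-1)
    return probs.to(scores.dtype)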

Training (Nilla-Story)

  • Dataset: TinyStories (subset)
  • Sequence length: 512
  • Precision: BF16 (with FP32 softmax/norm/loss as described above)
  • Hardware: Google TPU v5e-1
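
For reference, a minimal preprocessing sketch, assuming the public roneneldan/TinyStories dataset on the Hub and GPT-2 tokenization packed into 512-token blocks (the exact pipeline used for Nilla-Story may differ):

from itertools import chain

from datasets import load_dataset
from transformers import AutoTokenizer

block_size = 512
tokenizer = AutoTokenizer.from_pretrained("gpt2")

ds = load_dataset("roneneldan/TinyStories", split="train[:1%]")  # small subset for illustration

def tokenize(batch):
    return tokenizer(batch["text"])

def pack(batch):
    # Concatenate all token ids, then split into fixed 512-token blocks
    ids = list(chain.from_iterable(batch["input_ids"]))
    total = (len(ids) // block_size) * block_size
    chunks = [ids[i : i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": chunks, "labels": [list(c) for c in chunks]}

tokenized = ds.map(tokenize, batched=True, remove_columns=ds.column_names)
packed = tokenized.map(pack, batched=True, remove_columns=tokenized.column_names)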

Example generation (sample)

Prompts like:

  • Once upon a time,
  • The little bird wanted to

produce short story continuations suitable for toy storytelling tasks.
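
For example, reusing the model and tokenizer from the Quickstart above:

for prompt in ["Once upon a time,", "The little bird wanted to"]:
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=80, do_sample=True, top_p=0.9)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
    print("-" * 40)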


Contributing / Upstreaming to Transformers

This repository is prepared for an upstream PR to 🤗 Transformers. A typical PR includes:

  • src/transformers/models/humanv/ implementation (configuration_*.py, modeling_*.py)
  • Auto-class registration (so AutoModelForCausalLM works; see the sketch below)
  • Unit tests in tests/models/humanv/
  • Documentation page: docs/source/en/model_doc/humanv.md

Transformers recommends the modular approach for new model contributions; when a model is written with modular modeling, CI checks that the auto-generated modeling files stay in sync with the modular source.
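
For illustration, local auto-class registration (useful while developing, before the upstream PR is merged) typically looks like the sketch below; the module path and class names are placeholders for this repository's HumanV implementation:

from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder import paths for this repository's (not-yet-upstreamed) classes
from humanv.configuration_humanv import HumanVConfig
from humanv.modeling_humanv import HumanVForCausalLM

AutoConfig.register("humanv", HumanVConfig)
AutoModelForCausalLM.register(HumanVConfig, HumanVForCausalLM)

# The standard Auto* API now resolves the custom architecture
model = AutoModelForCausalLM.from_config(HumanVConfig())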


Limitations

  • This is a small model trained on a limited dataset. It may repeat phrases, hallucinate details, or generate simplistic stories.
  • Not intended for safety-critical use cases.

License

  • Code: Apache-2.0 (compatible with 🤗 Transformers)

Citation

If you use this work, please cite the repository and the Hugging Face model page.
