Model Card for babyLlama-TinyStories
The goal of training this model was to see whether a tiny LLM can generate coherent text and stories when trained on a small subset of TinyStories, in this case just 1% of the total training data. I trained babyLlama, a 0.77M-parameter model based on the Llama 2 architecture, locally, and found it was able to generate coherent stories to an extent.
Limitations
- This model was trained with a context length of 256 tokens
- This model uses a custom tokenizer trained specifically on TinyStories, with a vocabulary size of 8192
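Because of the 256-token context window, long prompts should be clipped before generation so the prompt plus the new tokens fit in context. A minimal sketch of that bookkeeping (the `clip_prompt` helper and its names are illustrative, not part of the model's API; only the 256-token limit comes from this card):

```python
MAX_CTX = 256  # context length from the card

def clip_prompt(ids, max_new_tokens, max_ctx=MAX_CTX):
    # Keep only the most recent tokens, leaving room for generation
    # within the context window.
    keep = max_ctx - max_new_tokens
    return ids[-keep:] if len(ids) > keep else ids

# Example: a 300-token prompt with 128 new tokens requested
ids = list(range(300))
clipped = clip_prompt(ids, max_new_tokens=128)
# len(clipped) == 128: only the last 128 prompt tokens are kept
```

The same effect can be had by passing `truncation=True` with a suitable `max_length` to the tokenizer; the helper just makes the arithmetic explicit.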
Quick start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("shamith/babyLlama-TinyStories")
model = AutoModelForCausalLM.from_pretrained("shamith/babyLlama-TinyStories", trust_remote_code=True)

prompt = "Once upon a time"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_k=60,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Output 1: Once upon a time, there was a little girl named Lily. She loved playing outside in the sunshine and picking flowers. One day, her mommy asked her to pick some flowers on the ground. Lily was so excited to see what she had found. "Mommy, look at the flowers!" she said. Her mommy smiled and said, "That's a pretty flower, Lily." Later that day, Lily's mom took her to the park. It was bright and colourful and had a lovely day at the park. Lily was happy to have her favorite spot for her mommy. From that day on, Lily and her mommy always became good friends. They

# Output 2: Once upon a time, there was a little boy named Timmy. He loved to play with his toys and eat candy every day. One day, Timmy's friend Billy came over to play. Timmy saw that he really wanted to play with the candy too. Timmy said, "I want to play with you, Billy!" Billy didn't want to share the candy with Timmy. They both argued and fought, but Timmy wasn't sure what to do. Suddenly, the big candy started to move and Timmy was scared. He tried to grab the candy, but it was too late. Timmy had no idea what to do. He knew he would never

# Output 3: Once upon a time, there was a little girl named Lily. She loved to play outside in the sky and explore the world around her. One day she went outside and saw something shiny on the ground. It was a big, green stone! Lily thought it looked different like a butterfly, but she couldn't believe it. She looked around and saw that it had fallen from the ground. After a while, Lily's mom saw her crying and said, "Don't worry, I can help you find your crystal." Lily put the crystal back in the tree and gave it some more pretty things for her to bring. The girl smiled and said
```
Training procedure
- revision: 0.8M
- Trained on M1 Max iGPU for 6 epochs
- Trained on 4,491,878 tokens and validated on 44,357 tokens; each training epoch took around 125 minutes on average
- With 0.77M parameters (float32) and a batch size of 64, training uses about 5 GB of memory
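As a rough sanity check on these numbers, the arithmetic below derives weight-memory, steps per epoch, and throughput from the figures above (a back-of-the-envelope sketch; all inputs are the card's stated numbers, and the derived values are approximations):

```python
# Figures from the training procedure above.
PARAMS = 0.77e6            # 0.77M parameters
TRAIN_TOKENS = 4_491_878   # training tokens per epoch
BATCH, CTX = 64, 256       # batch size and context length
EPOCH_MINUTES = 125        # average wall-clock time per epoch

weights_mb = PARAMS * 4 / 2**20                  # fp32 weights: ~2.9 MB
steps_per_epoch = TRAIN_TOKENS // (BATCH * CTX)  # ~274 optimizer steps
tokens_per_sec = TRAIN_TOKENS / (EPOCH_MINUTES * 60)  # ~600 tokens/s
```

Since the raw fp32 weights are only a few megabytes, the ~5 GB training footprint is presumably dominated by activations at batch 64 × 256 tokens, plus optimizer state.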
Framework versions
- torch: 2.8.0
- transformers: 4.57.0.dev0
- datasets: 4.0.0
- tokenizers: 0.22.0
- sentencepiece: 0.2.1