---
license: mit
language:
  - en
metrics:
  - perplexity
base_model:
  - openai-community/gpt2
pipeline_tag: text-generation
library_name: transformers
tags:
  - gpt2
  - toy-llm
  - from-scratch
  - huggingface
  - transformers
  - english
  - causal-lm
  - educational
---

# 🧠 CooperLM-354M

CooperLM-354M is a 354-million-parameter GPT-2-based language model trained from scratch on a filtered subset of Wikipedia, BookCorpus, and OpenWebText. It was created as a toy project to explore end-to-end LLM training using Hugging Face's Transformers and Datasets libraries.

GitHub repo: https://github.com/daniel-mehta/CooperLM-354M

## 🧱 Architecture

- GPT-2 architecture with 24 layers, 16 attention heads, and a hidden size of 1024 (a matching config sketch follows this list)
- 256-token context window
- Trained for 1 epoch on 100k samples (~1.2M sequences)
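For reference, these settings correspond to a GPT-2 "medium"-sized configuration. The sketch below shows how such a model could be instantiated from scratch with Transformers; the vocabulary size is an assumed value (the standard GPT-2 BPE vocabulary) and is not stated on this card.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical config matching the architecture listed above.
config = GPT2Config(
    vocab_size=50257,   # standard GPT-2 BPE vocabulary (assumed, not confirmed here)
    n_positions=256,    # 256-token context window
    n_embd=1024,        # hidden size
    n_layer=24,         # transformer blocks
    n_head=16,          # attention heads
)

model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # ≈354M
```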

## 📊 Training Details

A training-setup sketch consistent with these values is shown after the table.

| Setting                | Value           |
|------------------------|-----------------|
| Model type             | GPT2LMHeadModel |
| Epochs                 | 1               |
| Precision              | fp16            |
| Batch size (effective) | 16              |
| GPU                    | RTX 4060        |
| Final eval loss        | 5.63            |
| Perplexity             | ~263            |
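The training script itself is not reproduced on this card; the following is a minimal sketch of a Trainer setup consistent with the table above. The per-device batch size / gradient-accumulation split, output directory, and dataset variables are assumptions, not values taken from the original run.

```python
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Effective batch size 16, assumed here as 4 per device x 4 accumulation steps.
training_args = TrainingArguments(
    output_dir="cooperlm-354m",   # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    fp16=True,                    # mixed precision, as listed in the table
    logging_steps=100,
    save_steps=1000,
)

trainer = Trainer(
    model=model,                  # GPT2LMHeadModel from the config sketch above
    args=training_args,
    train_dataset=train_dataset,  # placeholder: pre-tokenized 256-token sequences
    eval_dataset=eval_dataset,    # placeholder: held-out split used for eval loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```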

## 📥 Usage

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("mehta/CooperLM-354M")
tokenizer = GPT2TokenizerFast.from_pretrained("mehta/CooperLM-354M")

prompt = "In a distant future,"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
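Equivalently, the text-generation pipeline can handle tokenization and decoding in one call; this is an alternative sketch with the same sampling settings, not something the original card documents.

```python
from transformers import pipeline

# Alternative: let the pipeline wrap tokenization, generation, and decoding.
generator = pipeline("text-generation", model="mehta/CooperLM-354M")
result = generator(
    "In a distant future,",
    max_length=100,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
)
print(result[0]["generated_text"])
```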

## 📝 License

MIT