---
license: mit
language:
  - en
metrics:
  - perplexity
base_model:
  - openai-community/gpt2
pipeline_tag: text-generation
library_name: transformers
tags:
  - gpt2
  - toy-llm
  - from-scratch
  - huggingface
  - transformers
  - english
  - causal-lm
  - educational
---

# 🧠 CooperLM-354M

CooperLM-354M is a 354-million-parameter GPT-2-based language model trained from scratch on a filtered subset of Wikipedia, BookCorpus, and OpenWebText. It was created as a toy project to explore end-to-end LLM training using Hugging Face's Transformers and Datasets libraries.

GitHub repo: https://github.com/daniel-mehta/CooperLM-354M

## 🧱 Architecture

- GPT-2 architecture with 24 layers, 16 attention heads, and a hidden size of 1024 (a matching config sketch follows this list)
- 256-token context window
- Trained for 1 epoch on 100k samples (~1.2M sequences)
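For reference, these settings correspond to a GPT-2 "medium"-sized configuration. The sketch below shows how such a model could be instantiated from scratch with Transformers; the vocabulary size is an assumed value (the standard GPT-2 BPE vocabulary) and is not stated on this card.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical config matching the architecture listed above.
config = GPT2Config(
    vocab_size=50257,   # standard GPT-2 BPE vocabulary (assumed, not confirmed here)
    n_positions=256,    # 256-token context window
    n_embd=1024,        # hidden size
    n_layer=24,         # transformer blocks
    n_head=16,          # attention heads
)

model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # ≈354M
```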

## 📊 Training Details

A training-setup sketch consistent with these values is shown after the table.

| Setting                | Value           |
|------------------------|-----------------|
| Model type             | GPT2LMHeadModel |
| Epochs                 | 1               |
| Precision              | fp16            |
| Batch size (effective) | 16              |
| GPU                    | RTX 4060        |
| Final eval loss        | 5.63            |
| Perplexity             | ~263            |
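The training script itself is not reproduced on this card; the following is a minimal sketch of a Trainer setup consistent with the table above. The per-device batch size / gradient-accumulation split, output directory, and dataset variables are assumptions, not values taken from the original run.

```python
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Effective batch size 16, assumed here as 4 per device x 4 accumulation steps.
training_args = TrainingArguments(
    output_dir="cooperlm-354m",   # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    fp16=True,                    # mixed precision, as listed in the table
    logging_steps=100,
    save_steps=1000,
)

trainer = Trainer(
    model=model,                  # GPT2LMHeadModel from the config sketch above
    args=training_args,
    train_dataset=train_dataset,  # placeholder: pre-tokenized 256-token sequences
    eval_dataset=eval_dataset,    # placeholder: held-out split used for eval loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```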

## 📥 Usage

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("mehta/CooperLM-354M")
tokenizer = GPT2TokenizerFast.from_pretrained("mehta/CooperLM-354M")

prompt = "In a distant future,"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
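Equivalently, the text-generation pipeline can handle tokenization and decoding in one call; this is an alternative sketch with the same sampling settings, not something the original card documents.

```python
from transformers import pipeline

# Alternative: let the pipeline wrap tokenization, generation, and decoding.
generator = pipeline("text-generation", model="mehta/CooperLM-354M")
result = generator(
    "In a distant future,",
    max_length=100,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
)
print(result[0]["generated_text"])
```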

## 📝 License

MIT