Jenny: Generation Alpha Slang Translator

A lightweight, shareable repository for Jenny — a LoRA-tuned TinyLlama model that translates current Gen Alpha slang into plain English. The repo includes data, configs, evaluation notes, and usage instructions so employers or collaborators can reproduce the results.

Repository Layout

  • data/jenny_genA_corpus.csv — 708 paired slang → plain English examples (synthetic, collected via ChatGPT Deep Research from slang glossaries and public social content).
  • configs/gena_to_plain.yaml — lm-eval task card for the custom translation benchmark.
  • scripts/finetune_lora.py — optional helper script that fine-tunes TinyLlama-1.1B-Chat with a LoRA adapter for Gen Alpha slang translation.
  • outputs/ — recommended location for checkpoints and eval artifacts (created on first run).
  • (Optional) outputs/jenny_lora_adapter/ — place to save/export the LoRA adapter for Hugging Face Hub upload.
  • README.md — this document; serves as the public-facing model card + quickstart.

1) Introduction

  • Problem: Gen Alpha slang evolves rapidly; even small age gaps produce misunderstandings. Base LLMs often translate literally or hallucinate meanings, which blocks intergenerational communication and trust & safety review. Grandparents can feel alienated in a world saturated with digital slang; teenagers can feel misunderstood in a world framed by analog assumptions.
  • Approach: Parameter-efficient LoRA fine-tuning of TinyLlama-1.1B-Chat on 708 slang→plain pairs to inject contemporary slang knowledge while preserving general reasoning.
  • Result: BLEU/ROUGE-L on the held-out slang test split rose from 5.1/0.28 to 31.8/0.52. ARC-Easy, HellaSwag, and Winogrande moved only within standard error, indicating specialization without catastrophic forgetting.
  • Why TinyLlama + LoRA: Small footprint (fits consumer GPUs), fast iteration, adapter modularity for future slang refreshes, and reduced risk of overwriting base capabilities.

2) Data

  • Source & composition: 708 slang sentences paired with plain-English translations, synthesized from online slang glossaries and public social content using ChatGPT Deep Research. Frequent, current terms such as “bussin’,” “snatched,” “no cap,” “rizz,” and “drip” are included. Columns: gena_slang, plain_english.
  • Cleaning: trimmed whitespace, normalized casing/punctuation, deduped exact pairs.
  • Splits: 90/10 train/test via train_test_split(test_size=0.1, seed=42). A further 10% of the train split served as a dev set during hyperparameter sweeps. Optional export of the test split to data/gena_test.jsonl (matching configs/gena_to_plain.yaml) for lm-eval.
  • License & provenance: synthetic/derivative content; review for downstream compliance before redistribution. Intended for research/educational use.
  • Example rows (slang → plain):
    • “This burger is bussin', no cap.” → “This burger is delicious, honestly.”
    • “Your drip is on point today.” → “Your outfit looks great today.”
    • “She ate and left no crumbs.” → “She performed extremely well.”
  • Schema expectations: CSV with headers gena_slang,plain_english; UTF-8; no quoted newlines. If you use another schema, rename columns or preprocess before training (a minimal loading/splitting sketch follows this list).
  • Known gaps: Limited coverage of regional slang (AAVE variants, UK roadman terms), sparse multi-sentence context, and minimal emoji-heavy samples.
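
For concreteness, here is a minimal loading-and-splitting sketch matching the schema and splits described above (a sketch assuming the Hugging Face datasets library; the file path follows the repo layout):

from datasets import load_dataset

# Load the CSV with the documented schema (gena_slang, plain_english).
ds = load_dataset("csv", data_files="data/jenny_genA_corpus.csv")["train"]
assert {"gena_slang", "plain_english"} <= set(ds.column_names)

# 90/10 train/test split with seed 42, as described above.
splits = ds.train_test_split(test_size=0.1, seed=42)
train_ds, test_ds = splits["train"], splits["test"]

# Carve a further 10% dev set out of the train split for hyperparameter sweeps.
dev = train_ds.train_test_split(test_size=0.1, seed=42)
train_ds, dev_ds = dev["train"], dev["test"]
print(len(train_ds), len(dev_ds), len(test_ds))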

3) Methodology

  • Finetuning choice: Parameter-efficient LoRA on TinyLlama-1.1B-Chat to retain general reasoning while specializing on slang; RAG was skipped because the task is stylistic, not retrieval-heavy.
  • Prompting: Chat template with a translation system instruction and user slang string; labels masked so loss only covers the assistant span.
  • Hyperparameters (best run):
    • LoRA: r=16, alpha=32, dropout=0.05, targets q_proj,k_proj,v_proj,o_proj.
    • Training: 3 epochs; effective batch 16 (per-device 4, grad-accum 4); lr 2e-4; weight decay 0.01; optimizer paged_adamw_8bit.
    • Seq length: 256; generation max_new_tokens=64, greedy decoding (do_sample=False, commonly reported as temperature 0).
    • Precision: bf16 when supported, otherwise fp16/fp32 fallback.
  • Repro scripts: reuse an existing finetune_genalpha_translator.py (invoked in the quickstart below) or the LoRA-only helper in scripts/finetune_lora.py; no code changes are required. A condensed configuration sketch follows this list.
  • Compute used: single A100 40GB; a consumer GPU with ≥16 GB VRAM also works with a smaller per-device batch and more gradient accumulation. Training wallclock ≈ 35–45 minutes at the above settings.
  • Validation: Dev split monitored with BLEU/ROUGE; early stopping not used because the small dataset converged within 3 epochs.
  • Ablations tried: LoRA ranks {8,16,32}, alphas {16,32,64}, dropout {0,0.05,0.1}, attention-only vs attention+MLP targets, LRs {1e-4,2e-4,3e-4}, epochs {2,3,4}. Best balance: r=16, alpha=32, dropout=0.05, attention-only, lr 2e-4, 3 epochs.
  • Loss masking: Only the assistant span contributes to loss; system/user tokens are masked to avoid prompt overfitting.
  • Decoding for eval: Greedy (temperature 0, top_p 1.0), max_new_tokens 64; no repetition penalty applied.
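
A condensed sketch of the best-run configuration and the assistant-only loss masking described above (assuming peft and transformers; mask_prompt_tokens is an illustrative helper, not a function from the repo's scripts):

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# LoRA settings from the best run: r=16, alpha=32, dropout=0.05, attention-only targets.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# bf16 when supported, otherwise fp16 fallback (fp32 on CPU-only machines).
dtype = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=dtype)
model = get_peft_model(model, lora_cfg)

# Training settings from the best run: effective batch 16 = 4 per device x 4 accumulation steps.
args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    weight_decay=0.01,
    optim="paged_adamw_8bit",
)

def mask_prompt_tokens(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    # Assistant-only loss: label system/user tokens -100 so the loss ignores them.
    labels = input_ids.clone()
    labels[:prompt_len] = -100
    return labels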

4) Evaluation

Benchmarks: custom gena_to_plain (BLEU / ROUGE-L on held-out slang test set) plus general reasoning tasks (HellaSwag acc_norm, ARC-Easy acc_norm, Winogrande acc). Scores use greedy decoding (temperature 0, max_new_tokens 64).

| Model | HellaSwag acc_norm | ARC-Easy acc_norm | Winogrande acc | gena_to_plain BLEU / ROUGE-L |
| --- | --- | --- | --- | --- |
| TinyLlama-1.1B-Chat (base) | 0.605 | 0.545 | 0.599 | 5.1 / 0.28 |
| Jenny LoRA (TinyLlama + LoRA) | 0.600 | 0.559 | 0.590 | 31.8 / 0.52 |
| Phi-3-mini-4k-instruct (≈3.8B) | 0.642 | 0.603 | 0.617 | 18.3 / 0.39 |
| Llama-3.2-1B-Instruct | 0.571 | 0.514 | 0.562 | 12.4 / 0.33 |

Summary: LoRA provides the largest gain on slang translation with negligible drift on general tasks. Phi-3-mini is stronger overall but less domain-specialized; the tuned TinyLlama adapter remains best on slang fidelity at small scale.

How to reproduce the table:

  • Train: run the quickstart fine-tune command (below) to produce outputs/jenny_lora_adapter.
  • Export test set: python scripts/finetune_lora.py --corpus_path data/jenny_genA_corpus.csv --save_eval_jsonl --eval_split 0.1 --eval_only.
  • Evaluate: use lm-eval with --include_path configs and the adapter path in --model_args peft=... (a sketch of the task card follows this list).
  • For comparison rows, swap pretrained=microsoft/Phi-3-mini-4k-instruct or pretrained=meta-llama/Llama-3.2-1B-Instruct with the same decoding settings.
  • Metrics reported are point estimates; standard errors available in lm-eval JSON outputs (*_stderr fields). Run multiple seeds if you need confidence intervals.
  • If running on CPU, keep --batch_size modest and expect much slower lm-eval runs.
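
For reference, here is a sketch of what such a task card can look like, assuming lm-eval's generate_until schema; metric registration and the exact contents of the shipped configs/gena_to_plain.yaml may differ:

task: gena_to_plain
dataset_path: json
dataset_kwargs:
  data_files:
    test: data/gena_test.jsonl
test_split: test
output_type: generate_until
doc_to_text: "{{gena_slang}}"
doc_to_target: "{{plain_english}}"
generation_kwargs:
  max_gen_toks: 64
  do_sample: false
metric_list:
  - metric: bleu
  - metric: rouge  # illustrative; the shipped card may register ROUGE-L differently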

5) Usage and Intended Uses

Intended for parents, educators, trust & safety analysts, and researchers who need faithful translations of modern Gen Alpha slang to plain English. Not intended to generate new slang or bypass platform policies; apply standard safety layers in downstream use.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_id = "your-username/jenny-gena-slang-lora"  # replace with your Hub repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, adapter_id)

def translate_slang(text: str) -> str:
    messages = [
        {"role": "system", "content": "Translate the following Generation Alpha slang sentence to plain English."},
        {"role": "user", "content": text},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)  # greedy decoding
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(translate_slang("This burger is bussin', no cap."))

Out-of-scope / safety notes

  • Do not rely on Jenny for content moderation or harm detection; add your own safety stack.
  • Not suitable for medical/legal advice or for generating slang; it is a translator only.
  • If translating sensitive or personal data, ensure downstream storage and logging comply with your privacy requirements.
  • Add profanity filters if deploying in user-facing contexts; the model will faithfully translate slang that may contain explicit content (a trivial post-filter sketch follows this list).
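
A deliberately simple post-filter sketch reusing translate_slang from the usage snippet above (the word list and helper are hypothetical placeholders; use a maintained profanity-filtering library in production):

BLOCKLIST = {"damn", "hell"}  # hypothetical placeholder; swap in a maintained profanity list

def redact_profanity(text: str) -> str:
    # Mask blocklisted words with asterisks before surfacing translations to users.
    def mask(word: str) -> str:
        return "*" * len(word) if word.lower().strip(".,!?") in BLOCKLIST else word
    return " ".join(mask(w) for w in text.split())

print(redact_profanity(translate_slang("That test was hell, no cap.")))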

6) Prompt Format

Chat-style prompt with a translation-focused system instruction and user slang:

<|system|>
Translate the following Generation Alpha slang sentence to plain English.</s>
<|user|>
I'm lowkey vibing rn, no cap.</s>
<|assistant|>
  • The gena_to_plain lm-eval task uses the same format (system + user + assistant generation).
  • Keep prompts short (<200 tokens) to avoid truncation at max_length=256. Reduce max_new_tokens for tighter latency.
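
To verify the exact string your installed tokenizer renders (chat templates occasionally change between transformers releases), you can print it directly:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
messages = [
    {"role": "system", "content": "Translate the following Generation Alpha slang sentence to plain English."},
    {"role": "user", "content": "I'm lowkey vibing rn, no cap."},
]
# Prints the fully rendered prompt, including special tokens and the assistant header.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))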

7) Expected Output Format

Short plain-English sentence with no slang, emoji, or extra chatter:

I'm really enjoying this right now, honestly.
  • Multi-sentence responses are rare by design; if you see rambling, lower max_new_tokens or add a brevity instruction to the system message.

8) Limitations

  • Dataset is small (708 pairs) and synthetic; coverage and pragmatic nuance are limited, and slang evolves quickly.
  • Regional, cultural, and code-mixed slang may be mistranslated; intensity/emphasis can be softened.
  • No built-in safety filtering; wrap with your policy stack for production.
  • Evaluations assume clean text; OCR noise or heavy stylization will reduce accuracy. Regular refreshes with newer slang and human review are recommended.
  • LoRA capacity can miss rare pragmatic cues; consider refreshing adapters quarterly as slang shifts.
  • Emoji-heavy inputs (e.g., “fr 😭😭”) are sparsely represented; expect occasional misses.
  • Outputs may slightly formalize the tone; if you need tone-preserving paraphrases, retrain with stylistic targets.

Quickstart (reproduce training/eval without code changes)

  1. Install deps (Python ≥3.10):
    pip install transformers datasets peft accelerate bitsandbytes lm-eval
    
  2. Fine-tune (example using your existing script):
    python finetune_genalpha_translator.py \
      --corpus_path data/jenny_genA_corpus.csv \
      --output_dir outputs \
      --model_name TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
      --num_train_epochs 3 --learning_rate 2e-4 \
      --per_device_train_batch_size 4 --gradient_accumulation_steps 4 \
      --max_source_length 128 --max_target_length 128
    
  3. Export test JSONL for lm-eval (optional):
    python scripts/finetune_lora.py --corpus_path data/jenny_genA_corpus.csv --save_eval_jsonl --eval_split 0.1 --eval_only
    
  4. Run lm-eval with the custom task card:
    lm-eval --model hf \
      --model_args pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0,dtype=bfloat16,peft=outputs/jenny_lora_adapter \
      --tasks hellaswag,arc_easy,winogrande,gena_to_plain \
      --include_path configs --batch_size auto
    
  • If you lack GPU RAM, lower --per_device_train_batch_size and raise --gradient_accumulation_steps; or switch to CPU-only lm-eval with --device cpu (slow).
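
To publish the trained adapter from outputs/jenny_lora_adapter to the Hub (a sketch; the repo id is a placeholder and a prior huggingface-cli login is assumed):

from huggingface_hub import HfApi

api = HfApi()
api.create_repo("your-username/jenny-gena-slang-lora", exist_ok=True)  # placeholder repo id
api.upload_folder(
    folder_path="outputs/jenny_lora_adapter",
    repo_id="your-username/jenny-gena-slang-lora",
    repo_type="model",
)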