Lark‑70M‑v1
Model Summary
Lark‑70M‑v1 is a small, experimental language model fine‑tuned from EleutherAI’s Pythia‑70M‑deduped base.
It is trained to emphasize reasoning clarity, social cognition, and self‑correction behavior rather than raw instruction following or verbosity.
The model is intended as a behavioral probe and research artifact rather than a production assistant.
Model Details
- Developed by: Independent researcher (ahmiershadowman)
- Model type: Causal language model
- Language(s): English
- License: Apache‑2.0
- Finetuned from: EleutherAI/pythia‑70m‑deduped‑v0
Training Data
The model was fine‑tuned on a curated mixture of datasets emphasizing reasoning and reflection:
- Claude Haiku High‑Reasoning (1700x): Short, high‑quality conversational reasoning examples.
- Theory of Mind Dataset: Instruction‑style examples focused on social reasoning and perspective‑taking.
- Natural Reasoning (Facebook): Long‑form reasoning text, subsampled to balance dataset influence.
- KCA Data (Training Subset): Self‑correction and revision‑oriented examples.
Only the training source files were used; evaluation artifacts and metrics were excluded from training.
All datasets were normalized into a single text format and filtered to remove empty or malformed entries.
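The normalization step above can be sketched as follows. This is a minimal illustration, not the card's actual pipeline: the raw field names (`text`, `output`, `response`) are assumptions, since the card does not document the source schemas.

```python
def normalize_records(records):
    """Flatten records from mixed schemas into a single {'text': ...} format
    and drop empty or malformed entries. Field names are assumptions; the
    card does not document the raw dataset schemas."""
    normalized = []
    for rec in records:
        # Pull plain text from whichever field the source dataset used.
        text = rec.get("text") or rec.get("output") or rec.get("response") or ""
        if not isinstance(text, str):
            continue  # malformed entry: non-string payload
        text = text.strip()
        if text:  # drop empty entries
            normalized.append({"text": text})
    return normalized
```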
Preprocessing
- Removal of `<think>...</think>` tags to avoid explicit chain‑of‑thought leakage.
- All examples flattened into plain text.
- Maximum sequence length: 512 tokens
- Tokenization performed using the base model tokenizer.
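The tag-removal step can be sketched with a simple regex. This is an assumed implementation (the card only states that the tags were removed, not how):

```python
import re

# DOTALL so the pattern also matches multi-line reasoning spans.
THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def strip_think_tags(text):
    """Remove <think>...</think> spans so no explicit chain-of-thought
    leaks into the training text. Regex-based removal is an assumption."""
    return THINK_RE.sub("", text).strip()
```

After this step, the cleaned text would be tokenized with the base Pythia tokenizer and truncated to 512 tokens.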
Training Procedure
- Framework: Hugging Face Transformers + Trainer
- Precision: fp16 mixed precision
- Training regime: Supervised fine‑tuning
- Learning rate: 8e‑6
- Warmup steps: 800
- Max steps: 6000
- Batching: Data‑parallel training with gradient accumulation
- Hardware: TPU (data parallelism via Accelerate)
The training schedule intentionally favors slow learning and delayed crystallization to preserve behavioral flexibility.
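The hyperparameters above can be expressed as a Hugging Face `TrainingArguments` configuration. This is a sketch, not the exact setup: `output_dir`, the batch size, and the accumulation steps are assumptions, as the card does not state them.

```python
from transformers import TrainingArguments  # requires the transformers library

# Values from the card; commented fields are assumptions not stated there.
args = TrainingArguments(
    output_dir="lark-70m-v1",          # placeholder path
    learning_rate=8e-6,
    warmup_steps=800,
    max_steps=6000,
    fp16=True,                          # fp16 mixed precision, per the card
    per_device_train_batch_size=8,      # assumption: not stated in the card
    gradient_accumulation_steps=4,      # assumption: not stated in the card
)
```

With Accelerate handling data parallelism across TPU cores, these arguments would be passed to a standard `Trainer` for supervised fine-tuning.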
Evaluation
Evaluation Data
Evaluation was performed on KCA examination input prompts (test split only).
These examples were not used during training.
Metrics
- Evaluation loss (cross‑entropy) tracked periodically during training.
- No task‑specific accuracy metrics were computed.
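Since only cross-entropy loss is tracked, it can help to read it as perplexity, which is simply the exponential of the mean loss in nats per token:

```python
import math

def perplexity(cross_entropy_loss):
    """Convert a mean cross-entropy loss (nats per token) to perplexity,
    a more interpretable scale for the periodic evaluation losses."""
    return math.exp(cross_entropy_loss)
```

For example, an evaluation loss of 3.0 corresponds to a perplexity of roughly 20.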
Notes on Evaluation
Due to the small model size (70M parameters), evaluation loss is expected to be noisy.
Qualitative inspection of generated outputs is recommended for assessing behavioral changes.
Intended Uses
Direct Use
- Research into small‑model reasoning behavior
- Studying self‑correction and hesitation dynamics
- Prompt‑level behavioral probing
Downstream Use
- Further fine‑tuning or experimentation
- Educational or exploratory research
Out‑of‑Scope Use
- High‑stakes decision making
- Safety‑critical applications
- Deployment as a general‑purpose assistant
Bias, Risks, and Limitations
- The model inherits biases present in its training data.
- Small parameter count limits factual reliability and robustness.
- Reasoning behavior is emergent and inconsistent.
- Outputs should not be treated as authoritative.
Environmental Impact
- Hardware: Cloud-hosted TPU (data parallelism via Accelerate)