Model Card for LFT-LoRA
LoRA-finetuned meta-llama/Llama-3.1-8B-Instruct on a structured math reasoning corpus with explicit chain-of-thought, topic coverage signals, and representation hints. The repo contains merged weights (no separate adapter load required) plus a Llama-3.1-style chat template.
Model Details
- Developed by: Yandagandi Ratna Sai Aakanksha
- Model type: Causal decoder-only LM (Llama 3.1, 8B) with merged LoRA adapters
- Languages: English
- License: Meta Llama 3.1 Community License (inherits from the base model)
- Finetuned from: meta-llama/Llama-3.1-8B-Instruct
- Context length: 131k (RoPE scaling retained from base)
- Tokenization: Llama 3.1 tokenizer; bundled chat_template.jinja for chat formatting
Model Sources
- Repository: llama3.1-8b-lft-lora on the Hugging Face Hub (replace with your handle, e.g., Sashank-810/llama3.1-8b-lft-lora)
- Demo: Bring-your-own UI (Transformers / vLLM snippets below)
Uses
Direct Use
- Structured math tutoring with concise chain-of-thought and final boxed answers.
- Topic coverage checkpoints via covered_nodes, used_edges, and operations (informative, not required at inference).
Downstream Use
- Further task-specific finetunes on math or STEM reasoning.
- Few-shot prompting for solution verification or step planning.
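The verification use case can be driven purely through prompting. A minimal sketch, assuming the bundled chat template; the worked examples and the "VERDICT:" convention are illustrative, not part of the training corpus:

```python
from transformers import AutoTokenizer

model_id = "Sashank-810/llama3.1-8b-lft-lora"  # replace with your handle
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Hypothetical few-shot verification prompt; the solutions and the "VERDICT:"
# convention below are illustrative, not taken from the training data.
messages = [
    {"role": "system", "content": "You verify math solutions. Reply 'VERDICT: correct' or 'VERDICT: incorrect', then one line of justification."},
    {"role": "user", "content": "Problem: 2x + 3 = 11. Proposed solution: x = 4."},
    {"role": "assistant", "content": "VERDICT: correct. Subtracting 3 and dividing by 2 gives x = 4."},
    {"role": "user", "content": "Problem: 5! = ? Proposed solution: 60."},
    {"role": "assistant", "content": "VERDICT: incorrect. 5! = 120."},
    # New (problem, candidate solution) pair to verify:
    {"role": "user", "content": "Problem: distinct arrangements of the letters of BANANA. Proposed solution: 60."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Feed `prompt` to model.generate or vLLM as shown in the snippets below.
```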
Out-of-Scope Use
- Safety-critical decisions, factual QA outside math curriculum, or content moderation.
- Non-English tasks (not tuned or evaluated).
Bias, Risks, and Limitations
- Finetuned only on math-style prompts; it may produce confident but incorrect statements outside that domain.
- Not safety-tuned; add an external guardrail layer for production deployments.
- The base model supports long context, but training examples were short-to-mid length (~1.5k tokens, packed).
Recommendations
- Keep system prompts concise; the chat template already prepends metadata headers.
- Validate answers with external checks for high-stakes use.
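For the external-check recommendation, a minimal sketch that assumes answers end in a \boxed{...} span (as described under Direct Use); adjust the parsing to your actual output format:

```python
import re

def extract_boxed_answer(text: str):
    """Pull the last \\boxed{...} span from a generation, if present."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def check_numeric(generated: str, expected: float, tol: float = 1e-6) -> bool:
    """Compare the extracted boxed answer against a reference value."""
    ans = extract_boxed_answer(generated)
    if ans is None:
        return False
    try:
        return abs(float(ans) - expected) <= tol
    except ValueError:
        return False  # non-numeric answer; fall back to manual review

# Example: break-even for cost = 40000 + 12x and revenue = 25x is x = 40000 / 13.
print(check_numeric(r"... so the break-even quantity is \boxed{3076.92}.", 40000 / 13, tol=0.01))
```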
How to Get Started
Use with Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sashank-810/llama3.1-8b-lft-lora"  # replace with your handle

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a math tutor. Show compact reasoning, then the final answer."},
    {"role": "user", "content": "How many distinct 6-letter words can be made from the letters of BANANA?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Use with vLLM
```python
from vllm import LLM, SamplingParams

model_id = "Sashank-810/llama3.1-8b-lft-lora"
llm = LLM(model=model_id, tokenizer=model_id, trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a math tutor. Show compact reasoning, then the final answer."},
    {"role": "user", "content": "Explain the break-even quantity when cost=40000+12x and revenue=25x."},
]
prompt = llm.get_tokenizer().apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

sampling = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
out = llm.generate([prompt], sampling_params=sampling)
print(out[0].outputs[0].text)
```
Run the OpenAI-compatible server:
```bash
python -m vllm.entrypoints.openai.api_server \
    --model Sashank-810/llama3.1-8b-lft-lora \
    --trust-remote-code \
    --max-model-len 1536
```
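Once the server is up, any OpenAI-compatible client can call it. A minimal sketch, assuming the default host/port (localhost:8000) and the openai Python package (v1+); the prompt is illustrative:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Sashank-810/llama3.1-8b-lft-lora",
    messages=[
        {"role": "system", "content": "You are a math tutor. Show compact reasoning, then the final answer."},
        {"role": "user", "content": "Solve 3x - 7 = 11."},
    ],
    temperature=0.6,
    top_p=0.9,
    max_tokens=256,
)
print(response.choices[0].message.content)
```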
Training Details
Training Data
- Private structured math reasoning set with fields: topic, question, chain_of_thought_reasoning, final_answer, covered_nodes, used_edges, operations, repr_path.
- Examples include algebra, permutations/combinations, cost-volume-profit, and proof sketches; chain-of-thought is supervised.
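An illustrative record shape (field names from the list above; the values are invented for illustration and do not come from the private dataset):

```python
# Illustrative training record: field names follow the dataset description above,
# but every value here is made up, including the repr_path and edge encoding.
example_record = {
    "topic": "permutations_and_combinations",
    "question": "How many distinct arrangements of the letters of BANANA are there?",
    "chain_of_thought_reasoning": "6 letters with A x3 and N x2, so 6! / (3! * 2!) = 60.",
    "final_answer": "60",
    "covered_nodes": ["multiset_permutations"],
    "used_edges": [["factorial", "multiset_permutations"]],
    "operations": ["factorial", "division"],
    "repr_path": "graph/permutations/banana_0001",
}
```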
Speeds, Sizes, Times
- Base: 8B parameters; LoRA merged into full weights (no adapter load).
- Checkpoint shards: 4x .safetensors plus model.safetensors.index.json.
Evaluation
- Qualitative spot checks on held-out quiz-style math questions (multiple-choice and free-response); no formal benchmark scores are reported here.
- We recommend evaluating on your own target math set before deployment.
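A rough exact-match harness along those lines, as a minimal sketch: the held-out items and the boxed-answer prompt suffix are placeholders; swap in your own target set and a stricter answer parser (e.g., the extraction helper sketched under Recommendations).

```python
from transformers import AutoTokenizer, pipeline

model_id = "Sashank-810/llama3.1-8b-lft-lora"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
generator = pipeline("text-generation", model=model_id, tokenizer=tokenizer,
                     torch_dtype="auto", device_map="auto", trust_remote_code=True)

# Placeholder held-out items; replace with your own evaluation set.
held_out = [
    {"question": "Solve 3x - 7 = 11.", "answer": "6"},
    {"question": "How many distinct arrangements of the letters of BANANA are there?", "answer": "60"},
]

correct = 0
for item in held_out:
    messages = [{"role": "user", "content": item["question"] + " Give the final answer in \\boxed{}."}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    completion = generator(prompt, max_new_tokens=256, do_sample=False, return_full_text=False)[0]["generated_text"]
    correct += item["answer"] in completion  # crude containment check; prefer a boxed-answer parser
print(f"accuracy: {correct}/{len(held_out)}")
```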
Environmental Impact
- Hardware and training-run details were not logged; expect a footprint similar to a two-epoch QLoRA run on an 8B base with sequence packing.
Technical Specifications
Model Architecture and Objective
- Llama 3.1 8B decoder-only causal LM, RoPE-scaled (factor 8) with 131k context; merged LoRA on attention and MLP projections; causal LM objective with right-padding.
Compute Infrastructure
- Training code: PyTorch + Hugging Face Transformers + PEFT; flash-attention v2 when available.
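For reference, a minimal PEFT sketch of how a merged checkpoint like this is typically produced. The rank, alpha, dropout, and training loop are assumptions; the actual hyperparameters for this finetune are not published, and only the target modules follow the architecture description above.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative reconstruction only: rank, alpha, and dropout are assumed values,
# shown to make "merged LoRA on attention and MLP projections" concrete.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct", torch_dtype="auto")
lora_cfg = LoraConfig(
    r=16,                # assumed rank
    lora_alpha=32,       # assumed scaling
    lora_dropout=0.05,   # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
                    "gate_proj", "up_proj", "down_proj"],     # MLP projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
# ... supervised finetuning loop (e.g., TRL SFTTrainer) ...
merged = model.merge_and_unload()  # fold LoRA deltas into the full weights
merged.save_pretrained("llama3.1-8b-lft-lora-merged")
```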
Citation
If you use this model, please cite the base model and this finetune.
```bibtex
@misc{lft-lora-2025,
  title  = {LFT-LoRA: Structured Math Reasoning LoRA on Llama-3.1-8B-Instruct},
  author = {Yandagandi Ratna Sai Aakanksha},
  year   = {2025},
  note   = {Finetuned from meta-llama/Llama-3.1-8B-Instruct},
}
```
Model Card Contact
- For questions or issues, please open a Hub discussion or reach out via the repository owner on Hugging Face.