Model Card for LFT-LoRA

A LoRA finetune of meta-llama/Llama-3.1-8B-Instruct on a structured math-reasoning corpus with explicit chain-of-thought, topic-coverage signals, and representation hints. The repository contains merged weights (no separate adapter load required) plus a Llama-3.1-style chat template.

Model Details

  • Developed by: Yandagandi Ratna Sai Aakanksha
  • Model type: Causal decoder-only LM (Llama 3.1, 8B) with merged LoRA adapters
  • Languages: English
  • License: Meta Llama 3.1 Community License (inherits from the base model)
  • Finetuned from: meta-llama/Llama-3.1-8B-Instruct
  • Context length: 131k (RoPE scaling retained from base)
  • Tokenization: Llama 3.1 tokenizer; bundled chat_template.jinja for chat formatting

Model Sources

  • Repository: Sashank-810/llama3.1-8b-lft-lora on the Hugging Face Hub
  • Demo: Bring-your-own UI (Transformers / vLLM snippets below)

Uses

Direct Use

  • Structured math tutoring with concise chain-of-thought and final boxed answers.
  • Topic coverage checkpoints via covered_nodes, used_edges, operations (informative, not required at inference).

Downstream Use

  • Further task-specific finetunes on math or STEM reasoning.
  • Few-shot prompting for solution verification or step planning.

Out-of-Scope Use

  • Safety-critical decisions, factual QA outside math curriculum, or content moderation.
  • Non-English tasks (not tuned or evaluated).

Bias, Risks, and Limitations

  • Finetuned only on math-style prompts; may produce confident but incorrect factual statements outside domain.
  • Not safety-tuned; add an external guardrail layer for production deployments.
  • The base model supports long contexts, but training examples were short to mid length (~1.5k packed tokens).

Recommendations

  • Keep system prompts concise; the chat template already prepends metadata headers.
  • Validate answers with external checks for high-stakes use.

How to Get Started

Use with Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sashank-810/llama3.1-8b-lft-lora"  # replace with your handle
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a math tutor. Show compact reasoning, then the final answer."},
    {"role": "user", "content": "How many distinct 7-letter words can be made from BANANA?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.6, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
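
As a quick sanity check, the expected final answer for the BANANA prompt above is 6!/(3!·2!) = 60 distinct arrangements.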

Use with vLLM

from vllm import LLM, SamplingParams

model_id = "Sashank-810/llama3.1-8b-lft-lora"
llm = LLM(model=model_id, tokenizer=model_id, trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a math tutor. Show compact reasoning, then the final answer."},
    {"role": "user", "content": "Explain the break-even quantity when cost=40000+12x and revenue=25x."},
]
prompt = llm.get_tokenizer().apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampling = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
out = llm.generate([prompt], sampling_params=sampling)
print(out[0].outputs[0].text)
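
For reference, the break-even point in that example is where revenue equals cost: 25x = 40000 + 12x, so x = 40000/13 ≈ 3077 units; the model's explanation should land on the same figure.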

Run the OpenAI-compatible server:

python -m vllm.entrypoints.openai.api_server \
  --model Sashank-810/llama3.1-8b-lft-lora \
  --trust-remote-code \
  --max-model-len 1536
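
Once the server is running (vLLM listens on port 8000 by default), any OpenAI-compatible client can query it. A minimal sketch using the openai Python package; the base_url, api_key, and example question are assumptions for a default local deployment, not part of this repo:

from openai import OpenAI

# Assumes the vLLM server above is running locally on the default port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Sashank-810/llama3.1-8b-lft-lora",
    messages=[
        {"role": "system", "content": "You are a math tutor. Show compact reasoning, then the final answer."},
        {"role": "user", "content": "Solve 3x + 7 = 22."},
    ],
    temperature=0.6,
    max_tokens=256,
)
print(response.choices[0].message.content)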

Training Details

Training Data

  • Private structured math reasoning set with fields: topic, question, chain_of_thought_reasoning, final_answer, covered_nodes, used_edges, operations, repr_path (an illustrative record is sketched after this list).
  • Examples include algebra, permutations/combinations, cost-volume-profit, and proof sketches; chain-of-thought is supervised.
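
The exact schema is private; the record below is purely illustrative, with field names taken from this card and values invented for the sketch:

# Illustrative only: real records are not published with this card.
example_record = {
    "topic": "permutations_and_combinations",
    "question": "How many distinct 6-letter words can be formed using all the letters of BANANA?",
    "chain_of_thought_reasoning": "A appears 3 times and N appears 2 times, so 6!/(3!*2!) = 60.",
    "final_answer": "60",
    "covered_nodes": ["factorial", "multiset_permutations"],
    "used_edges": [["factorial", "multiset_permutations"]],
    "operations": ["count_arrangements", "divide_by_repetitions"],
    "repr_path": "combinatorics/permutations/multiset",
}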

Speeds, Sizes, Times

  • Base: 8B parameters; LoRA merged into full weights (no adapter load).
  • Checkpoint shards: 4x .safetensors plus model.safetensors.index.json.

Evaluation

  • Qualitative spot checks on held-out quiz-style math questions (multiple choice and open-form); no formal benchmark scores are reported here.
  • Evaluate on your own target math set before deployment; a minimal exact-match harness is sketched below.
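
No harness ships with the repo; the sketch below is one minimal way to run an exact-match check on your own (question, gold answer) pairs. It assumes the model ends its solution with a boxed final answer such as \boxed{60}; the helper names and the regex are assumptions, not part of this repo:

import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sashank-810/llama3.1-8b-lft-lora"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

SYSTEM = "You are a math tutor. Show compact reasoning, then the final answer."

def solve(question: str) -> str:
    # Render the bundled chat template and generate greedily for reproducible scoring.
    messages = [{"role": "system", "content": SYSTEM}, {"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def extract_answer(text: str) -> str:
    # Prefer a \boxed{...} answer if present; otherwise fall back to the last non-empty line.
    boxed = re.findall(r"\\boxed\{([^}]*)\}", text)
    if boxed:
        return boxed[-1].strip()
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[-1] if lines else ""

eval_set = [  # replace with your held-out (question, gold answer) pairs
    ("How many distinct 6-letter words can be formed using all the letters of BANANA?", "60"),
]
correct = sum(extract_answer(solve(q)) == gold for q, gold in eval_set)
print(f"exact match: {correct}/{len(eval_set)}")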

Environmental Impact

  • Hardware and training-run details were not logged; expect a footprint comparable to a two-epoch QLoRA run on an 8B base with sequence packing.

Technical Specifications

Model Architecture and Objective

  • Llama 3.1 8B decoder-only causal LM, RoPE-scaled (factor 8) to a 131k context; merged LoRA on the attention and MLP projections; causal LM objective with right-padding. A hedged LoRA configuration sketch follows.
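
The LoRA hyperparameters are not recorded in this card; the sketch below shows the general shape of an attention-plus-MLP LoRA on Llama 3.1 with PEFT and a final merge. The rank, alpha, and dropout values are illustrative defaults, not the values used for this finetune:

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_id = "meta-llama/Llama-3.1-8B-Instruct"
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")

# Illustrative hyperparameters; the actual r/alpha/dropout for this run were not published.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)
model = get_peft_model(base, lora_cfg)

# ... supervised finetuning with the causal LM objective goes here ...

# Fold the adapters back into the base weights so no adapter load is needed at inference.
merged = model.merge_and_unload()
merged.save_pretrained("llama3.1-8b-lft-lora-merged")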

Compute Infrastructure

  • Training code: PyTorch + Hugging Face Transformers + PEFT; flash-attention v2 when available (see the loading sketch below).
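
Flash-attention v2 is opt-in at load time in recent Transformers releases; a minimal sketch (requires the flash-attn package; omit the argument to fall back to the default attention kernel):

import torch
from transformers import AutoModelForCausalLM

# attn_implementation="flash_attention_2" requires the flash-attn package to be installed.
model = AutoModelForCausalLM.from_pretrained(
    "Sashank-810/llama3.1-8b-lft-lora",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)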

Citation

If you use this model, please cite the base model and this finetune.

@misc{lft-lora-2025,
  title  = {LFT-LoRA: Structured Math Reasoning LoRA on Llama-3.1-8B-Instruct},
  author = {Yandagandi Ratna Sai Aakanksha},
  year   = {2025},
  note   = {Finetuned from meta-llama/Llama-3.1-8B-Instruct},
}

Model Card Contact

  • For questions or issues, please open a Hub discussion or reach out via the repository owner on Hugging Face.