---
language: en
license: apache-2.0
library_name: transformers
tags:
  - tptt
  - peft
  - trust_remote_code
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.2-1B
datasets:
  - yahma/alpaca-cleaned
---

lora_delta_product_r_m0.5_constant

Links: arXiv · PyPI · Release · Documentation · HuggingFace

Titanesque version of meta-llama/Llama-3.2-1B with parallel linearized attention (TPTT 😊) and PEFT.

The architecture was introduced in the TPTT paper.

Model Details

  • Architecture: TpttModel
  • Base model: meta-llama/Llama-3.2-1B
  • LiZA config: operator=delta_product_r, mag=0.5
  • LoRA config: r=8, alpha=16, dropout=0.05 (see the sketch after this list)
  • torch_dtype:
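
For reference, the LoRA settings above can be written as a peft configuration. This is an illustrative sketch only: target_modules is an assumption not stated in this card, and the LiZA operator (delta_product_r, mag=0.5) is handled by the TPTT custom modeling code loaded with trust_remote_code, not by peft.

from peft import LoraConfig

# Hypothetical reconstruction of the LoRA settings listed above;
# target_modules is an assumption, not stated in this card.
lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,                    # dropout applied to the LoRA layers
    target_modules=["q_proj", "v_proj"],  # assumed; adjust to the actual setup
    task_type="CAUSAL_LM",
)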

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/lora_delta_product_r_m0.5_constant",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If the repository contains multiple models, you must specify the subfolder; see the homepage for details.
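
As an illustration, the subfolder is passed directly to from_pretrained; the folder name below is hypothetical and only needed if this repo ships multiple variants:

model = AutoModelForCausalLM.from_pretrained(
    "ffurfaro/lora_delta_product_r_m0.5_constant",
    subfolder="variant_name",  # hypothetical subfolder name
    trust_remote_code=True,
)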

Training

  • Dataset: yahma/alpaca-cleaned
  • Platform: Kaggle
  • Hardware: 2xT4
  • Batch size: 2 (see the configuration sketch after this list)
  • Epochs: 1.0
  • Learning rate (final): N/A
  • Loss (final): 7.606347968441995
  • Training runtime: 2004.1174 sec
  • Samples per second: 1.291
  • Steps per second: 0.323
  • Total FLOPs: 1937596357804032.0
  • Gradient norm (final): N/A
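
As a rough sketch, the batch size and epoch count above map onto Hugging Face TrainingArguments as follows. The card does not state that the Trainer API was used, and the output path is a placeholder.

from transformers import TrainingArguments

# Hypothetical mapping of the reported settings; only batch size and
# epochs come from this card, everything else is a placeholder.
training_args = TrainingArguments(
    output_dir="outputs",            # placeholder path
    per_device_train_batch_size=2,   # "Batch size: 2" above
    num_train_epochs=1,              # "Epochs: 1.0" above
)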

Evaluation

  • Metrics: Training loss only (no evaluation yet; a benchmark table covering PiQA, ARC, HellaSwag, WinoGrande, GSM8K, and MMLU is planned)
  • Results: Final training loss: 7.606347968441995 (a rough loss check follows this list)
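
Until that table lands, one rough sanity check is to measure the causal-LM loss on a few yahma/alpaca-cleaned samples with the model and tokenizer from the Usage section. The prompt formatting below is a guess and will not match the training preprocessing exactly, so the number is only indicative.

import torch
from datasets import load_dataset

# Small slice of the training dataset; the text formatting is an assumption.
ds = load_dataset("yahma/alpaca-cleaned", split="train[:8]")
texts = [f"{ex['instruction']}\n{ex['input']}\n{ex['output']}" for ex in ds]

model.eval()
losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
print(f"Mean loss over {len(losses)} samples: {sum(losses) / len(losses):.3f}")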

Citation & Contact

If you use TPTT in your academic work, please cite the TPTT paper (Furfaro). For questions or support, please open an issue on the GitHub repository or contact the maintainer.