Llama 3 8B - ShareGPT 10K Voice AI

This is a LoRA-finetuned version of Meta-Llama-3-8B-Instruct, trained on 10,887 high-quality conversations from the ShareGPT52K dataset.

🎯 Model Overview

  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct
  • Training Method: LoRA (Low-Rank Adaptation)
  • Quantization: 4-bit (bitsandbytes)
  • Dataset: RyokoAI/ShareGPT52K
  • Training Conversations: 10,887
  • Training Steps: 2,043
  • Training Epochs: 3
  • Adapter Size: ~161 MB

πŸ“Š Training Configuration

  • Learning Rate: 2e-4
  • Batch Size: 4 (effective: 16 with gradient accumulation)
  • Gradient Accumulation Steps: 4
  • LoRA Rank (r): 16
  • LoRA Alpha: 32
  • LoRA Dropout: 0.05
  • Optimizer: paged_adamw_32bit
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Trainable Parameters: 41,943,040 (0.52% of total)
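
The hyperparameters above map onto a standard PEFT setup. The sketch below is illustrative rather than the exact training script; the 4-bit base-model loading and fp16 compute settings are assumptions inferred from the listed quantization and optimizer.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch

# Load the base model in 4-bit (QLoRA-style setup, assumed from the bitsandbytes + paged_adamw_32bit settings)
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA configuration matching the values listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # ~41.9M trainable parameters (0.52%)

training_args = TrainingArguments(
    output_dir="llama3-sharegpt-10k-voice-ai",  # example output directory
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size 16
    num_train_epochs=3,
    optim="paged_adamw_32bit",
    save_steps=100,                  # checkpoints every 100 steps
    fp16=True,
)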

πŸš€ Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
    torch_dtype=torch.float16
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "owenergy/llama3-sharegpt-10k-voice-ai")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Generate response
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model replies instead of continuing the user turn
    return_tensors="pt"
).to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
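
For deployment scenarios where loading a separate adapter is inconvenient, the LoRA weights can be merged into the base model with PEFT. A minimal sketch, reusing `model` and `tokenizer` from above; the output directory name is just an example:

# Merge the LoRA weights into the fp16 base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("llama3-8b-sharegpt-10k-merged")      # example output directory
tokenizer.save_pretrained("llama3-8b-sharegpt-10k-merged")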

πŸŽ“ Training Data

The model was trained on 10,887 diverse conversations from ShareGPT52K, covering:

  • πŸ’» Technical assistance and programming
  • πŸ“š General knowledge and education
  • ✍️ Creative writing and storytelling
  • πŸ”§ Problem-solving and troubleshooting
  • πŸ’¬ Natural conversation and dialogue
  • 🌍 Wide range of topics and domains
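
The preprocessing script is not part of this repository; the sketch below shows one common way to turn ShareGPT-style records into Llama 3 chat-template text, assuming the usual "conversations" list of "from"/"value" turns.

# Map ShareGPT speaker tags to chat-template roles
ROLE_MAP = {"system": "system", "human": "user", "user": "user", "gpt": "assistant"}

def sharegpt_to_text(example, tokenizer):
    # example is assumed to look like {"conversations": [{"from": "human", "value": "..."}, ...]}
    messages = [
        {"role": ROLE_MAP.get(turn["from"], "user"), "content": turn["value"]}
        for turn in example["conversations"]
    ]
    return tokenizer.apply_chat_template(messages, tokenize=False)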

✨ What Makes This Model Special

  1. High-Quality Data: Trained exclusively on ShareGPT conversations, known for natural and helpful responses
  2. Optimized for Voice AI: Designed for conversational applications with natural dialogue flow
  3. Efficient LoRA: Only 161MB adapter that works with the base model
  4. Well-Balanced: 3 full epochs over the dataset, aiming for solid adaptation without overfitting
  5. Conversational Focus: Tuned for multi-turn dialogue and context tracking

πŸ’ͺ Model Strengths

  • Natural, human-like conversational responses
  • Strong context retention across multiple turns
  • Helpful and informative explanations
  • Creative problem-solving approaches
  • Adaptable to various conversation styles

⚠️ Limitations

  • Requires base Llama 3 8B model (this is just the LoRA adapter)
  • Trained primarily on English conversations
  • May inherit biases present in training data
  • Requires proper chat template formatting
  • Best performance with conversational use cases

πŸ“ˆ Performance

Finetuning on ShareGPT conversations is intended to improve over the base model on:

  • Conversational coherence
  • Response helpfulness
  • Natural dialogue flow
  • Context understanding
  • Multi-turn conversations

πŸ› οΈ Use Cases

Well suited for:

  • πŸŽ™οΈ Voice assistants
  • πŸ’¬ Chatbots
  • πŸ“ž Customer service AI
  • πŸ€– Interactive AI applications
  • πŸ“± Mobile AI assistants
  • 🌐 Web-based chat interfaces

πŸ“¦ Model Files

  • adapter_model.safetensors - LoRA adapter weights (~161MB)
  • adapter_config.json - Adapter configuration
  • tokenizer_config.json - Tokenizer settings
  • special_tokens_map.json - Special tokens
  • Checkpoints saved every 100 steps
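
The adapter configuration can be inspected directly from the Hub with PEFT, without downloading the base model (a quick sanity check, not required for inference):

from peft import PeftConfig

# Reads adapter_config.json from the Hub
config = PeftConfig.from_pretrained("owenergy/llama3-sharegpt-10k-voice-ai")
print(config.base_model_name_or_path)   # meta-llama/Meta-Llama-3-8B-Instruct
print(config.r, config.lora_alpha, config.target_modules)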

πŸ”§ Hardware Requirements

For Inference:

  • GPU: 12GB+ VRAM (with 4-bit quantization)
  • RAM: 16GB+ system memory
  • Storage: ~16GB (base model weights in fp16) + ~161MB (adapter)

Recommended Setup:

  • GPU: RTX 3090, RTX 4090, A100, or similar
  • With 4-bit quantization: Can run on consumer GPUs (see the loading sketch below)
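
To fit on smaller consumer GPUs, the base model can be loaded in 4-bit with bitsandbytes before attaching the adapter. A minimal sketch; the NF4 settings below are common defaults, not values specified by this repository:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model in 4-bit to reduce VRAM usage (roughly 5-6GB of weights instead of ~16GB)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "owenergy/llama3-sharegpt-10k-voice-ai")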

πŸ“ Citation

@misc{llama3-sharegpt-10k-voice-ai,
  author = {owenergy},
  title = {Llama 3 8B ShareGPT 10K Voice AI},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/owenergy/llama3-sharegpt-10k-voice-ai},
  note = {LoRA finetuned on 10,887 ShareGPT conversations}
}

πŸ“„ License

This model inherits the Llama 3 license from Meta. Please review the Meta Llama 3 Community License Agreement before use.

πŸ™ Acknowledgments

  • Meta for the Llama 3 base model
  • RyokoAI for the ShareGPT52K dataset
  • HuggingFace for the transformers and PEFT libraries

πŸ“§ Contact

For questions, issues, or feedback, please open an issue on HuggingFace.


Model Card by: owenergy
Date: December 2025
Status: Production Ready βœ…
