---
language:
- en
license: mit
tags:
- text-generation
- conversational
- phi-3
- lora
- peft
base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation
library_name: peft
widget:
- text: Hello, how are you?
  example_title: Greeting
- text: What's your approach to problem-solving?
  example_title: Personality Question
---

# Saad's AI Twin

This is a personalized AI twin fine-tuned with LoRA on Microsoft Phi-3 Mini (3.8B parameters).

## Model Details

- **Base Model:** microsoft/Phi-3-mini-4k-instruct
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Training:** Google Colab with a T4 GPU
- **Purpose:** Personality replication for conversational AI

## Usage

### With Transformers + PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "SaadAx/saad-twin")

# Generate a response using the Phi-3 chat format
prompt = "<|user|>\nHow do you handle stress?<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Via Hugging Face Inference API

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/SaadAx/saad-twin"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "<|user|>\nHello!<|end|>\n<|assistant|>\n",
    "parameters": {"max_new_tokens": 200, "temperature": 0.8}
})
print(output)
```

## Training Details

- **Dataset:** Custom personality questionnaire responses
- **Training Time:** ~25 minutes
- **LoRA Rank:** 16
- **Target Modules:** q_proj, v_proj, k_proj, o_proj
- **Learning Rate:** 3e-4
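For reference, the LoRA setup above corresponds roughly to the following PEFT configuration. This is a sketch, not the exact training script; `lora_alpha` and `lora_dropout` are assumed values, not recorded from training:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                       # LoRA rank, as listed above
    lora_alpha=32,              # assumption: common choice of 2 * r
    lora_dropout=0.05,          # assumption
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```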
## Intended Use

This model is designed for:
- Demonstrating personality-based AI fine-tuning
- Educational purposes
- Research in personalized AI systems

## Limitations

- May not perfectly capture all personality nuances
- Requires the Phi-3 prompt format
- English only
- 4K context window

## License

MIT License - free for commercial and non-commercial use.

**Click "Commit changes to main".**

---

### **Step 3: Wait 5-10 Minutes**

After updating the README:
- Hugging Face needs to reprocess your model
- Refresh the page after 10 minutes
- Look for the "Hosted inference API" section to appear

---

### **Step 4: Test If It Works**

After 10 minutes, try this in your browser:

Go to: https://huggingface.co/SaadAx/saad-twin

Look for the **inference widget** on the right side. If you see it, type a test message!

---

## **OPTION 2: Use Inference Endpoints (If Free Doesn't Work)**

If free inference doesn't enable, you can deploy a dedicated endpoint.

### **Step 1: Go to Inference Endpoints**

1. Click your profile (top right) → **Settings**
2. In the left sidebar, click **Inference Endpoints** (or go to https://ui.endpoints.huggingface.co/)

### **Step 2: Create a New Endpoint**

1. Click **"+ New endpoint"**
2. Fill in:
   - **Model Repository:** `SaadAx/saad-twin`
   - **Endpoint name:** `saad-twin-api`
   - **Cloud Provider:** AWS (or Azure/GCP)
   - **Region:** us-east-1 (or closest to you)
   - **Instance Type:**
     - **CPU:** `cpu.small` (cheapest, slow)
     - **GPU:** `gpu.small` (recommended for Phi-3)

### **Pricing (if you go this route):**

```
CPU (slow): ~$0.03/hour ≈ $22/month
GPU (fast): ~$0.60/hour ≈ $440/month
```

*Monthly figures assume the endpoint runs 24/7 (~730 hours/month); actual rates vary by provider and region.*
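Once the endpoint shows **Running**, you can call it with the same payload format as the serverless API, just pointed at your endpoint's own URL. A minimal sketch (the URL below is a placeholder; copy the real one from your endpoint's overview page):

```python
import requests

# Placeholder URL: replace with the one shown on your endpoint's overview page
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

# Same inputs/parameters payload as the serverless Inference API example
payload = {
    "inputs": "<|user|>\nHello!<|end|>\n<|assistant|>\n",
    "parameters": {"max_new_tokens": 200, "temperature": 0.8},
}
response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
print(response.json())
```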