---
language:
- en
license: mit
tags:
- text-generation
- conversational
- phi-3
- lora
- peft
base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation
library_name: peft
widget:
- text: Hello, how are you?
  example_title: Greeting
- text: What's your approach to problem-solving?
  example_title: Personality Question
---

# Saad's AI Twin

This is a personalized AI twin fine-tuned with LoRA on Microsoft Phi-3 Mini (3.8B parameters).

## Model Details

- **Base Model:** microsoft/Phi-3-mini-4k-instruct
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Training:** Google Colab with a T4 GPU
- **Purpose:** Personality replication for conversational AI

## Usage

### With Transformers + PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, "SaadAx/saad-twin")

# Generate a response using the Phi-3 chat format
prompt = "<|user|>\nHow do you handle stress?<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Via Hugging Face Inference API

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/SaadAx/saad-twin"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "<|user|>\nHello!<|end|>\n<|assistant|>\n",
    "parameters": {"max_new_tokens": 200, "temperature": 0.8}
})
print(output)
```

## Training Details

- **Dataset:** Custom personality questionnaire responses
- **Training Time:** ~25 minutes
- **LoRA Rank:** 16
- **Target Modules:** q_proj, v_proj, k_proj, o_proj
- **Learning Rate:** 3e-4
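For reference, the LoRA setup above corresponds roughly to the following PEFT configuration. This is a sketch, not the exact training script; `lora_alpha` and `lora_dropout` are assumed values, not recorded from training:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                       # LoRA rank, as listed above
    lora_alpha=32,              # assumption: common choice of 2 * r
    lora_dropout=0.05,          # assumption
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```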
## Intended Use

This model is designed for:
- Demonstrating personality-based AI fine-tuning
- Educational purposes
- Research in personalized AI systems

## Limitations

- May not perfectly capture all personality nuances
- Requires the Phi-3 prompt format
- English only
- 4K context window

## License

MIT License - free for commercial and non-commercial use.

**Click "Commit changes to main".**

---

### **Step 3: Wait 5-10 Minutes**

After updating the README:
- Hugging Face needs to reprocess your model
- Refresh the page after 10 minutes
- Look for the "Hosted inference API" section to appear

---

### **Step 4: Test If It Works**

After 10 minutes, try this in your browser:

Go to: https://huggingface.co/SaadAx/saad-twin

Look for the **inference widget** on the right side. If you see it, type a test message!

---

## **OPTION 2: Use Inference Endpoints (If Free Doesn't Work)**

If free inference doesn't enable, you can deploy a dedicated endpoint.

### **Step 1: Go to Inference Endpoints**

1. Click your profile (top right) → **Settings**
2. In the left sidebar, click **Inference Endpoints** (or go to https://ui.endpoints.huggingface.co/)

### **Step 2: Create a New Endpoint**

1. Click **"+ New endpoint"**
2. Fill in:
   - **Model Repository:** `SaadAx/saad-twin`
   - **Endpoint name:** `saad-twin-api`
   - **Cloud Provider:** AWS (or Azure/GCP)
   - **Region:** us-east-1 (or closest to you)
   - **Instance Type:**
     - **CPU:** `cpu.small` (cheapest, slow)
     - **GPU:** `gpu.small` (recommended for Phi-3)

### **Pricing (if you go this route):**

```
CPU (slow): ~$0.03/hour ≈ $22/month
GPU (fast): ~$0.60/hour ≈ $440/month
```

*Monthly figures assume the endpoint runs 24/7 (~730 hours/month); actual rates vary by provider and region.*
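Once the endpoint shows **Running**, you can call it with the same payload format as the serverless API, just pointed at your endpoint's own URL. A minimal sketch (the URL below is a placeholder; copy the real one from your endpoint's overview page):

```python
import requests

# Placeholder URL: replace with the one shown on your endpoint's overview page
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

# Same inputs/parameters payload as the serverless Inference API example
payload = {
    "inputs": "<|user|>\nHello!<|end|>\n<|assistant|>\n",
    "parameters": {"max_new_tokens": 200, "temperature": 0.8},
}
response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
print(response.json())
```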