---
license: gemma
base_model: google/medgemma-4b-it
tags:
- gguf
- llama.cpp
- quantized
- q5_k_m
- medical
- chat
library_name: llama.cpp
inference: false
datasets:
- ruslanmv/ai-medical-chatbot
language:
- en
pipeline_tag: image-text-to-text
---
# medgemma-4b-it — medical fine-tune (5-bit GGUF)
## Model Details
A Q5_K_M GGUF quantization of [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it), fine-tuned for medical chat on the [ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot) dataset and converted for CPU inference with llama.cpp.
## Files
- `medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf` (~2.83 GB)
## How to run (llama.cpp)
```bash
# Requires a recent llama.cpp build with Hugging Face Hub support.
# Downloads the GGUF from the Hub and runs it directly:
llama-cli --hf-repo sharadsnaik/medgemma-4b-it-medical-gguf \
  --hf-file medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf \
  -p "Hello"
```
## How to Get Started with the Model
```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF file from the Hub cache, then load it with llama-cpp-python.
p = hf_hub_download(
    "sharadsnaik/medgemma-4b-it-medical-gguf",
    "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf",
)
llm = Llama(model_path=p, n_ctx=4096, n_threads=8, chat_format="gemma")
print(llm.create_chat_completion(messages=[{"role": "user", "content": "Hello"}]))
```
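For interactive use you can also stream tokens as they are generated. A minimal sketch reusing the same `llm` object as above and llama-cpp-python's OpenAI-style streaming chunks (the prompt is only an example):

```python
# Stream the reply token by token; chunks follow the OpenAI-style "delta" format.
stream = llm.create_chat_completion(
    messages=[{"role": "user", "content": "List three common causes of headache."}],
    temperature=0.6,
    stream=True,
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```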
## Training Details
### Training Data
Fine-tuned on the [ruslanmv/ai-medical-chatbot](https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot) dataset.
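If you want to inspect the fine-tuning data locally, it can be loaded with the `datasets` library. A minimal sketch, assuming the dataset's default `train` split:

```python
from datasets import load_dataset

# Load the medical chat dataset used for fine-tuning and look at one record.
ds = load_dataset("ruslanmv/ai-medical-chatbot", split="train")
print(ds)
print(ds[0])
```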
## Sample Code Usage
#### `app.py`
```python
import os

import gradio as gr
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Your model repo + filename
REPO_ID = "sharadsnaik/medgemma-4b-it-medical-gguf"
FILENAME = "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf"

# Download from the Hub to the local cache
MODEL_PATH = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="model")

# Create the llama.cpp model.
# Use all available CPU threads; chat_format="gemma" matches Gemma-style prompts.
llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=4096,
    n_threads=os.cpu_count(),
    chat_format="gemma",  # important for Gemma/MedGemma instruction formatting
)

def chat_fn(message, history):
    # Convert Gradio (user, bot) history pairs -> OpenAI-style messages
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        if bot_msg:
            messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    out = llm.create_chat_completion(messages=messages, temperature=0.6, top_p=0.95)
    reply = out["choices"][0]["message"]["content"]
    return reply

demo = gr.ChatInterface(fn=chat_fn, title="MedGemma 4B (Q5_K_M) — CPU Space")

if __name__ == "__main__":
    demo.launch()
```
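To run the app locally rather than on a Space, install `gradio`, `llama-cpp-python`, and `huggingface_hub` (the three packages imported above) and start it with `python app.py`; on a CPU-only Hugging Face Space the same three packages go into `requirements.txt`.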