sharadsnaik committed
Commit 1504024 · verified · 1 Parent(s): 4f07730

code snippet in README

Files changed (1)
  1. README.md +73 -1
README.md CHANGED
@@ -16,9 +16,81 @@ language:
  - en
  pipeline_tag: image-text-to-text
  ---
+
+ # medgemma-4b-it — medical fine-tune (5-bit GGUF)
+
+ ## Model Details
+
+ ## Files
+ - `medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf` (~2.83 GB)
+
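+ The snippet below is a minimal sketch for pulling just this file from the Hub with the `huggingface_hub` CLI; the `--local-dir` value is only an example:
+ ```bash
+ # Download the single GGUF file into ./models (any directory works)
+ pip install -U huggingface_hub
+ huggingface-cli download sharadsnaik/medgemma-4b-it-medical-gguf \
+   medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf \
+   --local-dir ./models
+ ```
+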
+ ## How to run (llama.cpp)
+ ```bash
+ # Requires llama.cpp. You can run directly from the Hub path:
+ llama-cli -m hf://sharadsnaik/medgemma-4b-it-medical-gguf/medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf -p "Hello"
+ ```
+
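+ The `llama-cli` call above assumes a llama.cpp binary is already on your PATH; a minimal sketch for getting one, via the Homebrew package or a plain CMake source build, is:
+ ```bash
+ # Option A: package manager (macOS / Linux with Homebrew)
+ brew install llama.cpp
+ # Option B: build from source
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp && cmake -B build && cmake --build build --config Release
+ ```
+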
+ ## How to Get Started with the Model
+ ```python
  from huggingface_hub import hf_hub_download
  from llama_cpp import Llama
- p = hf_hub_download("USERNAME/medgemma-4b-it-medical-gguf",
+ p = hf_hub_download("sharadsnaik/medgemma-4b-it-medical-gguf",
                      "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf")
  llm = Llama(model_path=p, n_ctx=4096, n_threads=8, chat_format="gemma")
  print(llm.create_chat_completion(messages=[{"role":"user","content":"Hello"}]))
+ ```
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+ ```
+ ruslanmv/ai-medical-chatbot
+ ```
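+
+ As a rough way to inspect that dataset, it can be loaded with the Hugging Face `datasets` library; the `train` split name below is an assumption, not something documented here:
+ ```python
+ # Sketch: peek at the fine-tuning data (split name assumed to be "train")
+ from datasets import load_dataset
+
+ ds = load_dataset("ruslanmv/ai-medical-chatbot", split="train")
+ print(ds)      # column names and row count
+ print(ds[0])   # first record
+ ```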
+
+ ## Sample Code Usage
+
+ #### `app.py`
+ ```python
+ import os, gradio as gr
+ from huggingface_hub import hf_hub_download
+ from llama_cpp import Llama
+
+ # Your model repo + filename
+ REPO_ID = "sharadsnaik/medgemma-4b-it-medical-gguf"
+ FILENAME = "medgemma-4b-it-finnetunned-merged_new_for_cpu_q5_k_m.gguf"
+
+ # Download from Hub to local cache
+ MODEL_PATH = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="model")
+
+ # Create the llama.cpp model
+ # Use all available CPU threads; chat_format="gemma" matches Gemma-style prompts
+ llm = Llama(
+     model_path=MODEL_PATH,
+     n_ctx=4096,
+     n_threads=os.cpu_count(),
+     chat_format="gemma"  # important for Gemma/Med-Gemma instruction formatting
+ )
+
+ def chat_fn(message, history):
+     # Convert Gradio history -> OpenAI-style messages
+     messages = []
+     for user_msg, bot_msg in history:
+         messages.append({"role":"user","content":user_msg})
+         if bot_msg:
+             messages.append({"role":"assistant","content":bot_msg})
+     messages.append({"role":"user","content":message})
+
+     out = llm.create_chat_completion(messages=messages, temperature=0.6, top_p=0.95)
+     reply = out["choices"][0]["message"]["content"]
+     return reply
+
+ demo = gr.ChatInterface(fn=chat_fn, title="MedGemma 4B (Q5_K_M) — CPU Space")
+
+ if __name__ == "__main__":
+     demo.launch()
+ ```
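+
+ `app.py` assumes its dependencies are already installed; a minimal sketch for running it locally (a Space would list the same packages in `requirements.txt`) is:
+ ```bash
+ # Packages imported by app.py, then a local launch
+ pip install gradio llama-cpp-python huggingface_hub
+ python app.py
+ ```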