🧬 Rx-Interactome MedGemma (LoRA Fine-Tuned)

This model is a LoRA fine-tuned version of MedGemma-4B-it for protein–protein interaction reasoning and drug effect analysis on protein interaction networks (interactomes).

It is part of the Rx-Interactome project.


📌 Model Overview

  • Base Model: google/medgemma-4b-it
  • Fine-Tuning Method: Supervised Fine-Tuning (SFT)
  • Parameter-Efficient Training: LoRA adapters
  • Quantization: 4-bit (NF4)
  • Precision: bfloat16
  • Frameworks: Transformers, PEFT, TRL

This model is designed to reason over:

  • Protein–protein interaction sub-networks
  • Context-aware interaction patterns
  • Drug-induced network perturbations
  • Network-level therapeutic effects

🧠 Intended Use

This model is intended for:

  • Protein interaction reasoning
  • Systems biology research
  • Drug mechanism analysis
  • Network-based disease modeling
  • Hypothesis generation in biomedical research

Multiple proteins can be provided in the same prompt to enable interaction-level reasoning.


🗂 Training Data

Training data was constructed from:

  • Subcellular localization information
  • Pathway membership (Reactome)
  • Protein–protein interactions (STRING database)

Large interaction networks were decomposed into biologically meaningful 3–4 protein sub-networks to improve contextual learning.
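The exact decomposition procedure is not specified here, but as an illustration, connected 3-protein sub-networks can be enumerated from an edge list with only the standard library (a sketch under that assumption; the function name and toy data are hypothetical):

```python
from itertools import combinations

def connected_subnetworks(edges, size=3):
    """Enumerate all `size`-protein subsets whose induced subgraph is connected.
    `edges` is an iterable of (protein_a, protein_b) pairs."""
    edge_set = {frozenset(e) for e in edges}
    proteins = sorted({p for e in edges for p in e})

    def is_connected(nodes):
        # Breadth-first search over the induced subgraph.
        nodes = set(nodes)
        start = next(iter(nodes))
        seen, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            for other in nodes - seen:
                if frozenset((current, other)) in edge_set:
                    seen.add(other)
                    frontier.append(other)
        return seen == nodes

    return [s for s in combinations(proteins, size) if is_connected(s)]

# Toy interactome: a chain A-B-C-D plus an isolated pair E-F.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
subnets = connected_subnetworks(edges, size=3)
```

On this toy edge list, only the subsets along the A-B-C-D chain form connected 3-protein sub-networks; the isolated E-F pair contributes none.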

Each training sample followed a chat-style format:

  • User: Protein sequences and context
  • Assistant: Interaction reasoning or network explanation
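For illustration, one sample in this chat-style format might look like the following (the field names, context annotations, and wording are hypothetical, not the exact training schema):

```python
# Hypothetical example of a single chat-formatted training sample.
sample = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Protein: ENSP000001 (cytoplasm; pathway: Signal Transduction)\n"
                "Sequence: MSEQUENCE...\n"
                "Protein: ENSP000002 (nucleus; pathway: Signal Transduction)\n"
                "Sequence: ASEQUENCE...\n"
                "Do these proteins interact, and in what context?"
            ),
        },
        {
            "role": "assistant",
            "content": (
                "Reasoning over the shared pathway membership and "
                "localization context of the two proteins..."
            ),
        },
    ]
}
```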

⚙️ Training Configuration

  • Epochs: 3
  • Learning Rate: 5e-4
  • Batch Size: 1
  • Gradient Accumulation: 2
  • Evaluation Steps: 15
  • Optimizer: AdamW (fused)
  • Scheduler: Linear
  • Max Gradient Norm: 0.3
  • Warmup Ratio: 0.03
  • Gradient Checkpointing: Enabled
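In TRL, the hyperparameters above map onto an `SFTConfig` roughly as follows (a sketch, not the exact training script; the `output_dir` and `eval_strategy` values are assumptions):

```python
from trl import SFTConfig

# Sketch of the training arguments listed above.
training_args = SFTConfig(
    output_dir="rx-interactome-medgemma",  # assumed output directory
    num_train_epochs=3,
    learning_rate=5e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    eval_strategy="steps",                 # assumed, so eval_steps takes effect
    eval_steps=15,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    gradient_checkpointing=True,
    bf16=True,
)
```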

LoRA Configuration:

  • r: 16
  • alpha: 16
  • dropout: 0.05
  • Target modules: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj
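The adapter settings above correspond to a PEFT `LoraConfig` along these lines (a sketch; the `task_type` is an assumption consistent with causal text generation):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",  # assumed task type
)
```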

Quantization:

  • 4-bit NF4
  • Double quantization enabled
  • bfloat16 compute dtype
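These quantization settings would be expressed with a Transformers `BitsAndBytesConfig` roughly as follows (a sketch of the settings listed above):

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```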

🚀 How to Use

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel

# Load the base model and attach the fine-tuned LoRA adapter.
base_model = AutoModelForImageTextToText.from_pretrained(
    "google/medgemma-4b-it",
    device_map="auto",
    torch_dtype="bfloat16",
)

model = PeftModel.from_pretrained(
    base_model,
    "your-username/rx-interactome-medgemma",
)

processor = AutoProcessor.from_pretrained("google/medgemma-4b-it")

prompt = """
Analyze the following protein sequences:
Protein: ENSP000001
Sequence: MSEQUENCE...

Protein: ENSP000002
Sequence: ASEQUENCE...

Provide a structured interpretation.
"""

inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(processor.decode(outputs[0], skip_special_tokens=True))
```