# 🧬 Rx-Interactome MedGemma (LoRA Fine-Tuned)
This model is a LoRA fine-tuned version of MedGemma-4B-it for protein–protein interaction reasoning and drug-effect analysis on protein interaction networks (interactomes).
It is part of the Rx-Interactome project.
## 📋 Model Overview
- Base Model: google/medgemma-4b-it
- Fine-Tuning Method: Supervised Fine-Tuning (SFT)
- Parameter-Efficient Training: LoRA adapters
- Quantization: 4-bit (NF4)
- Precision: bfloat16
- Frameworks: Transformers, PEFT, TRL
This model is designed to reason over:
- Protein–protein interaction sub-networks
- Context-aware interaction patterns
- Drug-induced network perturbations
- Network-level therapeutic effects
## 🧠 Intended Use
This model is intended for:
- Protein interaction reasoning
- Systems biology research
- Drug mechanism analysis
- Network-based disease modeling
- Hypothesis generation in biomedical research
Multiple proteins can be provided in the same prompt to enable interaction-level reasoning.
## 📊 Training Data
Training data was constructed from:
- Subcellular localization information
- Pathway membership (Reactome)
- Protein–protein interactions (STRING database)
Large interaction networks were decomposed into biologically meaningful 3–4 protein sub-networks to improve contextual learning.
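The decomposition step can be sketched as a greedy breadth-first split of the interaction graph into chunks of at most four proteins. This is an illustrative, stdlib-only sketch, not the project's actual pipeline; `split_into_subnetworks` is a hypothetical helper.

```python
from collections import deque

def split_into_subnetworks(adjacency, max_size=4):
    """Greedily split an interaction graph into small sub-networks.

    `adjacency` maps each protein ID to the set of its interaction
    partners. Hypothetical helper for illustration only.
    """
    remaining = set(adjacency)
    chunks = []
    while remaining:
        seed = min(remaining)  # deterministic starting protein
        chunk, queue = [], deque([seed])
        while queue and len(chunk) < max_size:
            protein = queue.popleft()
            if protein in remaining:
                remaining.discard(protein)
                chunk.append(protein)
                # Walk outward to neighbors not yet assigned to a chunk.
                queue.extend(p for p in sorted(adjacency[protein]) if p in remaining)
        chunks.append(chunk)
    return chunks
```

A real pipeline would likely also use pathway membership and localization to keep each chunk biologically coherent, rather than connectivity alone.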
Each training sample followed a chat-style format:
- User: Protein sequences and context
- Assistant: Interaction reasoning or network explanation
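A single training sample in this chat format might look like the following sketch; the protein IDs, sequences, and assistant text are placeholders, not real training data.

```python
# Illustrative chat-format training sample (all content is placeholder).
sample = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Protein: ENSP000001\nSequence: MSEQUENCE...\n"
                "Protein: ENSP000002\nSequence: ASEQUENCE...\n"
                "Localization: cytoplasm. Pathway: signal transduction.\n"
                "Explain how these proteins interact."
            ),
        },
        {
            "role": "assistant",
            "content": "These proteins are predicted to interact via ...",
        },
    ]
}
```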
## ⚙️ Training Configuration
- Epochs: 3
- Learning Rate: 5e-4
- Batch Size: 1
- Gradient Accumulation: 2
- Evaluation Steps: 15
- Optimizer: AdamW (fused)
- Scheduler: Linear
- Max Gradient Norm: 0.3
- Warmup Ratio: 0.03
- Gradient Checkpointing: Enabled
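Translated into TRL's `SFTConfig` (which inherits the standard `TrainingArguments` fields), the settings above would look roughly like the sketch below; `output_dir` and the exact fused-optimizer name (`adamw_torch_fused`) are assumptions.

```python
from trl import SFTConfig

# Sketch of the listed hyperparameters; not the project's exact config.
training_args = SFTConfig(
    output_dir="rx-interactome-medgemma",
    num_train_epochs=3,
    learning_rate=5e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    eval_strategy="steps",
    eval_steps=15,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    gradient_checkpointing=True,
    bf16=True,
)
```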
LoRA Configuration:
- r: 16
- alpha: 16
- dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj
Quantization:
- 4-bit NF4
- Double quantization enabled
- bfloat16 compute dtype
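The LoRA and quantization settings above map onto `peft.LoraConfig` and `transformers.BitsAndBytesConfig` roughly as follows; this is a sketch, and `task_type="CAUSAL_LM"` is an assumption.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization and bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention and MLP projections listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```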
## 🚀 How to Use
```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel

# Load the quantizable base model, then attach the LoRA adapters.
base_model = AutoModelForImageTextToText.from_pretrained(
    "google/medgemma-4b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(
    base_model,
    "your-username/rx-interactome-medgemma",
)
processor = AutoProcessor.from_pretrained("google/medgemma-4b-it")

# Multiple proteins can be supplied in a single prompt.
prompt = """
Analyze the following protein sequences:

Protein: ENSP000001
Sequence: MSEQUENCE...

Protein: ENSP000002
Sequence: ASEQUENCE...

Provide structured interpretation.
"""

inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(processor.decode(outputs[0], skip_special_tokens=True))
```