# MedRAGChecker Claim Extractor · LoRA Adapter
Biomedical claim-triple extractor fine-tuned from a medical LLM using GPT-4.1 teacher labels.
This adapter is part of the MedRAGChecker pipeline for claim-level verification in biomedical RAG.
Task: given a medical question and its answer, extract factual triples of the form
`[subject, relation, object]` as a pure JSON array.
## Model summary
- Base model: `<BASE_MODEL_ID>` (for example: `med42-llama3-8b`, `Meditron3-8B`, `PMC_LLaMA_13B`, or `qwen2-med-7b`)
- Adapter type: LoRA (rank = 16, alpha = 32, dropout = 0.0) via PEFT
- Architecture: same as base causal LM (LLaMA-style or Qwen-style)
- Task: biomedical claim triple extraction
- Input: question text + model answer (plain text)
- Output: JSON array of triples, e.g.

  ```json
  [
    ["Psoriasis", "is", "chronic inflammatory skin disease"],
    ["Psoriasis", "is associated with", "systemic comorbidities"]
  ]
  ```
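For reference, the LoRA hyperparameters listed above correspond roughly to a PEFT configuration like the sketch below. The `target_modules` entry is our assumption (typical attention projections for LLaMA/Qwen-style models), not necessarily the exact training configuration.

```python
from peft import LoraConfig

# Sketch of the adapter configuration implied by the summary above.
# target_modules is an assumption and may differ from the actual training setup.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```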
You can either:
- keep one Hugging Face repo per adapter (recommended), or
- store several adapters in one repo and refer to specific subfolders.
Replace `<BASE_MODEL_ID>` and any placeholder names below with your actual base model and repo id (for example: `JoyDaJun/MedRAGChecker-Extractor-Meditron3-8B`).
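If you go the multi-adapter route, PEFT can load an adapter from a subfolder of a repo. A minimal sketch (the repo id and subfolder name here are made-up examples):

```python
from peft import PeftModel

# Hypothetical layout: one repo with one subfolder per base model.
model = PeftModel.from_pretrained(
    model,                                # already-loaded base model
    "JoyDaJun/MedRAGChecker-Extractors",  # example multi-adapter repo id
    subfolder="meditron3-8b",             # example subfolder name
)
```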
## Intended use
- Post-hoc analysis of biomedical QA systems at claim level.
- Use inside a RAG or QA evaluation pipeline to:
  - extract atomic factual statements from a generated answer;
  - feed those triples to a checker model (e.g. MedRAGChecker NLI+KG).
This adapter is not a general-purpose chat model and must not be used as a standalone medical assistant.
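At a high level, the extractor slots into a claim-level checking loop like the following hypothetical sketch. `extract_triples` and `checker.verify` are placeholders for your own extraction call and checker interface; neither is defined in this card.

```python
# Hypothetical glue code: names below are placeholders, not MedRAGChecker APIs.
def check_answer(question: str, answer: str, checker) -> list[dict]:
    triples = extract_triples(question, answer)   # LoRA extractor (see "How to use")
    report = []
    for subj, rel, obj in triples:
        verdict = checker.verify(subj, rel, obj)  # e.g. NLI + KG verification
        report.append({"triple": [subj, rel, obj], "verdict": verdict})
    return report
```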
## How to use
### 1. LLaMA-style base models (Meditron, Med42, PMC-LLaMA, etc.)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch, json

base_model_id = "<BASE_MODEL_ID>"  # e.g. "med42-llama3-8b"
adapter_id = "<ADAPTER_REPO_ID>"   # e.g. "JoyDaJun/MedRAGChecker-Extractor-Med42-8B"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

def build_prompt(question: str, answer: str) -> str:
    system_part = (
        "You are an information extraction assistant. "
        "Given a medical question and its answer, extract all factual triples "
        "as [subject, relation, object]. "
        "Return a pure JSON array of triples, with no explanations, no extra text, "
        "no comments. If there are no clear factual triples, return an empty JSON array []."
    )
    qa_part = f"Question: {question}\nAnswer: {answer}"
    return (
        system_part
        + "\n\n"
        + qa_part
        + '\n\nTriples (JSON only, e.g. [["subj", "rel", "obj"], ...]):\n'
    )

question = "Does hypercholesterolemia increase leukotriene B4 in neutrophils?"
answer = "Hypercholesterolemia increases 5-LO activity in neutrophils..."

prompt = build_prompt(question, answer)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    gen_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

# Decode only the newly generated tokens so the prompt's own example
# brackets are not mistaken for the output JSON.
new_tokens = gen_ids[0][inputs["input_ids"].shape[1]:]
text = tokenizer.decode(new_tokens, skip_special_tokens=True)

# Optional: keep only the JSON array
start = text.find("[")
end = text.rfind("]")
json_str = text[start:end + 1] if start != -1 and end != -1 else "[]"
triples = json.loads(json_str)
print(triples)
```
### 2. Chat-style base models (Qwen2-med, etc.)
For chat-style models, wrap the same prompt inside the chat template.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch, json

base_model_id = "<QWEN_BASE_MODEL_ID>"  # e.g. "qwen2-med-7b"
adapter_id = "<ADAPTER_REPO_ID_QWEN>"   # e.g. "JoyDaJun/MedRAGChecker-Extractor-Qwen2-med-7B"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

def build_prompt(question: str, answer: str) -> str:
    system_part = (
        "Given a medical question and its answer, extract all factual triples "
        "as [subject, relation, object]. "
        "Return only a JSON array of triples."
    )
    qa_part = f"Question: {question}\nAnswer: {answer}"
    return system_part + "\n\n" + qa_part + '\n\nTriples (JSON only, e.g. [["subj", "rel", "obj"], ...]):\n'

question = "Does hypercholesterolemia increase leukotriene B4 in neutrophils?"
answer = "Hypercholesterolemia increases 5-LO activity in neutrophils..."

messages = [
    {"role": "system", "content": "You are an information extraction assistant."},
    {"role": "user", "content": build_prompt(question, answer)},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    gen_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

# Decode only the newly generated tokens, then keep only the JSON array.
new_tokens = gen_ids[0][inputs["input_ids"].shape[1]:]
text = tokenizer.decode(new_tokens, skip_special_tokens=True)

start = text.find("[")
end = text.rfind("]")
json_str = text[start:end + 1] if start != -1 and end != -1 else "[]"
triples = json.loads(json_str)
print(triples)
```
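If you call the extractor in bulk, it helps to make the JSON post-processing defensive. A minimal sketch of a parsing helper (the name and behavior are ours, not part of the MedRAGChecker code); in both setups above you can replace the manual slicing with `triples = parse_triples(text)`:

```python
import json

def parse_triples(text: str) -> list:
    """Extract a list of [subject, relation, object] triples from raw model output."""
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end == -1:
        return []
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return []
    # Keep only well-formed 3-element triples.
    return [t for t in parsed if isinstance(t, list) and len(t) == 3]
```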
## Training details
This adapter was trained with the `DistillExtractor/train_extractor_sft.py` script in the MedRAGChecker codebase.
- Teacher model: GPT-4.1 as claim-triple annotator.
- Training data:
  - JSONL file `extractor_sft.jsonl` with fields (see the example record after this list):
    - `instruction`: system prompt + `Question:` + `Answer:` (from biomedical QA datasets and RAG outputs).
    - `output`: pure JSON array of `[subject, relation, object]` triples labeled by GPT-4.1.
  - Sources include consumer and research-style biomedical QA (e.g., MedQuAD, PubMedQA, LiveQA Medical, CSIRO MedRedQA, and AskDocs-style Reddit threads).
- Preprocessing:
  - Parse `Question:` and `Answer:` from the `instruction` field using regex.
  - Rebuild a canonical prompt with an explicit `Triples (JSON only, e.g. [["subj", "rel", "obj"], ...]):` header.
- Fine-tuning setup (example):
  - Epochs: 10
  - Batch size: 1 with gradient accumulation 32 (effective batch size 32).
  - Max input length: 2048.
  - Optimizer: AdamW, learning rate 1e-4.
  - LoRA config: `r = 16`, `alpha = 32`, `dropout = 0.0`.
  - Precision: bfloat16 on GPUs with `device_map="auto"`.
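For concreteness, a record in `extractor_sft.jsonl` looks roughly like the following. This is an illustrative example we wrote for this card, not an actual row from the training file, and the `output` field is shown as a JSON-encoded string:

```json
{"instruction": "You are an information extraction assistant. ... Question: Does hypercholesterolemia increase leukotriene B4 in neutrophils? Answer: Hypercholesterolemia increases 5-LO activity in neutrophils...", "output": "[[\"hypercholesterolemia\", \"increases\", \"5-LO activity in neutrophils\"]]"}
```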
Example training command:
```bash
export WANDB_PROJECT=MedRAGChecker
export WANDB_NAME=extractor_<BASE_NAME>

BASE=/path/to/<BASE_MODEL_ID>

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python DistillExtractor/train_extractor_sft.py \
  --model_name "$BASE" \
  --train_path ./data/extractor_sft.jsonl \
  --output_dir ./runs/extractor_sft_<BASE_NAME> \
  --epochs 10 \
  --batch_size 1 \
  --grad_accum 32 \
  --lr 1e-4 \
  --bf16
```
Replace `<BASE_MODEL_ID>` and `<BASE_NAME>` with your actual base model and a short name for it.
## Evaluation
We evaluate on a held-out split of the same GPT-4.1-annotated dataset using two families of metrics:
### Strict triple match
- Normalize to lowercase and strip whitespace.
- Treat each triple as a set element `(subject, relation, object)`.
- Compute precision/recall/F1 on exact triple matches.
- Also report exact match rate (all triples in an example match exactly).
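A minimal sketch of the strict metrics (our own re-implementation of the description above, not the project's evaluation code):

```python
def normalize(triple):
    # Lowercase and strip whitespace from each field.
    return tuple(x.lower().strip() for x in triple)

def strict_prf(pred, gold):
    """Exact-match precision/recall/F1 over normalized (subject, relation, object) sets."""
    pred_set = {normalize(t) for t in pred}
    gold_set = {normalize(t) for t in gold}
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```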
### Soft triple match
- Tokenize subject, relation, and object.
- Compute token-level F1 for each field between predicted and gold triples.
- Aggregate into a per-triple similarity score.
- Run greedy matching between predicted and gold triples by similarity.
- Compute soft precision/recall/F1 from matched pairs.
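And a sketch of the soft-matching idea (again our own illustration; the exact scoring in `run_extractor_eval_soft.py` may differ):

```python
def token_f1(a: str, b: str) -> float:
    """Token-overlap F1 between two strings."""
    ta, tb = a.lower().split(), b.lower().split()
    common = sum(min(ta.count(w), tb.count(w)) for w in set(ta))
    if not common:
        return 0.0
    p, r = common / len(ta), common / len(tb)
    return 2 * p * r / (p + r)

def triple_similarity(pred, gold) -> float:
    """Average field-level token F1 over subject, relation, and object."""
    return sum(token_f1(p, g) for p, g in zip(pred, gold)) / 3

def soft_prf(pred, gold):
    """Greedy one-to-one matching of predicted and gold triples by similarity."""
    pairs = sorted(
        ((triple_similarity(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gold)),
        reverse=True,
    )
    used_p, used_g, score = set(), set(), 0.0
    for sim, i, j in pairs:
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
            score += sim
    precision = score / len(pred) if pred else 0.0
    recall = score / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```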
Example metrics on a random subsample of N = 200 examples for a Meditron3-8B-based extractor:
| Metric | Value |
|---|---|
| strict_precision | 0.0890 |
| strict_recall | 0.0930 |
| strict_f1 | 0.0900 |
| exact_match | 0.0500 |
| soft_precision | 0.2052 |
| soft_recall | 0.2598 |
| soft_f1 | 0.2148 |
These numbers illustrate that:
- the model is far from perfect at exact triple reconstruction;
- soft matching shows it still captures many approximate facts, which is often sufficient for downstream diagnostics in MedRAGChecker.
You can reproduce these metrics (and compute new ones for other checkpoints) with the evaluation script:
```bash
python DistillExtractor/run_extractor_eval_soft.py \
  --base_model <BASE_MODEL_ID> \
  --adapter_path <ADAPTER_REPO_OR_LOCAL_PATH> \
  --data_path ./data/extractor_sft.jsonl \
  --output_path ./results/extractor_soft_<BASE_NAME>.json \
  --num_examples 200
```
## Limitations and risks
- The adapter inherits all limitations and biases of the base model and GPT-4.1 teacher.
- Extracted triples may still be incomplete, redundant, or slightly rephrased.
- The model is optimized for English biomedical text; performance on other domains or languages is likely poor.
- Do not use this model (or its extracted triples) directly for patient-facing decisions or clinical care without expert validation.
## Citation
If you use this adapter or MedRAGChecker in your work, please consider citing our paper (details to be updated):
```bibtex
@inproceedings{ji2025medragchecker,
  title     = {MedRAGChecker: Claim-level Verification for Biomedical Retrieval-Augmented Generation},
  author    = {Ji, Yuelyu and collaborators},
  booktitle = {Proceedings of a future venue},
  year      = {2025}
}
```
## License
- This adapter is released under the same license terms as the corresponding base model `<BASE_MODEL_ID>`.
- You must accept and comply with the license of the base model before using this LoRA.