CLaRa
This is the Hugging Face repository for the paper CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning.
CLaRa-7B-E2E is our fully end-to-end unified RAG model, jointly optimizing retrieval and generation with 16× and 128× document compression.
Training recipe: End-to-end fine-tuning with differentiable top-k retrieval and a unified language-modeling objective (see the sketch below).
Benchmarks: Strong retrieval-augmented QA performance under aggressive compression.
Paper: CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
GitHub: https://github.com/apple/ml-clara
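For intuition about the training recipe, here is a minimal sketch of one common way to make hard top-k document selection trainable: a straight-through estimator whose forward pass uses a hard 0/1 mask over documents while gradients flow through a softmax relaxation of the retrieval scores. This is an illustration of the general technique only, not CLaRa's exact mechanism (see the paper); the function name, tensor shapes, and `temperature` parameter are assumptions.

```python
import torch

def differentiable_top_k(scores: torch.Tensor, k: int, temperature: float = 1.0) -> torch.Tensor:
    """Straight-through top-k: hard 0/1 mask forward, soft gradients backward.

    scores: (batch, num_docs) retrieval scores.
    Returns a (batch, num_docs) selection mask for weighting document latents.
    """
    soft = torch.softmax(scores / temperature, dim=-1)     # soft relaxation
    topk = scores.topk(k, dim=-1).indices                  # hard selection
    hard = torch.zeros_like(soft).scatter_(-1, topk, 1.0)  # 0/1 mask
    # Forward pass evaluates to `hard`; backward pass uses gradients of `soft`.
    return hard + (soft - soft.detach())

# Toy usage: select 3 of 20 candidate documents for one question.
scores = torch.randn(1, 20, requires_grad=True)
mask = differentiable_top_k(scores, k=3)
loss = (mask * torch.randn(1, 20)).sum()  # stands in for the LM loss
loss.backward()                           # gradients reach the retriever scores
```

The quick start below then loads the released checkpoint and runs end-to-end retrieval + generation.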
```python
from transformers import AutoModel

# Load the end-to-end checkpoint (16x compression variant).
# Replace the path below with wherever you downloaded the model.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-E2E/compression-16",
    trust_remote_code=True
).to("cuda")

# Example documents and question: one candidate pool of 20 passages
# (here the same passage repeated) for a single question.
documents = [[
    "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
] * 20]
questions = [
    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?"
]

# End-to-end usage (retrieval + generation).
# The effective top-k is controlled by `generation_top_k` in config.json.
out = unirag.generate_from_questions(
    questions=questions,
    documents=documents,
    max_new_tokens=64
)
print("Generated answer:", out)
```
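The number of documents passed to the generator is read from `generation_top_k` in config.json. If the checkpoint follows the standard transformers config pattern, it can presumably also be changed on the loaded model; this is an assumption based on the comment above, not documented behavior:

```python
# Assumed: `generation_top_k` is exposed on the loaded config, mirroring
# the key in config.json. Edit config.json directly if this has no effect.
unirag.config.generation_top_k = 5
```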
Base model: mistralai/Mistral-7B-Instruct-v0.2