LLaMA2-13B-RankLLaMA-Teacher

Model Description

LLaMA2-13B-RankLLaMA-Teacher is a 13B parameter teacher model designed for neural reranking tasks. This model serves as the foundation for knowledge distillation in the DeAR framework, generating Chain-of-Thought (CoT) reasoning to guide smaller student models.

Model Details

  • Model Type: Sequence Classification (Reranking)
  • Base Model: LLaMA-2-13B
  • Parameters: 13 billion
  • Training Data: MS MARCO Passage Ranking
  • Purpose: Teacher model for knowledge distillation
  • Output: Relevance scores for query-document pairs

Intended Use

This model is intended to:

  • Generate training signals for student reranker models
  • Provide Chain-of-Thought reasoning for reranking tasks
  • Serve as a baseline for evaluating distilled models

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model
model_name = "abdoelsayed/llama2-13b-rankllama-teacher"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# LLaMA tokenizers may ship without a pad token; reuse EOS so padded inputs work
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, 
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.config.pad_token_id = tokenizer.pad_token_id  # needed for batched scoring
model.eval()

# Score a query-document pair
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence that focuses on training algorithms to learn patterns from data."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=512,
    padding=True
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
    
print(f"Relevance score: {score}")

Training Details

Training Data

  • Dataset: MS MARCO Passage Ranking
  • Training Samples: Large-scale query-document pairs with relevance labels

Training Configuration

  • Base Model: meta-llama/Llama-2-13b-hf
  • Objective: Pointwise ranking with sequence classification (see the loss sketch after this list)
  • Hardware: Multi-GPU training (4x A100)
  • Precision: Mixed precision (bfloat16)
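
The card specifies only "pointwise ranking with sequence classification". One common instantiation, shown below as an assumption rather than the authors' exact loss, is binary cross-entropy on the scalar relevance logit of each query-document pair.

import torch
import torch.nn.functional as F

def pointwise_ranking_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy on the scalar relevance logit (assumed formulation).

    logits: (batch, 1) raw scores from the classification head
    labels: (batch,)  1 for relevant pairs, 0 for non-relevant pairs
    """
    return F.binary_cross_entropy_with_logits(logits.squeeze(-1), labels.float())

# Toy check with random scores and labels
print(pointwise_ranking_loss(torch.randn(8, 1), torch.randint(0, 2, (8,))).item())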

Hyperparameters

  • Learning Rate: 2e-5
  • Batch Size: 8 per device
  • Epochs: 3
  • Max Sequence Length: 512
  • Warmup Steps: 1000
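
The hyperparameters above map naturally onto a Hugging Face TrainingArguments configuration; the sketch below shows that mapping. The output directory, logging and saving choices, and anything not listed above are assumptions.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./rankllama-13b-teacher",  # assumed path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    warmup_steps=1000,
    bf16=True,                             # mixed precision (bfloat16)
    logging_steps=100,                     # assumed
    save_strategy="epoch",                 # assumed
)
# Note: the 512-token max sequence length is applied at tokenization time, not here.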

Evaluation

This teacher model is evaluated on standard IR benchmarks:

Dataset        NDCG@10
MS MARCO Dev   72.5
TREC DL19      73.8
TREC DL20      71.2
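
NDCG@10 compares the ranking induced by the reranker's scores against the ideal ordering by graded relevance. The reference implementation below is a minimal sketch, not the official TREC evaluation tooling, and uses the exponential-gain variant of DCG.

import math

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k for one query; `ranked_relevances` lists the graded relevance
    of each document in the order the reranker returned them."""
    def dcg(rels):
        return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# Graded relevance (0-3) of the top-ranked documents for one example query
print(ndcg_at_k([3, 2, 0, 1, 0, 0, 2, 0, 0, 0]))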

Model Architecture

LLaMA2-13B
    ↓
[Transformer Layers]
    ↓
[Classification Head]
    ↓
Relevance Score
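
With AutoModelForSequenceClassification, the LLaMA backbone feeds the final hidden state of the last non-padding token into a linear classification head that produces a single relevance logit. The lines below, reusing `model` from the Usage section, show how to inspect that head; in the Transformers LLaMA implementation it is exposed as `model.score`, and the expected shapes noted in the comments are assumptions about this checkpoint.

# Inspect the classification head (reuses `model` from the Usage section)
print(model.config.num_labels)  # expected: 1, i.e. a single relevance logit
print(model.score)              # expected: Linear(in_features=5120, out_features=1, bias=False)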

Distillation

This teacher model generates soft labels and CoT reasoning used to train:

  • DeAR-8B models (RankNet, CE, Listwise)
  • DeAR-3B models (RankNet, CE)

The distillation process uses:

  • Temperature: 2.0
  • Alpha (KD weight): 0.1
  • CoT Dataset: DeAR-COT
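
The temperature and alpha above describe a standard soft-label distillation setup: teacher and student scores over a candidate list are softened with the temperature, compared with a KL term, and blended with the student's own ranking loss weighted by alpha. The sketch below is an assumed formulation for illustration, not necessarily the exact DeAR loss.

import torch
import torch.nn.functional as F

def distillation_loss(student_scores, teacher_scores, hard_loss,
                      temperature=2.0, alpha=0.1):
    """Blend a temperature-scaled KD term with the student's ranking loss.

    student_scores, teacher_scores: (num_candidates,) scores for one query's list
    hard_loss: the student's supervised loss (e.g. RankNet / CE / listwise)
    The exact blending used in DeAR may differ; this weighting is an assumption.
    """
    soft_student = F.log_softmax(student_scores / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_scores / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="sum") * temperature ** 2
    return alpha * kd + (1.0 - alpha) * hard_loss

# Toy example for a 10-candidate list
print(distillation_loss(torch.randn(10), torch.randn(10), hard_loss=torch.tensor(0.7)).item())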

Limitations

  • Computational Cost: 13B parameters require significant GPU memory (>26GB); see the quantized-loading sketch after this list
  • Inference Speed: Slower than distilled student models
  • Domain Specificity: Trained primarily on MS MARCO, may require fine-tuning for other domains
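
The memory requirement noted above can be reduced at inference time with quantized loading. The snippet below is a sketch using 4-bit quantization through Transformers and bitsandbytes (an extra dependency and an assumption, not something this card prescribes); expect some loss in score fidelity.

import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

# 4-bit quantized loading (requires the bitsandbytes package and a CUDA GPU)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = AutoModelForSequenceClassification.from_pretrained(
    "abdoelsayed/llama2-13b-rankllama-teacher",
    quantization_config=bnb_config,
    device_map="auto",
)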

Citation

If you use this model, please cite:

@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}

License

MIT License

Related Models

Student Models (distilled from this teacher):

  • 8B Models: DeAR-8B variants (RankNet, CE, Listwise)
  • 3B Models: DeAR-3B variants (RankNet, CE)

Dataset:

  • DeAR-COT (Chain-of-Thought distillation data)
