---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
datasets:
- VanishD/DualDistill
language:
- en
license: mit
pipeline_tag: text-generation
library_name: transformers
---

# Agentic-R1: Distilled Dual-Strategy Reasoning

This repository hosts the **Agentic-R1** model, an implementation of the paper [**Agentic-R1: Distilled Dual-Strategy Reasoning**](https://huggingface.co/papers/2507.05707).

**Code**: https://github.com/StigLidu/DualDistill

## Abstract

Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning.

## Key Features

-   **Efficient Training**: Integrates tool use into long chain-of-thought (long-CoT) reasoning using only 4 × A6000 GPUs
-   **Unified Reasoning**: Fuses heterogeneous reasoning traces from multiple teacher models into a single student model

<div align="center">
  <img src="https://github.com/StigLidu/DualDistill/raw/main/fig/overview.png" alt="Overview of DualDistill" width="500">
  <p><em>Overview of DualDistill methodology</em></p>
</div>

## Datasets

| Dataset       | Description                                   | Link                                                 |
| :------------ | :-------------------------------------------- | :--------------------------------------------------- |
| **Training Set** | Complete training dataset with teacher trajectories | [🤗 HuggingFace](https://huggingface.co/datasets/VanishD/DualDistill) |
| **Test Set**  | Evaluation benchmarks                         | `dataset/test/`                                      |
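
To take a quick look at the training trajectories, the training set can be loaded directly with the 🤗 `datasets` library. A minimal sketch (the `"train"` split name is an assumption; check the dataset card for the actual splits and fields):

```python
from datasets import load_dataset

# Load the DualDistill training set from the Hugging Face Hub.
# The split name "train" is assumed; verify it on the dataset card.
ds = load_dataset("VanishD/DualDistill", split="train")

print(ds)     # number of examples and column names
print(ds[0])  # inspect a single teacher trajectory
```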

## Results

<div align="center">
  <img src="https://github.com/StigLidu/DualDistill/raw/main/fig/result.png" alt="Performance comparison of Agentic-R1 models" width="700">
</div>

-   **Agentic-R1** demonstrates significant performance gains on **DeepMath-L** and **Combinatorics300**, where both complex reasoning and tool use are crucial for success.
-   **Agentic-R1-SD** (Self-Distilled) further enhances performance through our self-distillation approach, consistently outperforming baseline models across nearly all evaluation tasks.

## Quick Start

### Installation

1.  **Clone the repository**:
    ```bash
    git clone https://github.com/StigLidu/DualDistill.git
    cd DualDistill
    ```

2.  **Create environment** (optional but recommended):
    ```bash
    conda create -n dualdistill python=3.11
    conda activate dualdistill
    ```

3.  **Install dependencies**:
    ```bash
    pip install -r requirements.txt
    pip install flash-attn --no-build-isolation
    ```

### Sample Usage

Here's how to perform inference with the `Agentic-R1` model using the Hugging Face `transformers` library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "VanishD/Agentic-R1" # Or "VanishD/Agentic-R1-SD" for the self-distilled version

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16, # Use bfloat16 for better performance and memory if supported
    device_map="auto",
    trust_remote_code=True
).eval() # Set model to evaluation mode

# Prepare a simple user message
messages = [{"role": "user", "content": "What is 123 + 456?"}]

# Apply the chat template to format the prompt correctly for the model
# `add_generation_prompt=True` appends the assistant turn marker so the model generates a response.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Encode the prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Generate response
output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id,  # fall back to EOS as PAD when none is set
)

# Decode and print the generated text, excluding the input prompt
generated_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True).strip()
print(f"Generated Text:
{generated_text}")
```
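
Because Agentic-R1 is trained to interleave text reasoning with tool use (Python code), you may want to execute code the model emits. The helper below is a minimal, hypothetical sketch: it extracts the first fenced Python block from the generated text and runs it in a separate process with a timeout. The fenced-block format and the `run_first_python_block` helper are assumptions for illustration, not part of the DualDistill interface, and a subprocess with a timeout is not a security sandbox (see the safety note below).

```python
import re
import subprocess
import sys

def run_first_python_block(text: str, timeout: float = 10.0) -> str:
    """Extract the first fenced Python code block from `text` and run it.

    Hypothetical helper for illustration only: the fenced-block format is an
    assumption about the model's output, and a subprocess is NOT a sandbox.
    """
    match = re.search(r"```python\n(.*?)```", text, re.DOTALL)
    if match is None:
        return ""
    code = match.group(1)
    try:
        # Run in a separate interpreter so a crash or slow loop does not
        # block the calling process indefinitely.
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "[execution timed out]"
    return result.stdout

# `generated_text` comes from the example above.
print(run_first_python_block(generated_text))
```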

## ⚠️ Important Notes

-   **Code Execution Safety**: The evaluation scripts execute model-generated code locally. Run them only with models you trust, ideally in an isolated environment.
-   **Inference Config**: If you are using a recent version of vLLM and encounter an error about the maximum context length, you may need to modify `model_max_length` in `tokenizer_config.json` (see the vLLM sketch after this list).
-   **Self-Distillation Warning**: The self-distillation step requires sampling many trajectories and can be time-consuming.
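
For higher-throughput inference, the model can also be served with vLLM. A minimal sketch (the `max_model_len` value and sampling parameters are illustrative assumptions, not recommended settings):

```python
from vllm import LLM, SamplingParams

# Load Agentic-R1 with vLLM. Setting max_model_len explicitly avoids context-length
# errors when tokenizer_config.json advertises a larger model_max_length;
# 16384 is an illustrative value, not a recommendation.
llm = LLM(model="VanishD/Agentic-R1", dtype="bfloat16", max_model_len=16384)

sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
outputs = llm.chat(
    [{"role": "user", "content": "What is 123 + 456?"}],
    sampling_params=sampling,
)
print(outputs[0].outputs[0].text)
```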

## License

This project is licensed under the MIT License - see the [LICENSE](https://github.com/StigLidu/DualDistill/blob/main/LICENSE) file for details.

## Acknowledgments

We thank the following open-source projects for their foundational contributions:

-   [OpenHands](https://github.com/All-Hands-AI/OpenHands) - Agent framework
-   [DeepMath-103K](https://huggingface.co/datasets/zwhe99/DeepMath-103K) - Mathematical reasoning dataset
-   [vLLM](https://github.com/vllm-project/vllm) - High-performance inference engine

## Contact

For questions or support, please contact:

-   **Weihua Du**: [weihuad@cs.cmu.edu](mailto:weihuad@cs.cmu.edu)

## Citation

If you find our work useful, please consider citing:

```bibtex
@article{du2025agentic,
  title={Agentic-R1: Distilled Dual-Strategy Reasoning},
  author={Du, Weihua and Aggarwal, Pranjal and Welleck, Sean and Yang, Yiming},
  journal={arXiv preprint arXiv:2507.05707},
  year={2025}
}
```