
Falcon-H1R-7B

This repository presents Falcon-H1R-7B, a reasoning-specialized model introduced in the paper Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling.

Built on top of Falcon-H1-7B-Base, it was trained via cold-start supervised fine-tuning on long reasoning traces and further enhanced by scaled-up reinforcement learning with GRPO. The model demonstrates strong performance across benchmarks spanning mathematics, programming, instruction following, and general reasoning.
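As background (our summary, not a detail from the Falcon-H1R report): GRPO (Group Relative Policy Optimization) dispenses with a learned value function and instead samples a group of G responses per prompt, standardizing each response's reward within the group to obtain its advantage. Schematically:

\[
A_i \;=\; \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})}
\]

where r_i is the reward of the i-th sampled response. Avoiding a separate critic model is one reason GRPO is commonly used when scaling RL on long reasoning traces.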

Model Description

Training details

For more details about the training protocol of this model, please refer to the Falcon-H1R technical blog post and technical report.

Usage

Setup

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -t llama-server
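
The quantized weights can then be fetched from the Hugging Face Hub, for example with huggingface-cli (shipped with the huggingface_hub package); the Q8_0 file below matches the serving example that follows:

# Download the 8-bit GGUF file into the current directory
huggingface-cli download tiiuae/Falcon-H1R-7B-GGUF Falcon-H1R-7B-Q8_0.gguf --local-dir .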

Serving

./llama-server -m Falcon-H1R-7B-Q8_0.gguf \
  --temp 0.6 \
  --top-p 0.95 \
  -n 65536

We recommend a temperature of 0.6 and a top-p of 0.95, with max new tokens up to 65536. On frameworks that support them, you can adjust the repetition_penalty and presence_penalty parameters to reduce endless repetitions.
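For illustration, llama-server exposes an OpenAI-compatible API (on port 8080 by default), so a request with the recommended sampling settings might look like the following; the prompt and the presence_penalty value are placeholders, not recommendations:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is 17 * 24? Think step by step."}],
    "temperature": 0.6,
    "top_p": 0.95,
    "presence_penalty": 1.0,
    "max_tokens": 65536
  }'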

Evaluation

Falcon-H1R achieves state-of-the-art results on reasoning benchmarks.

| Category | Benchmark | Falcon-H1R-7B | Qwen3-8B | DeepSeek-R1-0528-Qwen3-8B | Phi-4-Reasoning-Plus-14B | Apriel-1.5-15b-Thinker | GPT-OSS-20B | Qwen3-32B | Nemotron-H-47B-Reasoning |
|---|---|---|---|---|---|---|---|---|---|
| MATH | AIME24 | 88.1 | 77.9 | 83.3 | 77.2 | 86.2 | 83.3 | 79.4 | 64.6 |
| | AIME25 | 83.1 | 65.8 | 75.8 | 71.2 | 80.0 | 84.4 | 71.0 | 51.4 |
| | HMMT25 | 64.9 | 41.0 | 54.3 | 47.7 | 61.0 | 64.8 | 49.8 | 34.2 |
| | AMO-BENCH | 36.3 | 14.1 | 23.3 | 15.0 | 22.2 | 26.0 | 21.3 | 7.0 |
| | MATH500 | 97.4 | 97.4 | 96.8 | 95.4 | 97.2 | 94.8 | 96.8 | 91.4 |
| Code | LCBv5-v6 | 68.6 | 53.0 | 57.2 | 53.1 | 53.0 | 72.0 | 61.0 | 47.4 |
| | SciCode (sub/main) | 28.3 / 3.9 | 28.3 / 6.7 | 22.2 / 2.6 | 29.8 / 7.2 | 31.9 / 8.2 | 34.9 / 6.2 | 36.4 / 9.2 | 26.1 / 4.6 |
| General | GPQA-D | 61.3 | 61.2 | 61.4 | 67.9 | 68.2 | 61.2 | 67.3 | 56.8 |
| | MMLU-Pro | 72.1 | 63.5 | 69.1 | 79.2 | 76.5 | 75.6 | 73.9 | 78.6 |
| | HLE | 11.1 | 4.2 | 5.6 | 5.9 | 12.0 | 9.8 | 8.3 | 4.4 |
| | IFBench | 53.4 | 35.3 | 29.2 | 51.7 | 55.8 | 69.4 | 35.4 | 34.3 |
| Agentic Workflows | 𝜏²-Bench Telecom | 25.4 | 27.8 | – | – | 68.4 | 60.2 | 29.8 | 11.4 |
| | Terminal-Bench Hard | 4.9 | 2.1 | 1.4 | 2.1 | 9.9 | 9.9 | 2.8 | 1.4 |

(– = score not reported.)

The table below reports test-time scaling (TTS) results on a subset of the evaluated benchmarks, obtained via DeepConf.

| Benchmark | Falcon-H1R-7B | Qwen3-8B | DeepSeek-R1-0528-Qwen3-8B | Nemotron-H-8B | Phi-4-Reasoning-Plus-14B | Qwen3-32B |
|---|---|---|---|---|---|---|
| AIME24 | 96.7 | 80.0 | 90.0 | 53.3 | 86.7 | 86.7 |
| AIME25 | 96.7 | 80.0 | 82.8 | 43.3 | 83.3 | 86.7 |
| GPQA-D | 70.2 | 60.9 | 59.9 | 61.1 | 73.2 | 70.1 |
| AMO-Bench | 35.9 | 15.4 | 25.6 | 7.7 | 20.5 | 28.2 |
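
For reference, DeepConf performs test-time scaling by scoring each sampled reasoning trace with a model-confidence measure, optionally filtering or early-stopping low-confidence traces, and aggregating the surviving answers by confidence-weighted majority voting. Schematically (our notation, not the paper's):

\[
\hat{y} \;=\; \arg\max_{y} \sum_{i=1}^{N} C_i \,\mathbf{1}\!\left[\,y_i = y\,\right]
\]

where y_i is the answer extracted from trace i and C_i its confidence score.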

Citation

If the Falcon-H1R family of reasoning models is helpful to your work, please consider citing it.

@misc{falcon-h1r,
      title={Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling}, 
      author={Falcon LLM Team and Iheb Chaabane and Puneesh Khanna and Suhail Mohmad and Slim Frikha and Shi Hu and Abdalgader Abubaker and Reda Alami and Mikhail Lubinets and Mohamed El Amine Seddik and Hakim Hacid},
      year={2026},
      eprint={2601.02346},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.02346}, 
}