Models (448)

Active filters: rlhf

BlankZ/ragen-checkpoint-step-600-bf16

Text Generation • 2B • Updated Jul 31, 2025 • 4

BlankZ/ragen-checkpoint-step-900-bf16

Text Generation • 2B • Updated Jul 31, 2025 • 1

BlankZ/ragen-checkpoint-step-1200-bf16

Text Generation • 2B • Updated Jul 31, 2025 • 1

alphadl/ppo-gsm8k-0.5b

Text Generation • 0.6B • Updated Aug 4, 2025 • 6 • 2

gandhiraketla277/demo-lora-reward-model

Text Generation • Updated Aug 10, 2025

Nirav-Madhani/gemma3-270m-grpo-math

Text Generation • 0.3B • Updated Aug 26, 2025 • 4

Schrieffer/Llama-SARM-4B

Reinforcement Learning • 5B • Updated Dec 11, 2025 • 12 • 1

Arc-Intelligence/ATLAS-8B-Thinking

Text Generation • 8B • Updated Sep 12, 2025 • 31 • 4

SakaiSec/ATLAS-8B-Thinking-Q4_K_M-GGUF

Text Generation • 8B • Updated Sep 13, 2025 • 4

mradermacher/ATLAS-8B-Thinking-GGUF

Reinforcement Learning • 8B • Updated Sep 13, 2025 • 1.08k • 1

mradermacher/ATLAS-8B-Thinking-i1-GGUF

Reinforcement Learning • 8B • Updated Dec 7, 2025 • 152 • 1

Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-6-beta-0.1-loss-sigmoid-rpo-1.0-ckpt-135

196k • Updated Sep 16, 2025 • 1

Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-6-beta-0.01-loss-sigmoid-rpo-1.0-ckpt-135

196k • Updated Sep 16, 2025

Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-6-beta-0.05-loss-sigmoid-rpo-1.0-ckpt-135

196k • Updated Sep 16, 2025

Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-7-beta-0.1-loss-sigmoid-rpo-1.0-ckpt-135

196k • Updated Sep 16, 2025 • 1

Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-7-beta-0.01-loss-sigmoid-rpo-1.0-ckpt-135

196k • Updated Sep 16, 2025

Asap7772/Qwen3-4B-second-stage-DPO-lr-1e-7-beta-0.05-loss-sigmoid-rpo-1.0-ckpt-135

196k • Updated Sep 16, 2025

Tanaybh/gpt2-rlhf-anthropic

Text Generation • 0.1B • Updated Oct 2, 2025 • 2

mradermacher/gpt2-rlhf-anthropic-GGUF

0.1B • Updated Sep 22, 2025 • 99

nabeelshan/rlhf-gpt2-pipeline

Text Generation • Updated Sep 24, 2025

Schrieffer/Llama-SARM-4B-PostSAEPretrain

Feature Extraction • 5B • Updated Dec 11, 2025 • 2 • 1

Tanaybh/gpt2-got-therapy

Text Generation • 0.1B • Updated Sep 30, 2025 • 5

Vibudhbh/gpt2-rlhf-implementation

Text Generation • 0.1B • Updated Oct 2, 2025 • 6

mradermacher/gpt2-rlhf-implementation-GGUF

0.1B • Updated Oct 2, 2025 • 177

mzhaoshuai/Mistral-7B-v0.1-conf-refalign

Text Generation • Updated Oct 16, 2025

geoffmunn/Qwen3-4B-SafeRL-GGUF

Text Generation • 4B • Updated Oct 18, 2025 • 15

orville-wang/Transit-R1-3B

Text Generation • 3B • Updated Oct 20, 2025 • 2 • 1

mradermacher/Transit-R1-3B-GGUF

3B • Updated Oct 21, 2025 • 17

samhitha2601/llama3.2-3b-ppo

Reinforcement Learning • Updated Oct 23, 2025

samhitha2601/llama3.2-3b-ppo-critic

Reinforcement Learning • Updated Oct 23, 2025