Active filters: rlhf
Text Generation
•
Updated
•
2
•
1
Uppaal/gpt2-ProFS-toxicity
Text Generation
•
0.4B
•
Updated
•
13
Uppaal/gpt-j-ProFS-toxicity
Text Generation
•
6B
•
Updated
•
6
Uppaal/opt-ProFS-toxicity
Text Generation
•
7B
•
Updated
•
5
Uppaal/Mistral-ProFS-toxicity
Text Generation
•
7B
•
Updated
•
17
Uppaal/Mistral-sft-ProFS-toxicity
Text Generation
•
7B
•
Updated
•
10
Uppaal/Mistral-ProFS-safety
Text Generation
•
7B
•
Updated
•
5
Uppaal/Mistral-sft-ProFS-safety
Text Generation
•
7B
•
Updated
•
6
sodeniZz/llm-course-hw2-dpo
Text Generation
•
0.1B
•
Updated
•
1
sodeniZz/llm-course-hw2-reward-model
Text Classification
•
0.1B
•
Updated
sodeniZz/llm-course-hw2-ppo
Text Generation
•
0.1B
•
Updated
•
1
ahczhg/qwen3-0.6b-rlhf-cot
Text Generation
•
Updated
•
1
ahczhg/Llama-3.2-1B-Aegis-SFT-DPO
Text Generation
•
1B
•
Updated
•
42
•
1
mradermacher/Llama-3.2-1B-Aegis-SFT-DPO-GGUF
1B
•
Updated
•
62
4B
•
Updated
•
8
mradermacher/HistoryGPT-GGUF
4B
•
Updated
•
31
Text Generation
•
Updated
Updated
•
10
•
1
FutureMa/Qwen2.5-7B-Instruct-GRPO-Math
Text Generation
•
Updated
Text Generation
•
Updated
•
27
MaleekNoob/qwen3-0.6b-grpo-v1
Updated
AhmedSSoliman/medgemma-4b-digital-twin-v1
Updated
AhmedSSoliman/gpt-oss-20b-digital-twin-v1
Text Generation
•
Updated
•
1
AhmedSSoliman/octomed-7b-digital-twin-v1
Text Generation
•
Updated
•
1
Reinforcement Learning
•
0.6B
•
Updated
•
8
•
2
Reinforcement Learning
•
0.6B
•
Updated
•
16
mradermacher/Qwen3-0.6B-ReMax-GGUF
Reinforcement Learning
•
0.6B
•
Updated
•
9
gyung/lfm2-1.2b-koen-mt-v5-rl-10k-adapter
Text Generation
•
Updated
•
6
•
1
Text Classification
•
4B
•
Updated
•
30
•
4
Text Classification
•
8B
•
Updated
•
33
•
3