GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 29 days ago • 221
Qwen3-TTS Demo • Running on Zero • Featured • 1.26k • Transform text into natural-sounding speech with custom voices
The Smol Training Playbook • Runtime error • Featured • 2.96k • The secrets to building world-class LLMs
h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-700bt Text Generation • Updated May 24, 2023 • 40 • 4
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models Paper • 2401.04658 • Published Jan 9, 2024 • 27