beyoru/Luna
Text Generation • 4B • Updated • 36 • 11
RP model trained with GRPO
Note LoRA trained with GRPO
Note LoRA trained with GRPO, with reasoning
Note Merge of all RP models and the base model (ft evolution merging)
Note MoE version, finetuned