RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Abstract
RedOne 2.0, a social networking service-oriented LLM, uses a progressive, RL-prioritized post-training paradigm to achieve rapid and stable adaptation, delivering improvements over larger baselines with less data.
As a key medium for human interaction and information exchange, social networking services (SNS) pose unique challenges for large language models (LLMs): heterogeneous workloads, fast-shifting norms and slang, and multilingual, culturally diverse corpora that induce sharp distribution shift. Supervised fine-tuning (SFT) can specialize models but often triggers a "seesaw" between in-distribution gains and out-of-distribution robustness, especially for smaller models. To address these challenges, we introduce RedOne 2.0, an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm designed for rapid and stable adaptation. The pipeline consists of three stages: (1) Exploratory Learning on curated SNS corpora to establish initial alignment and identify systematic weaknesses; (2) Targeted Fine-Tuning that selectively applies SFT to the diagnosed gaps while mixing in a small fraction of general data to mitigate forgetting; and (3) Refinement Learning that re-applies RL with SNS-centric signals to consolidate improvements and harmonize trade-offs across tasks. Across tasks spanning three categories, our 4B-scale model delivers an average improvement of about 2.41 points over the 7B sub-optimal baseline. Additionally, RedOne 2.0 achieves an average performance lift of about 8.74 points over the base model with less than half the data required by the SFT-centric method RedOne, evidencing superior data efficiency and stability at compact scales. Overall, RedOne 2.0 establishes a competitive, cost-effective baseline for domain-specific LLMs in SNS scenarios, advancing capability without sacrificing robustness.
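To make the staged recipe concrete, here is a minimal sketch of the RL → SFT → RL flow described in the abstract. It is an illustration only: the helper names (`run_rl`, `run_sft`, `diagnose_gaps`), reward-signal labels, and data labels are hypothetical placeholders, not the authors' actual training code or datasets.

```python
# Hypothetical sketch of the three-stage, RL-prioritized post-training pipeline.
# All helpers and labels below are placeholders, not the paper's real implementation.

from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    name: str
    history: list = field(default_factory=list)  # record of applied training stages


def run_rl(model: Checkpoint, data: str, signal: str) -> Checkpoint:
    # Placeholder for an RL training step optimized against the given reward signal.
    return Checkpoint(f"{model.name}+rl", model.history + [("rl", data, signal)])


def run_sft(model: Checkpoint, data: list) -> Checkpoint:
    # Placeholder for supervised fine-tuning on the given data mixture.
    return Checkpoint(f"{model.name}+sft", model.history + [("sft", tuple(data))])


def diagnose_gaps(model: Checkpoint) -> list:
    # Placeholder evaluation that surfaces tasks where the model still lags.
    return ["illustrative_gap_task_a", "illustrative_gap_task_b"]  # hypothetical names


def post_train(base: Checkpoint) -> Checkpoint:
    # Stage 1: Exploratory Learning — RL on curated SNS corpora to establish
    # initial alignment and expose systematic weaknesses.
    explored = run_rl(base, data="curated_sns_corpora", signal="sns_reward")
    gaps = diagnose_gaps(explored)

    # Stage 2: Targeted Fine-Tuning — SFT applied selectively to the diagnosed
    # gaps, mixed with a small fraction of general data to mitigate forgetting.
    tuned = run_sft(explored, data=gaps + ["small_fraction_general_data"])

    # Stage 3: Refinement Learning — RL again with SNS-centric signals to
    # consolidate improvements and harmonize trade-offs across tasks.
    return run_rl(tuned, data="sns_task_mixture", signal="sns_centric_reward")


if __name__ == "__main__":
    final = post_train(Checkpoint("base-4B"))
    for stage in final.history:
        print(stage)
```

The design choice reflected in this sketch is that RL brackets the SFT step: the first RL pass reveals where the base model fails, the SFT pass patches only those gaps (diluted with general data), and the final RL pass re-balances across tasks so the targeted fine-tuning does not sacrifice robustness.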
Community
An SNS-domain large language model primarily driven by reinforcement learning (RL), trained through three stages: Exploratory Learning (RL), Targeted Fine-Tuning (SFT), and Refinement Learning (RL). The pipeline enhances both domain-specific and general capabilities.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes (2025)
- Aligning LLMs for Multilingual Consistency in Enterprise Applications (2025)
- PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning (2025)
- You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs (2025)
- Aligning Large Language Models via Fully Self-Synthetic Data (2025)
- Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only (2025)
- PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch (2025)