Arcade-3B: SLM Optimization via Orthogonal Decoupling of Latent State Spaces
In parameter-constrained Small Language Models (SLMs), it is often difficult for the model to effectively distinguish between "task state representation" and "underlying logical constraints" within high-dimensional search spaces. Traditional fine-tuning methods frequently lead to coupling conflicts between these two in the Latent Space, which limits the model's convergence ceiling.
Arcade-3B introduces the SC-OrthFine architecture, with the core objective of achieving decoupling of state-space search: it forcibly projects the model's search behavior into mutually orthogonal State Vectors and Constraint Vectors.
1. The Coupling Dilemma in State-Space Search
In a 3B-scale model, the hidden state output carries extremely high information density. During gradient backpropagation, the traditional objective (Cross-Entropy Loss) adjusts weights indiscriminately to fit the target distribution. However, when handling logical reasoning (e.g., GSM8K) or code generation (e.g., HumanEval), the model must simultaneously process:
- Semantic State ( S ): Generating the contextual representation of the current token.
- Logical Constraints ( C ): Adhering to syntax, mathematical rules, or long-range structural dependencies.
When these two overlap on the same Manifold, the search behavior experiences significant interference.
2. SC-Orthogonal: Orthogonal Projection Decoupling Mechanism
To address the issues mentioned above, we designed the SC-Orthogonal Optimization Loop. Its core logic involves splitting the hidden state along the feature dimension to define two independent subspaces:
- State Projection Half (State Half, $h_S$): Focuses on the feature representation for instantaneous next-token prediction.
- Constraint Projection Half (Constraint Half, $h_C$): Carries global logical boundaries and structural constraints.
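The split along the feature dimension can be sketched as follows; this is a minimal illustration assuming an even hidden size split into two equal halves (the names `split_state_constraint`, `h_S`, and `h_C` are illustrative, not from the Arcade-3B codebase):

```python
import numpy as np

def split_state_constraint(h):
    """Split a hidden state of even dimension d into a state half h_S
    (first d/2 features) and a constraint half h_C (last d/2 features)."""
    d = h.shape[-1]
    assert d % 2 == 0, "hidden size must be even to split into halves"
    h_S, h_C = h[..., : d // 2], h[..., d // 2 :]
    return h_S, h_C

# Example: a batch of 2 hidden states with hidden size d = 8
h = np.arange(16, dtype=np.float64).reshape(2, 8)
h_S, h_C = split_state_constraint(h)
print(h_S.shape, h_C.shape)  # (2, 4) (2, 4)
```

Because the two halves occupy disjoint coordinate blocks, downstream projection heads can be attached to each half independently, which is what makes the orthogonality constraint in the next subsection well-defined.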
Mathematical Definition and Loss Function
To ensure the decoupling of search behavior, we introduce an orthogonality constraint. By minimizing the inner product of the state half $h_S$ and the constraint half $h_C$, we force them to remain orthogonal in a geometric sense:

$$ h_S^\top h_C = 0 $$

To implement this constraint during the training process, we define the Orthogonality Loss function $\mathcal{L}_{\text{orth}}$:

$$ \mathcal{L}_{\text{orth}} = \left( h_S^\top h_C \right)^2 $$

The final joint optimization objective function is:

$$ \mathcal{L} = \mathcal{L}_{\text{CE}} + \lambda \, \mathcal{L}_{\text{orth}} $$

By introducing the orthogonal penalty term regulated by the coefficient $\lambda$, the model is forced to perform parameter searches within mutually independent subspaces, thereby avoiding feature collapse.
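The joint objective can be sketched numerically as below; this is a minimal sketch assuming the penalty is the squared inner product of the two halves averaged over the batch (the function names and the value of `lam` are illustrative assumptions, not the reference implementation):

```python
import numpy as np

def orthogonality_loss(h_S, h_C):
    """L_orth: squared per-sample inner product <h_S, h_C>, averaged over the
    batch. Zero exactly when each h_S is geometrically orthogonal to its h_C."""
    inner = np.sum(h_S * h_C, axis=-1)  # per-sample inner product
    return np.mean(inner ** 2)

def joint_loss(ce_loss, h_S, h_C, lam=0.1):
    """L = L_CE + lambda * L_orth: cross-entropy plus the orthogonal penalty,
    with lam playing the role of the regulating coefficient lambda."""
    return ce_loss + lam * orthogonality_loss(h_S, h_C)

# Orthogonal halves incur no penalty...
h_S = np.array([[1.0, 0.0]])
h_C = np.array([[0.0, 1.0]])
print(orthogonality_loss(h_S, h_C))  # 0.0

# ...while aligned halves are penalized, pushing gradients toward decoupling.
print(orthogonality_loss(h_S, h_S))  # 1.0
```

Since the penalty is differentiable, it can be added directly to the cross-entropy term and minimized by the same optimizer; only the relative weight `lam` needs tuning.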
3. Experimental Analysis: Performance Gains from Decoupling
Experimental results indicate that this state-space decoupling is particularly prominent in logic-intensive tasks:
- Robustness in Logical Reasoning: On the GSM8K benchmark, Arcade-3B achieved an accuracy of 62.9%. This indicates that, through the orthogonal constraints, the model can better isolate mathematical logical constraints from language generation states, reducing "hallucination" interference during the reasoning process.
- Coding Efficiency: On the HumanEval task, Arcade-3B's score of 41.5% significantly outperforms same-scale models that do not employ orthogonal decoupling (such as Qwen1.5-1.8B at 27.4%), suggesting that orthogonal subspaces offer higher search efficiency for complex structured data.
| Benchmark | Arcade-3B | Gemma-2-2B | Llama-2-7B |
|---|---|---|---|
| MMLU | 52.9% | 52.4% | 45.3% |
| GSM8K | 62.9% | 50.9% | 14.6% |
| HumanEval | 41.5% | 32.3% | 12.8% |
Conclusion
The technical path of Arcade-3B demonstrates that for small-parameter models, simply increasing data volume or distilling logits from a larger teacher is insufficient. Through the underlying mathematical constraints of SC-OrthFine, achieving state-space search decoupling from a geometric perspective is an effective means of enhancing a model's "logical density."

