Geilim-1B-Instruct (εΏε»)
Deep Causal Internal Reasoning No verbose CoT, no
<think>tags, just concise answers powered by implicit reasoning.
π‘ Introduction
Recent advances in reasoning models (DeepSeek R1, o1) have demonstrated impressive capabilities through Chain-of-Thought (CoT) reasoning. However, we observe several critical drawbacks:
Problems with External CoT:
- Verbosity Tax: Models generate hundreds of tokens in
<think>tags before answering, increasing latency and cost - Autoregressive Dependency: Models must "see" their reasoning to follow it, forcing sequential token generation
- Token Inefficiency: Users pay for reasoning traces they often don't need, only the final answer matters
- Production Overhead: Verbose outputs are impractical for real-time APIs and edge deployment
Our Insight: What if reasoning could happen internally in the model's hidden states, without generating verbose traces?
Geilim-1B-Instruct addresses these limitations through a hybrid architecture combining:
- ASPP (Adjacency-Structured Parallel Propagation): Graph-based causal chains for structured reasoning
- Ο-flow (Probability Flow Dynamics): Internal refinement in probability space without token generation
- Hybrid Gating: Learnable balance between structured and attention-based processing
The result: Deep reasoning capability with concise outputs - the best of both worlds.
π― Core Value Proposition
Geilim-1B-Instruct is the anti-verbose reasoning model.
| Model Type | Reasoning Approach | Output Style |
|---|---|---|
| Baseline (Llama-3.2-1B) | Limited reasoning | Direct but may lack depth |
| CoT Models (DeepSeek R1, o1) | External reasoning chains | Verbose <think> tags, long outputs |
| Geilim-1B-Instruct | Internal reasoning | Concise answers, reasoning in hidden states |
Key Differentiator: Geilim performs deep causal reasoning internally through ASPP+Ο-flow architecture, then outputs only the final answer. You get the reasoning quality without the verbosity tax.
ποΈ Architecture Overview
Geilim-1B-Instruct combines three key components for implicit reasoning:
1. ASPP Operator (Adjacency-Structured Parallel Propagation)
- Union-Find graph structure: Linear causal chain where each token only connects to its parent
- Iterative message passing:
h_i^(t+1) = Ο(h_i^(t), h_parent[i]) - K-step evolution: Adaptive 2-8 steps of causal propagation
- Complexity: O(n) - efficient linear-time reasoning
Why it matters: ASPP creates explicit causal relationships between tokens, allowing information to flow through a reasoning chain without generating output tokens.
2. Ο-flow (Probability Flow Dynamics)
- Velocity field learning:
h' = h + Ξ± * v(h)wherev(h)is a learned refinement - Multi-step refinement: Iterates in probability space to converge on the correct answer
- Gated application: Model learns when to refine (complex questions) vs when to skip (simple questions)
- Internal convergence: Reasoning happens in hidden states, not in generated text
Why it matters: Ο-flow eliminates the need for external CoT by performing iterative refinement internally. The model "thinks" in its hidden states and outputs only the final result.
3. Hybrid Gating Mechanism
output = gate * ASPP(x) + (1-gate) * Attention(x)
- Combines structured causal reasoning (ASPP) with flexible attention
- Learnable balance between graph-based and sequence-based processing
- Applied to all 30 layers of the base model (Llama-3.2-1B)
π§ Why Ο-flow Eliminates Verbosity
The Problem with Traditional CoT
External Reasoning Models (DeepSeek R1, o1-style):
User: What is 15 * 8?
Model: <think>
Let me break this down step by step:
1. First, I'll multiply 15 by 8
2. 15 * 8 = 15 * (10 - 2)
3. Using distributive property: 15*10 - 15*2
4. 150 - 30 = 120
Therefore, the answer is 120.
</think>
The answer is 120.
- Output: 250+ characters
- Latency: High (many tokens to generate)
- Cost: Expensive (charged per token)
Geilim's Internal Reasoning
Geilim-1B-Instruct (ASPP+Ο-flow):
User: What is 15 * 8?
Model: 120
- Output: 3 characters
- Latency: Low (minimal generation)
- Cost: Minimal
- Reasoning: Happened internally through:
- ASPP causal chain propagating arithmetic relationships
- Ο-flow refining probability distribution across answer space
- Convergence to correct answer in hidden states
π¬ Technical Mechanism
How Ο-flow Achieves Internal Reasoning
Probability Space Operations
- Instead of generating tokens to explore answers, Ο-flow refines probability distributions directly
v(h): Learned velocity field that corrects the model's initial judgment- Multi-step:
h^(0) β h^(1) β h^(2)(2 refinement steps)
Convergence Without Output
- Traditional models need to "see" their reasoning to follow it (autoregressive dependency)
- Ο-flow breaks this: reasoning occurs in parallel across all positions simultaneously
- The model converges internally before generating any output token
Adaptive Complexity
pi_flow_use_gate=True: Model learns when refinement is needed- Simple questions: Direct output (gate β 0, skip refinement)
- Complex questions: Internal multi-step refinement (gate β 1, apply Ο-flow)
- User always sees concise output regardless
Synergy with ASPP
- ASPP provides causal structure (parent-child dependencies)
- Ο-flow refines along these dependencies
- Result: Structured reasoning (not just attention) + probabilistic convergence = deep causal understanding
π Configuration
Model Architecture
- Base Model: Llama-3.2-1B-Instruct (1.26B params)
- Total Parameters: ~1.4B (140M additional ASPP+Ο-flow params)
- Hybrid Layers: All 30 layers (universal reasoning capability)
ASPP Settings
aspp_hidden_dim: 512 # vs 2048 model hidden_size (reduce overfitting)
aspp_num_steps: 2-8 # learnable via sigmoid gating
aspp_dropout: 0.15
aspp_num_neighbors: 1 # Union-Find: parent-only connections
Ο-flow Settings
pi_flow: True # Enable probability flow refinement
pi_flow_steps: 2 # 2-step refinement
pi_flow_scale: 0.5 # Moderate refinement strength
pi_flow_use_gate: True # Adaptive gating
π Quick Start
Installation
pip install transformers torch
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load model
model_path = "NoesisLab/Geilim-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_path,
trust_remote_code=True,
torch_dtype=torch.bfloat16,
device_map="auto",
)
# Generate response
prompt = "A store has 120 apples. They sell 35 in the morning and 48 in the afternoon. How many are left?"
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=128,
temperature=0.7,
do_sample=True,
top_p=0.9,
)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response) # Expected: "37" or "37 apples are left." (concise!)
Advanced Usage
# For math problems requiring step-by-step (if needed)
# Note: Geilim prefers concise outputs, but can show work if prompted
prompt = "Explain how you would solve: What is 15 * 23?"
# For best results with implicit reasoning
generation_config = {
"max_new_tokens": 128, # Keep low to encourage conciseness
"temperature": 0.7, # Moderate sampling
"do_sample": True,
"top_p": 0.9,
"repetition_penalty": 1.1, # Prevent loops
}
π Training Details
Dataset
- Mixed-Benchmark-Dataset (composite reasoning benchmarks)
- 25% GSM8K (math reasoning)
- 30% HellaSwag (commonsense)
- 20% ARC (science QA)
- 10% OpenHermes (high-quality responses)
- 15% Capybara (multi-turn conversations)
Training Configuration
- Framework: TRL SFTTrainer with packing
- Epochs: 2
- Batch Size: Effective 8 (per_device=2, grad_accum=4)
- Learning Rate: 2e-4 with 10% warmup
- Precision: bfloat16 with gradient checkpointing
- Optimizer: AdamW (weight_decay=0.1, max_grad_norm=1.0)
Training Philosophy
Unlike CoT models trained on verbose reasoning chains, Geilim is trained on answer-focused data where:
- Correct answers are rewarded
- Reasoning quality is learned implicitly through ASPP+Ο-flow gradients
- The model learns to converge internally rather than generate external reasoning
π Evaluation
Reasoning Quality Tests
Geilim is evaluated on:
- Math reasoning (GSM8K-style arithmetic)
- Commonsense reasoning (HellaSwag, PIQA)
- Logic puzzles (multi-hop deduction)
- Reading comprehension (information tracking)
- Causal reasoning (cause-effect relationships)
Key Metrics
- Answer correctness (primary goal)
- Response conciseness (< 150 chars = concise)
- Reasoning traces (should be absent from output, present in hidden states)
π― Use Cases
Ideal For:
- Production APIs: Low latency, low token cost
- Real-time applications: Minimal generation overhead
- Cost-sensitive deployments: Pay only for the answer, not the reasoning
- User-facing chat: Clean outputs without technical reasoning traces
- Mobile/edge devices: Smaller token budgets
Not Ideal For:
- Educational use cases: When you want to show reasoning steps to users
- Debugging/verification: When explicit reasoning helps validate answers
- Research: When analyzing reasoning chains is the goal
π Comparison Table
| Feature | Geilim-1B-Instruct | DeepSeek R1 | Llama-3.2-1B |
|---|---|---|---|
| Model Size | 1.4B | 1.5B | 1.26B |
| Reasoning Type | Internal (ASPP+Ο-flow) | External (CoT) | Limited |
| Output Style | Concise answers | Verbose <think> tags |
Direct answers |
| Latency | Low | High (many tokens) | Low |
| Cost per query | Low | High | Low |
| Reasoning depth | Deep (hidden states) | Deep (explicit) | Shallow |
| Token efficiency | High | Low | Medium |
π Technical References
Core Papers & Concepts
- Union-Find Data Structure: Parent-only connections for efficient causal propagation
- Probability Flow ODEs: Continuous refinement in probability space (inspired by diffusion models)
- Hybrid Architectures: Combining structured (graph) and unstructured (attention) reasoning
Related Work
- DeepSeek R1: External reasoning chains
- o1 series: Long-form CoT reasoning
- SmolLM2: Efficient small language models
- Graph Neural Networks: Structured message passing
π§ Development
Custom Model Registration
- Model type:
asterisk(registered with HuggingFace AutoModel) - Config class:
AsteriskConfig(extends LlamaConfig) - Model class:
AsteriskForCausalLM(extends LlamaForCausalLM) - Loading: Requires
trust_remote_code=True
Training Your Own
# Install dependencies
pip install -r requirements.txt
# Train Geilim-1B-Instruct
python train_geilim.py
π Key Takeaways
- No verbose CoT: Geilim performs reasoning internally, outputs concisely
- ASPP+Ο-flow: Causal graph structure + probability flow refinement
- Deep causal understanding: Reasoning happens in hidden states, not generated text
- Production-ready: Low latency, low cost, clean outputs
- Same reasoning depth: Matches CoT models without the verbosity
π Citation
If you use Geilim-1B-Instruct in your research or applications, please cite:
@misc{geilim2026,
title={Geilim-1B-Instruct: Deep Causal Internal Reasoning via ASPP and Probability Flow},
author={NoesisLab},
year={2026},
howpublished={HuggingFace Model Hub},
url={https://huggingface.co/NoesisLab/Geilim-1B-Instruct}
}
π€ Acknowledgments
- Base Model: Llama-3.2-1B-Instruct by Meta
- Training Framework: TRL by HuggingFace
- Inspiration: DeepSeek R1 (for demonstrating value of reasoning), but pursuing conciseness
π License
Llama 3.2 Community License
π Links
Built with β€οΈ for the era of efficient reasoning models.
Geilim (εΏε») - Cantonese for "cream" - smooth, concise, and rich in substance.
Model tree for NoesisLab/Geilim-1B-Instruct
Base model
meta-llama/Llama-3.2-1B-Instruct