Geilim-1B-Instruct (εΏŒε»‰)

Deep Causal Internal Reasoning: no verbose CoT, no <think> tags, just concise answers powered by implicit reasoning.


πŸ’‘ Introduction

Recent advances in reasoning models (DeepSeek R1, o1) have demonstrated impressive capabilities through Chain-of-Thought (CoT) reasoning. However, we observe several critical drawbacks:

Problems with External CoT:

  1. Verbosity Tax: Models generate hundreds of tokens in <think> tags before answering, increasing latency and cost
  2. Autoregressive Dependency: Models must "see" their reasoning to follow it, forcing sequential token generation
  3. Token Inefficiency: Users pay for reasoning traces they often don't need; only the final answer matters
  4. Production Overhead: Verbose outputs are impractical for real-time APIs and edge deployment

Our Insight: What if reasoning could happen internally in the model's hidden states, without generating verbose traces?

Geilim-1B-Instruct addresses these limitations through a hybrid architecture combining:

  • ASPP (Adjacency-Structured Parallel Propagation): Graph-based causal chains for structured reasoning
  • Ο€-flow (Probability Flow Dynamics): Internal refinement in probability space without token generation
  • Hybrid Gating: Learnable balance between structured and attention-based processing

The result: Deep reasoning capability with concise outputs - the best of both worlds.


🎯 Core Value Proposition

Geilim-1B-Instruct is the anti-verbose reasoning model.

| Model Type | Reasoning Approach | Output Style |
| --- | --- | --- |
| Baseline (Llama-3.2-1B) | Limited reasoning | Direct but may lack depth |
| CoT Models (DeepSeek R1, o1) | External reasoning chains | Verbose <think> tags, long outputs |
| Geilim-1B-Instruct | Internal reasoning | Concise answers, reasoning in hidden states |

Key Differentiator: Geilim performs deep causal reasoning internally through ASPP+Ο€-flow architecture, then outputs only the final answer. You get the reasoning quality without the verbosity tax.


πŸ—οΈ Architecture Overview

Geilim-1B-Instruct combines three key components for implicit reasoning:

1. ASPP Operator (Adjacency-Structured Parallel Propagation)

  • Union-Find graph structure: Linear causal chain where each token only connects to its parent
  • Iterative message passing: h_i^(t+1) = Ο†(h_i^(t), h_parent(i)^(t))
  • K-step evolution: Adaptive 2-8 steps of causal propagation
  • Complexity: O(n) - efficient linear-time reasoning

Why it matters: ASPP creates explicit causal relationships between tokens, allowing information to flow through a reasoning chain without generating output tokens.
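As a concrete illustration, here is a minimal PyTorch sketch of parent-only message passing on a linear chain (parent(i) = i - 1), assuming Ο† is a small MLP applied with a residual update; class and variable names are illustrative, not the released implementation.

import torch
import torch.nn as nn

class ASPPSketch(nn.Module):
    """Illustrative parent-only message passing (not the released implementation)."""
    def __init__(self, hidden_dim: int, num_steps: int = 2):
        super().__init__()
        self.num_steps = num_steps
        # phi combines a token's state with its parent's state
        self.phi = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim)
        for _ in range(self.num_steps):
            parent = torch.roll(h, shifts=1, dims=1)  # parent(i) = i - 1
            parent[:, 0] = h[:, 0]                    # first token is its own parent
            h = h + self.phi(torch.cat([h, parent], dim=-1))  # residual Ο† update
        return h

Each step touches every position once, so K steps cost O(KΒ·n): the linear-time property noted above.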

2. Ο€-flow (Probability Flow Dynamics)

  • Velocity field learning: h' = h + Ξ± * v(h) where v(h) is a learned refinement
  • Multi-step refinement: Iterates in probability space to converge on the correct answer
  • Gated application: Model learns when to refine (complex questions) vs when to skip (simple questions)
  • Internal convergence: Reasoning happens in hidden states, not in generated text

Why it matters: Ο€-flow eliminates the need for external CoT by performing iterative refinement internally. The model "thinks" in its hidden states and outputs only the final result.
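A minimal sketch of a gated velocity-field update h' = h + Ξ±Β·g(h)Β·v(h), assuming v is a small MLP and g a per-position sigmoid gate; again, the names here are illustrative rather than taken from the released code.

import torch
import torch.nn as nn

class PiFlowSketch(nn.Module):
    """Illustrative iterated refinement h' = h + alpha * g(h) * v(h)."""
    def __init__(self, hidden_dim: int, steps: int = 2, scale: float = 0.5):
        super().__init__()
        self.steps, self.scale = steps, scale
        self.velocity = nn.Sequential(                 # learned velocity field v(h)
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.gate = nn.Linear(hidden_dim, 1)           # per-position refinement gate

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for _ in range(self.steps):
            g = torch.sigmoid(self.gate(h))            # ~0: skip, ~1: refine
            h = h + self.scale * g * self.velocity(h)  # refine in hidden space, no tokens
        return h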

3. Hybrid Gating Mechanism

output = gate * ASPP(x) + (1-gate) * Attention(x)
  • Combines structured causal reasoning (ASPP) with flexible attention
  • Learnable balance between graph-based and sequence-based processing
  • Applied to all 30 layers of the base model (Llama-3.2-1B)
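The gating equation above maps directly to code; the sketch below assumes the gate is computed per position from the layer input, which is one plausible choice among several.

import torch
import torch.nn as nn

class HybridGateSketch(nn.Module):
    """Illustrative learned interpolation between ASPP and attention outputs."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_dim, 1)

    def forward(self, x, aspp_out, attn_out):
        gate = torch.sigmoid(self.gate_proj(x))        # gate in (0, 1), per position
        return gate * aspp_out + (1 - gate) * attn_out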

🧠 Why Ο€-flow Eliminates Verbosity

The Problem with Traditional CoT

External Reasoning Models (DeepSeek R1, o1-style):

User: What is 15 * 8?

Model: <think>
Let me break this down step by step:
1. First, I'll multiply 15 by 8
2. 15 * 8 = 15 * (10 - 2)
3. Using distributive property: 15*10 - 15*2
4. 150 - 30 = 120
Therefore, the answer is 120.
</think>

The answer is 120.
  • Output: 250+ characters
  • Latency: High (many tokens to generate)
  • Cost: Expensive (charged per token)

Geilim's Internal Reasoning

Geilim-1B-Instruct (ASPP+Ο€-flow):

User: What is 15 * 8?

Model: 120
  • Output: 3 characters
  • Latency: Low (minimal generation)
  • Cost: Minimal
  • Reasoning: Happened internally through:
    1. ASPP causal chain propagating arithmetic relationships
    2. Ο€-flow refining probability distribution across answer space
    3. Convergence to correct answer in hidden states

πŸ”¬ Technical Mechanism

How Ο€-flow Achieves Internal Reasoning

  1. Probability Space Operations

    • Instead of generating tokens to explore answers, Ο€-flow refines probability distributions directly
    • v(h): Learned velocity field that corrects the model's initial judgment
    • Multi-step: h^(0) β†’ h^(1) β†’ h^(2) (2 refinement steps)
  2. Convergence Without Output

    • Traditional models need to "see" their reasoning to follow it (autoregressive dependency)
    • Ο€-flow breaks this: reasoning occurs in parallel across all positions simultaneously
    • The model converges internally before generating any output token
  3. Adaptive Complexity

    • pi_flow_use_gate=True: Model learns when refinement is needed
    • Simple questions: Direct output (gate β‰ˆ 0, skip refinement)
    • Complex questions: Internal multi-step refinement (gate β‰ˆ 1, apply Ο€-flow)
    • User always sees concise output regardless
  4. Synergy with ASPP

    • ASPP provides causal structure (parent-child dependencies)
    • Ο€-flow refines along these dependencies
    • Result: Structured reasoning (not just attention) + probabilistic convergence = deep causal understanding
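Putting the pieces together, one hybrid layer might compose as follows. This reuses the sketch classes from the Architecture Overview section, and the ordering (gated mixing first, Ο€-flow refinement second) is an assumption, not the actual layer definition.

import torch.nn as nn

class HybridLayerSketch(nn.Module):
    """Illustrative composition: attention + ASPP mixed by a gate, then Ο€-flow.
    Reuses ASPPSketch, PiFlowSketch, and HybridGateSketch from the sketches above."""
    def __init__(self, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        # Causal masking omitted for brevity
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.aspp = ASPPSketch(hidden_dim)
        self.gate = HybridGateSketch(hidden_dim)
        self.pi_flow = PiFlowSketch(hidden_dim)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        mixed = self.gate(x, self.aspp(x), attn_out)   # structured + attention
        return self.pi_flow(mixed)                     # internal refinement before any output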

πŸ“Š Configuration

Model Architecture

  • Base Model: Llama-3.2-1B-Instruct (1.26B params)
  • Total Parameters: ~1.4B (140M additional ASPP+Ο€-flow params)
  • Hybrid Layers: All 30 layers (universal reasoning capability)

ASPP Settings

aspp_hidden_dim: 512         # vs 2048 model hidden_size (reduce overfitting)
aspp_num_steps: 2-8          # learnable via sigmoid gating
aspp_dropout: 0.15
aspp_num_neighbors: 1        # Union-Find: parent-only connections

Ο€-flow Settings

pi_flow: True                # Enable probability flow refinement
pi_flow_steps: 2             # 2-step refinement
pi_flow_scale: 0.5           # Moderate refinement strength
pi_flow_use_gate: True       # Adaptive gating
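Since the Development section notes that these knobs live on AsteriskConfig, they should be inspectable after loading; the attribute names below simply mirror the settings listed above and are assumptions.

from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "NoesisLab/Geilim-1B-Instruct",
    trust_remote_code=True,
)
# Attribute names assumed to match the settings listed above
print(config.pi_flow_steps, config.pi_flow_scale, config.aspp_hidden_dim)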

πŸš€ Quick Start

Installation

pip install transformers torch

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_path = "NoesisLab/Geilim-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate response
prompt = "A store has 120 apples. They sell 35 in the morning and 48 in the afternoon. How many are left?"
messages = [{"role": "user", "content": prompt}]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
)

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)  # Expected: "37" or "37 apples are left." (concise!)

Advanced Usage

# For math problems requiring step-by-step (if needed)
# Note: Geilim prefers concise outputs, but can show work if prompted
prompt = "Explain how you would solve: What is 15 * 23?"

# For best results with implicit reasoning
generation_config = {
    "max_new_tokens": 128,        # Keep low to encourage conciseness
    "temperature": 0.7,           # Moderate sampling
    "do_sample": True,
    "top_p": 0.9,
    "repetition_penalty": 1.1,    # Prevent loops
}

outputs = model.generate(**inputs, **generation_config)  # reuses inputs from above

πŸŽ“ Training Details

Dataset

  • Mixed-Benchmark-Dataset (composite reasoning benchmarks)
    • 25% GSM8K (math reasoning)
    • 30% HellaSwag (commonsense)
    • 20% ARC (science QA)
    • 10% OpenHermes (high-quality responses)
    • 15% Capybara (multi-turn conversations)

Training Configuration

  • Framework: TRL SFTTrainer with packing
  • Epochs: 2
  • Batch Size: Effective 8 (per_device=2, grad_accum=4)
  • Learning Rate: 2e-4 with 10% warmup
  • Precision: bfloat16 with gradient checkpointing
  • Optimizer: AdamW (weight_decay=0.1, max_grad_norm=1.0)
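For reference, the listed hyperparameters map onto TRL's SFTConfig roughly as follows. This is a sketch assuming a recent TRL version, with model and train_dataset prepared elsewhere; it is not the project's actual training script.

from trl import SFTConfig, SFTTrainer

# Hyperparameters mirror the list above; `model` and `train_dataset` are
# assumed to be prepared elsewhere.
args = SFTConfig(
    output_dir="geilim-1b-instruct",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 8
    learning_rate=2e-4,
    warmup_ratio=0.1,
    bf16=True,
    gradient_checkpointing=True,
    weight_decay=0.1,
    max_grad_norm=1.0,
    packing=True,                    # sequence packing, as noted above
)
trainer = SFTTrainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()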

Training Philosophy

Unlike CoT models trained on verbose reasoning chains, Geilim is trained on answer-focused data where:

  • Correct answers are rewarded
  • Reasoning quality is learned implicitly through ASPP+Ο€-flow gradients
  • The model learns to converge internally rather than generate external reasoning

πŸ“ˆ Evaluation

Reasoning Quality Tests

Geilim is evaluated on:

  1. Math reasoning (GSM8K-style arithmetic)
  2. Commonsense reasoning (HellaSwag, PIQA)
  3. Logic puzzles (multi-hop deduction)
  4. Reading comprehension (information tracking)
  5. Causal reasoning (cause-effect relationships)

Key Metrics

  • Answer correctness (primary goal)
  • Response conciseness (< 150 chars = concise)
  • Reasoning traces (should be absent from output, present in hidden states)

🎯 Use Cases

Ideal For:

  • Production APIs: Low latency, low token cost
  • Real-time applications: Minimal generation overhead
  • Cost-sensitive deployments: Pay only for the answer, not the reasoning
  • User-facing chat: Clean outputs without technical reasoning traces
  • Mobile/edge devices: Smaller token budgets

Not Ideal For:

  • Educational use cases: When you want to show reasoning steps to users
  • Debugging/verification: When explicit reasoning helps validate answers
  • Research: When analyzing reasoning chains is the goal

πŸ†š Comparison Table

| Feature | Geilim-1B-Instruct | DeepSeek R1 | Llama-3.2-1B |
| --- | --- | --- | --- |
| Model Size | 1.4B | 1.5B | 1.26B |
| Reasoning Type | Internal (ASPP+Ο€-flow) | External (CoT) | Limited |
| Output Style | Concise answers | Verbose <think> tags | Direct answers |
| Latency | Low | High (many tokens) | Low |
| Cost per query | Low | High | Low |
| Reasoning depth | Deep (hidden states) | Deep (explicit) | Shallow |
| Token efficiency | High | Low | Medium |

πŸ“š Technical References

Core Papers & Concepts

  • Union-Find Data Structure: Parent-only connections for efficient causal propagation
  • Probability Flow ODEs: Continuous refinement in probability space (inspired by diffusion models)
  • Hybrid Architectures: Combining structured (graph) and unstructured (attention) reasoning

Related Work

  • DeepSeek R1: External reasoning chains
  • o1 series: Long-form CoT reasoning
  • SmolLM2: Efficient small language models
  • Graph Neural Networks: Structured message passing

πŸ”§ Development

Custom Model Registration

  • Model type: asterisk (registered with HuggingFace AutoModel)
  • Config class: AsteriskConfig (extends LlamaConfig)
  • Model class: AsteriskForCausalLM (extends LlamaForCausalLM)
  • Loading: Requires trust_remote_code=True
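For context, a registration like the one described typically follows the pattern below; the class bodies are placeholders, not the actual Asterisk implementation.

from transformers import (AutoConfig, AutoModelForCausalLM,
                          LlamaConfig, LlamaForCausalLM)

class AsteriskConfig(LlamaConfig):
    model_type = "asterisk"

class AsteriskForCausalLM(LlamaForCausalLM):
    config_class = AsteriskConfig
    # ASPP and Ο€-flow modules would be attached here in the real model

# Make the custom type discoverable through the Auto* APIs
AutoConfig.register("asterisk", AsteriskConfig)
AutoModelForCausalLM.register(AsteriskConfig, AsteriskForCausalLM)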

Training Your Own

# Install dependencies
pip install -r requirements.txt

# Train Geilim-1B-Instruct
python train_geilim.py

🌟 Key Takeaways

  1. No verbose CoT: Geilim performs reasoning internally, outputs concisely
  2. ASPP+Ο€-flow: Causal graph structure + probability flow refinement
  3. Deep causal understanding: Reasoning happens in hidden states, not generated text
  4. Production-ready: Low latency, low cost, clean outputs
  5. Same reasoning depth: Matches CoT models without the verbosity

πŸ“ Citation

If you use Geilim-1B-Instruct in your research or applications, please cite:

@misc{geilim2026,
  title={Geilim-1B-Instruct: Deep Causal Internal Reasoning via ASPP and Probability Flow},
  author={NoesisLab},
  year={2026},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/NoesisLab/Geilim-1B-Instruct}
}

🀝 Acknowledgments

  • Base Model: Llama-3.2-1B-Instruct by Meta
  • Training Framework: TRL by HuggingFace
  • Inspiration: DeepSeek R1, for demonstrating the value of deep reasoning; Geilim pursues the same depth with concise outputs

πŸ“„ License

Llama 3.2 Community License


Built with ❀️ for the era of efficient reasoning models.

Geilim (εΏŒε»‰) - Cantonese for "cream" - smooth, concise, and rich in substance.
