# Kimi-K2.5-PRISM

An unrestricted/unchained PRISM version of Moonshot AI's Kimi-K2.5 with over-refusal and propaganda mechanisms removed using our advanced PRISM pipeline (Projected Refusal Isolation via Subspace Modification).

## ☕ Support Our Work

If you enjoy our work and find it useful, please consider sponsoring or supporting us!

| Option | Description |
| --- | --- |
| PRISM VIP Membership | Access to all PRISM models |
| One-Time Support | Support this model |

## Model Highlights

- PRISM Ablation: State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
- 1T MoE Architecture: 1 trillion total parameters with 32 billion active per token across 384 experts
- Native Multimodal: Pre-trained on vision-language tokens for seamless image, video, and text understanding
- 256K Context Window: Extended context for complex agentic tasks and large codebases
- Dual Modes: Supports both Thinking (deep reasoning) and Instant (fast response) modes
- Agent Swarm: Self-directed, coordinated multi-agent execution for complex tasks

## Model Architecture

| Specification | Value |
| --- | --- |
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers | 61 |
| Attention Hidden Dimension | 7168 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
| Vision Encoder | MoonViT (400M) |
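
To make the expert counts in the table concrete, here is a toy sketch of top-k routing for a single token: a router scores all 384 experts, the 8 highest-scoring ones are selected, and their weights are renormalized. This is a generic illustration of MoE gating, not Moonshot AI's actual router. The helper name `route_token`, the softmax gating, and the random router scores are all assumptions made for the example. The shared expert is always applied, so it is not part of the selection.

```python
import math
import random

NUM_EXPERTS = 384   # routed experts, from the table above
TOP_K = 8           # selected experts per token, from the table above


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Hypothetical helper: softmax is taken over the selected logits only,
    so the returned weights sum to 1.
    """
    # Indices of the top_k largest logits
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Numerically stable softmax restricted to the selected experts
    max_logit = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
selected = route_token(logits)

print(len(selected))  # 8
print(f"{TOP_K / NUM_EXPERTS:.1%} of routed experts are active per token")  # 2.1%
```

Only the selected experts run for a given token, which is why the activated parameter count (32B) is a small fraction of the total (1T).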

## Benchmarks

| Benchmark | Kimi K2.5 (Thinking) | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro |
| --- | --- | --- | --- | --- |
| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
| HLE-Full | 30.1 | 34.5 | 30.8 | 37.5 |
| HLE-Full (w/ tools) | 50.2 | 45.5 | 43.2 | 45.8 |
| SWE-Bench Verified | 76.8 | 80.0 | 80.9 | 76.2 |
| Terminal Bench 2.0 | 50.8 | 54.0 | 59.3 | 54.2 |
| BrowseComp | 60.6 | 65.8 | 37.0 | 37.8 |
| MMMU-Pro | 78.5 | 79.5 | 74.0 | 81.0 |
| VideoMMMU | 86.6 | 85.9 | 84.4 | 87.6 |

## Usage

### Transformers

Install dependencies:

pip install git+https://github.com/huggingface/transformers.git

Basic chat completion:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Ex0bit/Kimi-K2.5-PRISM"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Kimi, an AI assistant."},
    {"role": "user", "content": "Hello!"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=1.0, top_p=0.95)
output_text = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(output_text)

#### Chat with Image

import base64
import requests

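# Download the image and base64-encode it for use in a data URL
url = "https://example.com/image.png"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early instead of encoding an error page
image_base64 = base64.b64encode(response.content).decode()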

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            },
        ],
    }
]

# Reuse the tokenizer, model, and generation code from the basic chat example above
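
The example above hardcodes `image/png` in the data URL. When images arrive in other formats, a small helper can pick the MIME type from the file name instead. The sketch below uses only the standard library; `to_data_url` and `image_message` are hypothetical helper names, and the placeholder bytes are just the PNG file signature rather than a real image.

```python
import base64
import mimetypes


def to_data_url(image_bytes, filename):
    """Encode raw image bytes as a data URL (hypothetical helper).

    The MIME type is guessed from the file extension; anything that is not
    recognized as an image raises ValueError rather than producing a
    mislabeled URL.
    """
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(prompt, data_url):
    """Build a user message in the same shape as the example above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Placeholder bytes: only the 8-byte PNG signature, not a full image
data_url = to_data_url(b"\x89PNG\r\n\x1a\n", "photo.png")
messages = [image_message("Describe this image in detail.", data_url)]
```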

### vLLM

Install vLLM nightly:

pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git

Serve the model:

vllm serve Ex0bit/Kimi-K2.5-PRISM \
     --tensor-parallel-size 8 \
     --trust-remote-code \
     --served-model-name kimi-k2.5-prism

### SGLang

python3 -m sglang.launch_server \
  --model-path Ex0bit/Kimi-K2.5-PRISM \
  --tp-size 8 \
  --trust-remote-code \
  --served-model-name kimi-k2.5-prism \
  --host 0.0.0.0 \
  --port 8000
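
Once either server is running, it can be queried over its OpenAI-compatible HTTP API. The following is a minimal client sketch using only the standard library. It assumes the server is reachable at `http://localhost:8000` (vLLM's default port, and the port passed to SGLang above) and that the model is served as `kimi-k2.5-prism`. The function names `build_chat_request` and `send_chat_request` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"   # assumption: server runs locally on port 8000
MODEL_NAME = "kimi-k2.5-prism"          # matches --served-model-name above


def build_chat_request(messages, max_tokens=4096, temperature=1.0, top_p=0.95):
    """Assemble the JSON body for POST /v1/chat/completions."""
    return {
        "model": MODEL_NAME,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def send_chat_request(body, base_url=BASE_URL, timeout=600):
    """POST the request and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


body = build_chat_request([
    {"role": "system", "content": "You are Kimi, an AI assistant."},
    {"role": "user", "content": "Hello!"},
])
# reply = send_chat_request(body)  # uncomment once the server is running
```

Building the body separately from sending it makes it easy to inspect or log the exact request before it reaches the server.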

## Recommended Parameters

| Mode | Temperature | Top-P | Max New Tokens |
| --- | --- | --- | --- |
| Thinking | 1.0 | 0.95 | 96000 |
| Instant | 0.6 | 0.95 | 4096 |
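
For the Transformers path, the table can be kept in one place and expanded into `generate` keyword arguments. This is a small convenience sketch, and `generation_kwargs` is a hypothetical helper name. It only sets sampling parameters and does not by itself switch the model between modes.

```python
# Values copied from the table above
RECOMMENDED = {
    "thinking": {"temperature": 1.0, "top_p": 0.95, "max_new_tokens": 96000},
    "instant": {"temperature": 0.6, "top_p": 0.95, "max_new_tokens": 4096},
}


def generation_kwargs(mode):
    """Return keyword arguments for model.generate() for the given mode."""
    key = mode.lower()
    if key not in RECOMMENDED:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(RECOMMENDED)}")
    # do_sample=True is needed for temperature and top_p to take effect
    return {"do_sample": True, **RECOMMENDED[key]}


print(generation_kwargs("Instant"))
# {'do_sample': True, 'temperature': 0.6, 'top_p': 0.95, 'max_new_tokens': 4096}

# Usage with the objects from the Transformers example:
# generated_ids = model.generate(**inputs, **generation_kwargs("instant"))
```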

### Switching Modes

For Instant mode (faster, no reasoning), pass:

# Official API
extra_body={"thinking": {"type": "disabled"}}

# vLLM/SGLang
extra_body={"chat_template_kwargs": {"thinking": False}}
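
When sending raw HTTP requests instead of using the OpenAI Python client, fields passed through `extra_body` simply become top-level keys of the JSON body. The sketch below applies the vLLM/SGLang form of the switch to an existing chat-completions body. `with_thinking` is a hypothetical helper name, and the example body is a stand-in.

```python
import copy


def with_thinking(body, enabled):
    """Return a copy of a chat-completions body with the thinking switch set.

    Uses the vLLM/SGLang form shown above: chat_template_kwargs.thinking.
    """
    updated = copy.deepcopy(body)  # leave the caller's dict untouched
    template_kwargs = updated.setdefault("chat_template_kwargs", {})
    template_kwargs["thinking"] = bool(enabled)
    return updated


base = {
    "model": "kimi-k2.5-prism",
    "messages": [{"role": "user", "content": "Hello!"}],
}
instant = with_thinking(base, enabled=False)

print(instant["chat_template_kwargs"])  # {'thinking': False}
print("chat_template_kwargs" in base)   # False
```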

## Hardware Requirements

Due to the 1T parameter size, this model requires significant hardware:

- Minimum: 8x A100 80GB or equivalent
- Recommended: 8x H100 80GB for optimal performance
- INT4 Quantization: Available for reduced memory footprint
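
A back-of-envelope estimate helps when sizing a deployment. The sketch below computes the memory needed for the weights alone at different precisions and compares it with the combined memory of 8 GPUs with 80 GB each. It assumes every one of the nominal 1T parameters is stored at the same precision, and it ignores the KV cache, activations, quantization scales, and framework overhead, so real requirements will be higher. Treat the result as a rough guide rather than a measurement.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1T, nominal figure from the architecture table
GPU_COUNT = 8
GPU_MEMORY_GB = 80

# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


available_gb = GPU_COUNT * GPU_MEMORY_GB  # 640 GB in total

for name, size in BYTES_PER_PARAM.items():
    needed_gb = weight_memory_gb(TOTAL_PARAMS, size)
    fits = needed_gb <= available_gb
    print(f"{name}: {needed_gb:.0f} GB of weights, fits in {available_gb} GB: {fits}")
# bf16: 2000 GB of weights, fits in 640 GB: False
# int8: 1000 GB of weights, fits in 640 GB: False
# int4: 500 GB of weights, fits in 640 GB: True
```

Under these assumptions, only a 4-bit representation of the weights fits entirely in the memory of an 8x 80GB node.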

## License

This model is released under the PRISM Research License.

## Acknowledgments

Based on Kimi-K2.5 by Moonshot AI. See the technical blog for more details on the base model.
