# PromptComplexityEstimator
A lightweight regressor that estimates the complexity of an LLM prompt on a scale between 0 and 1.
- Input: a string prompt
- Output: a scalar score in [0, 1] (higher = more complex)
The model is designed primarily as a core building block for semantic routing systems, especially LLM vs. SLM (Small Language Model) routers.
Any router that aims to decide intelligently which model should handle a request needs a reliable signal for how complex that request is. This is the gap this model aims to close.

## Intended use

### Primary use case: LLM vs. SLM routing
This model is intended to be used as part of a semantic router, where:
- Simple prompts are handled by a small / fast / cheap model
- Complex prompts are routed to a large / capable / expensive model
The complexity score provides a learned signal for this decision.

### Additional use cases

- Prompt analytics and monitoring
- Dataset stratification by difficulty (see the batch-scoring sketch below)
- Adaptive compute allocation
- Cost-aware or latency-aware inference pipelines
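
For analytics and stratification workloads, scoring typically happens in batches rather than per request. Below is a minimal sketch, reusing the `tokenizer` and `model` loaded in the Usage section further down; the bucket edges are arbitrary assumptions, not calibrated values.

```python
import torch

def score_batch(prompts, batch_size=32):
    """Return one complexity score in [0, 1] per prompt."""
    scores = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i : i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True, padding=True)
        with torch.no_grad():
            scores.extend(model(**inputs).logits.squeeze(-1).tolist())
    return scores

def bucket(score):
    # Illustrative three-way stratification; the edges are arbitrary.
    return "easy" if score < 0.33 else "medium" if score < 0.66 else "hard"
```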

### Not intended for
- Safety classification, toxicity detection, or policy enforcement
- Guaranteed difficulty estimation for a specific target model
- Multimodal inputs or tool-augmented workflows (RAG/tools)

## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModel

repo_id = "ilya-kolchinsky/PromptComplexityEstimator"
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()

prompt = "Design a distributed consensus protocol with Byzantine fault tolerance..."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    # logits has shape [batch, 1]; squeeze and convert to a Python float
    score = model(**inputs).logits.squeeze(-1).item()

print(score)
```

### Example: Simple LLM vs. SLM routing
```python
THRESHOLD = 0.45  # chosen empirically

def route_prompt(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        complexity = model(**inputs).logits.squeeze(-1).item()
    return "LLM" if complexity > THRESHOLD else "SLM"
```

## Model and Training Details

### Datasets

- [Cross-Difficulty](https://huggingface.co/datasets/BatsResearch/Cross-Difficulty)
- Easy2Hard-Bench
- MATH
- ARC
- RACE
- ANLI (R1/R2/R3)

### Training Configuration
- Epochs: 3
- Batch Size: 16
- Loss: Huber
- Regressor Learning Rate: 7.5e-5
- Encoder Learning Rate: 1.0e-5
- Encoder Weight Decay: 0.01
- Optimizer: AdamW
- Schedule: Cosine (warmup_ratio=0.06)
- Dropout: 0.1
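
The authoritative setup lives in the GitHub repository linked below; as a rough sketch of how the configuration above maps onto a plain PyTorch loop (the `model.encoder` / `model.head` attribute names and `train_loader` are assumptions for illustration, not names from the released code):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

steps_per_epoch = len(train_loader)       # train_loader: hypothetical DataLoader, batch size 16
num_training_steps = 3 * steps_per_epoch  # 3 epochs

# Separate learning rates for the encoder and the regression head, per the config above.
optimizer = torch.optim.AdamW([
    {"params": model.encoder.parameters(), "lr": 1.0e-5, "weight_decay": 0.01},
    {"params": model.head.parameters(), "lr": 7.5e-5},
])
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.06 * num_training_steps),
    num_training_steps=num_training_steps,
)
loss_fn = torch.nn.HuberLoss()
```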

### Model
- Backbone encoder: microsoft/deberta-v3-base
- Mask-aware mean pooling over token embeddings + LayerNorm
- Regression head: Linear → ReLU → Linear → Sigmoid
- Max input length: 512 tokens
- The model outputs a bounded score in [0, 1]. In the examples above, the score is read from `outputs.logits`, which has shape `[batch, 1]`.
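
To make the architecture concrete, here is a minimal sketch of the pooling and head described above. The DeBERTa-v3-base hidden size (768) and the dropout (0.1) come from this card; the MLP width of 256 is an assumption for illustration.

```python
import torch
import torch.nn as nn

class ComplexityHead(nn.Module):
    """Mask-aware mean pooling + LayerNorm + Linear → ReLU → Linear → Sigmoid."""

    def __init__(self, hidden_size: int = 768, mlp_width: int = 256, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, mlp_width),
            nn.ReLU(),
            nn.Linear(mlp_width, 1),
            nn.Sigmoid(),  # bounds the output to [0, 1]
        )

    def forward(self, token_embeddings, attention_mask):
        # Average only over real tokens; padding positions are masked out.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.mlp(self.norm(pooled))  # shape [batch, 1]
```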
Full training code and configuration are available at https://github.com/ilya-kolchinsky/ComplexityEstimator.

## Performance
On the held-out evaluation set used during development, the released checkpoint achieved:
- MAE: 0.0855
- Spearman correlation: 0.735
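
For reference, both metrics are straightforward to reproduce on your own labeled set; a sketch with hypothetical `y_true` / `y_pred` arrays:

```python
import numpy as np
from scipy.stats import spearmanr

# y_true: gold complexity labels in [0, 1]; y_pred: model scores (both hypothetical arrays).
mae = np.mean(np.abs(y_true - y_pred))
rho, _ = spearmanr(y_true, y_pred)
print(f"MAE={mae:.4f}  Spearman={rho:.3f}")
```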

## Citation

```bibtex
@misc{kolchinsky_promptcomplexityestimator_2026,
  title        = {PromptComplexityEstimator},
  author       = {Ilya Kolchinsky},
  year         = {2026},
  howpublished = {Hugging Face Hub model: ilya-kolchinsky/PromptComplexityEstimator}
}
```