PromptComplexityEstimator

A lightweight regressor that estimates the complexity of an LLM prompt on a scale between 0 and 1.

  • Input: a string prompt
  • Output: a scalar score in [0, 1] (higher = more complex)

The model is designed primarily as a core building block for semantic routing systems, especially LLM vs. SLM (Small Language Model) routers.
Any router that aims to decide intelligently which model should handle a request needs a reliable signal for how complex that request is. This is the gap the model aims to close.


Intended use

Primary use case: LLM vs. SLM routing

This model is intended to be used as part of a semantic router, where:

  • Simple prompts are handled by a small / fast / cheap model
  • Complex prompts are routed to a large / capable / expensive model

The complexity score provides a learned signal for this decision.

Additional use cases

  • Prompt analytics and monitoring
  • Dataset stratification by difficulty
  • Adaptive compute allocation
  • Cost-aware or latency-aware inference pipelines

Not intended for

  • Safety classification, toxicity detection, or policy enforcement
  • Guaranteed difficulty estimation for a specific target model
  • Multimodal inputs or tool-augmented workflows (RAG/tools)

Usage

import torch
from transformers import AutoTokenizer, AutoModel

repo_id = "ilya-kolchinsky/PromptComplexityEstimator"

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
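# the custom pooling + regression head is loaded via remote code, hence trust_remote_code=True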
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()

prompt = "Design a distributed consensus protocol with Byzantine fault tolerance..."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    score = model(**inputs).logits.squeeze(-1).item()

print(score)  # .item() already returns a Python float
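
Example: Batched scoring

The same forward pass works on a list of prompts, which is handy for the analytics and stratification use cases above. A minimal sketch reusing the tokenizer and model loaded above (the example prompts are illustrative):

prompts = [
    "What is the capital of France?",
    "Prove that the halting problem is undecidable.",
]

# padding=True aligns the batch to a common length; truncation caps it at the 512-token limit
batch = tokenizer(prompts, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    # logits has shape [batch, 1]; squeezing yields one score per prompt
    scores = model(**batch).logits.squeeze(-1).tolist()

for prompt, score in zip(prompts, scores):
    print(f"{score:.3f}  {prompt}")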

Example: Simple LLM vs. SLM routing

THRESHOLD = 0.45  # chosen empirically

def route_prompt(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        complexity = model(**inputs).logits.squeeze(-1).item()

    return "LLM" if complexity > THRESHOLD else "SLM"

Model and Training Details

Datasets

Training Configuration

  • Epochs: 3
  • Batch Size: 16
  • Loss: huber
  • Regressor Learning Rate: 7.5e-5
  • Encoder Learning Rate: 1.0e-5
  • Encoder Weight Decay: 0.01
  • Optimizer: AdamW
  • Schedule: Cosine (warmup_ratio=0.06)
  • Dropout: 0.1
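
A minimal sketch of how this configuration might translate into code (illustrative only: the "encoder" parameter-name prefix, the step counts, and all variable names are assumptions; see the repository linked below for the actual training code):

import torch
from transformers import get_cosine_schedule_with_warmup

# Two parameter groups: a lower learning rate plus weight decay for the
# encoder, a higher learning rate for the regression head
encoder_params = [p for n, p in model.named_parameters() if n.startswith("encoder")]
head_params = [p for n, p in model.named_parameters() if not n.startswith("encoder")]

optimizer = torch.optim.AdamW([
    {"params": encoder_params, "lr": 1.0e-5, "weight_decay": 0.01},
    {"params": head_params, "lr": 7.5e-5},
])

num_epochs = 3
steps_per_epoch = 1000  # placeholder; len(train_dataloader) in practice
num_training_steps = num_epochs * steps_per_epoch

# cosine schedule with 6% linear warmup (warmup_ratio = 0.06)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.06 * num_training_steps),
    num_training_steps=num_training_steps,
)

loss_fn = torch.nn.HuberLoss()  # the Huber regression loss listed above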

Model

  • Backbone encoder: microsoft/deberta-v3-base
  • Mask-aware mean pooling over token embeddings + LayerNorm
  • Regression head: Linear → ReLU → Linear → Sigmoid
  • Max input length: 512 tokens
  • Parameters: ~0.2B (F32, safetensors)
  • The model outputs a bounded score in [0, 1]; in the examples above, the score is read from outputs.logits (shape [batch, 1])
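
A minimal sketch of the architecture described above (illustrative only; the class name, the head's hidden width of 256, and the dropout placement are assumptions, not the repo's actual remote code):

import torch.nn as nn
from transformers import AutoModel

class ComplexityRegressor(nn.Module):
    def __init__(self, backbone="microsoft/deberta-v3-base", hidden=256, dropout=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        dim = self.encoder.config.hidden_size
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # bounds the output to [0, 1]
        )

    def forward(self, input_ids, attention_mask):
        states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # mask-aware mean pooling: average hidden states over real tokens only
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.head(self.norm(pooled))  # shape [batch, 1]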

Full training code and configuration are available at https://github.com/ilya-kolchinsky/ComplexityEstimator.


Performance

On the held-out evaluation set used during development, the released checkpoint achieved:

  • MAE: 0.0855
  • Spearman correlation: 0.735
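
For reference, both metrics can be computed on any labeled sample along these lines (a sketch; the arrays are illustrative, in practice they come from scoring a held-out set):

import numpy as np
from scipy.stats import spearmanr

preds = np.array([0.12, 0.58, 0.91, 0.33])   # model scores
labels = np.array([0.10, 0.70, 0.85, 0.40])  # gold complexity targets

mae = np.mean(np.abs(preds - labels))       # mean absolute error
rho = spearmanr(preds, labels).correlation  # rank correlation

print(f"MAE: {mae:.4f}  Spearman: {rho:.3f}")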

Citation

@misc{kolchinsky_promptcomplexityestimator_2026,
  title        = {PromptComplexityEstimator},
  author       = {Ilya Kolchinsky},
  year         = {2026},
  howpublished = {Hugging Face Hub model: ilya-kolchinsky/PromptComplexityEstimator}
}