UIGEN-FX-Agentic-32B-qx86-hi-mlx

🧠 Cognitive Metrics of UIGEN-FX-Agentic-32B-qx86-hi

| Task          | Score |
|---------------|-------|
| arc_challenge | 0.460 |
| arc_easy      | 0.527 |
| boolq         | 0.681 |
| hellaswag     | 0.721 |
| openbookqa    | 0.404 |
| piqa          | 0.790 |
| winogrande    | 0.728 |

🔍 Key Observations for this 32B Model

Strengths:

  • Dominates the real-world commonsense tasks: piqa (physical reasoning) and winogrande (commonsense pronoun resolution in social contexts).
  • Outperforms every other model in these two tasks, by +0.044 on piqa and +0.056 on winogrande over the next-best score.

Weaknesses:

  • Struggles with structured academic reasoning: arc_challenge (0.460) is only mid-pack, and arc_easy (0.527) is the lowest score in the comparison.
  • Lowest boolq score (0.681) among all models, suggesting weaker yes/no QA capabilities.

💡 Why this matters:

This model was explicitly designed for UI/web development tasks (not general cognitive benchmarks).

Its performance profile reveals:

  • It excels at unstructured, real-world reasoning (e.g., "which object is heavier?"), but
  • Lacks optimization for standardized academic tests (e.g., ARC, OpenBookQA).

This confirms its specialization — not a general-purpose AI.

Cognitive Metrics of All Models (7 Tasks)

| Model                                  | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa  | winogrande |
|----------------------------------------|---------------|----------|-------|-----------|------------|-------|------------|
| Jan-v1-2509-qx86-hi                    | 0.435         | 0.540    | 0.729 | 0.588     | 0.388      | 0.730 | 0.633      |
| Qwen3-8B-DND-Almost-Human-B-e32        | 0.464         | 0.569    | 0.737 | 0.632     | 0.406      | 0.744 | 0.634      |
| Qwen3-Jan-v1-256k-ctx-6B-Brainstorm20x | 0.445         | 0.579    | 0.696 | 0.600     | 0.404      | 0.732 | 0.627      |
| Qwen3-Neo-Experimental-6B              | 0.463         | 0.579    | 0.721 | 0.623     | 0.406      | 0.738 | 0.672      |
| Qwen3-ST-Deep-Space-Nine-v3            | 0.442         | 0.568    | 0.743 | 0.637     | 0.384      | 0.732 | 0.624      |
| Qwen3-ST-The-Next-Generation-II-E32    | 0.452         | 0.581    | 0.721 | 0.650     | 0.406      | 0.746 | 0.646      |
| UIGEN-FX-Agentic-32B                   | 0.460         | 0.527    | 0.681 | 0.721     | 0.404      | 0.790 | 0.728      |
| WEBGEN-4B-Preview                      | 0.503         | 0.694    | 0.849 | 0.583     | 0.426      | 0.732 | 0.593      |

✅ Key Highlights

  • arc_challenge: WEBGEN-4B leads (0.503)
  • arc_easy: WEBGEN-4B dominates (0.694), about 19% higher than the next-best (0.581)
  • boolq: WEBGEN-4B leads decisively (0.849), about 14% above second-place Qwen3-ST-Deep-Space-Nine-v3 (0.743)
  • piqa: UIGEN-FX wins (0.790), about 6% better than Qwen3-ST-The-Next-Generation-II-E32 (0.746)
  • winogrande: UIGEN-FX highest (0.728), about 8% better than Qwen3-Neo-Experimental-6B (0.672)
  • openbookqa: most models cluster tightly at 0.404–0.406; WEBGEN-4B is the only clear outlier above (0.426)

(The per-task leaders and margins above can be recomputed from the table with the snippet below.)
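A minimal sketch (not part of the original card) that recomputes each task's leader and its absolute and relative margin over the runner-up; the `scores` dictionary simply transcribes the comparison table above:

```python
# Recompute per-task leaders and margins from the benchmark table above.
scores = {
    "Jan-v1-2509-qx86-hi":                    [0.435, 0.540, 0.729, 0.588, 0.388, 0.730, 0.633],
    "Qwen3-8B-DND-Almost-Human-B-e32":        [0.464, 0.569, 0.737, 0.632, 0.406, 0.744, 0.634],
    "Qwen3-Jan-v1-256k-ctx-6B-Brainstorm20x": [0.445, 0.579, 0.696, 0.600, 0.404, 0.732, 0.627],
    "Qwen3-Neo-Experimental-6B":              [0.463, 0.579, 0.721, 0.623, 0.406, 0.738, 0.672],
    "Qwen3-ST-Deep-Space-Nine-v3":            [0.442, 0.568, 0.743, 0.637, 0.384, 0.732, 0.624],
    "Qwen3-ST-The-Next-Generation-II-E32":    [0.452, 0.581, 0.721, 0.650, 0.406, 0.746, 0.646],
    "UIGEN-FX-Agentic-32B":                   [0.460, 0.527, 0.681, 0.721, 0.404, 0.790, 0.728],
    "WEBGEN-4B-Preview":                      [0.503, 0.694, 0.849, 0.583, 0.426, 0.732, 0.593],
}
tasks = ["arc_challenge", "arc_easy", "boolq", "hellaswag", "openbookqa", "piqa", "winogrande"]

for i, task in enumerate(tasks):
    ranked = sorted(scores.items(), key=lambda kv: kv[1][i], reverse=True)
    (leader, top), (runner_up, second) = ranked[0], ranked[1]
    margin = top[i] - second[i]
    print(f"{task:14s} leader: {leader:40s} {top[i]:.3f} "
          f"(+{margin:.3f} / +{100 * margin / second[i]:.1f}% vs {runner_up})")
```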

💡 Note: UIGEN-FX (32B) and WEBGEN-4B (4B) are specialized models – not designed for general cognitive benchmarking. The rest (Qwen3 variants) are the only "true" competitors for general-purpose reasoning tasks.

Reviewed with Brain

This model UIGEN-FX-Agentic-32B-qx86-hi-mlx was converted to MLX format from Tesslate/UIGEN-FX-Agentic-32B using mlx-lm version 0.28.2.
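For reference, conversions of this kind are normally produced with mlx-lm's conversion utility. The sketch below shows a standard quantized conversion under that assumption; it is not the author's exact command, and the qx86-hi mixed-precision recipe used for this repo is a custom scheme that these defaults do not reproduce:

```python
# Sketch only (assumed workflow, not the author's exact command):
# a standard mlx-lm conversion with default quantization settings.
from mlx_lm import convert

convert(
    "Tesslate/UIGEN-FX-Agentic-32B",        # source weights on the Hub
    mlx_path="UIGEN-FX-Agentic-32B-mlx",    # local output directory (illustrative name)
    quantize=True,                          # default group-wise quantization, not the qx86-hi recipe
)
```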

Use with mlx

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("UIGEN-FX-Agentic-32B-qx86-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
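Because the model is tuned for UI/web generation, a chat-style prompt aimed at that use case is a more representative test than "hello". A minimal sketch under the same local model path as above; the prompt text and the max_tokens value are illustrative choices, not from the original card:

```python
from mlx_lm import load, generate

model, tokenizer = load("UIGEN-FX-Agentic-32B-qx86-hi-mlx")

# Illustrative UI-generation prompt (not from the original card).
messages = [{
    "role": "user",
    "content": "Create a responsive pricing page with three tiers using HTML and Tailwind CSS.",
}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# max_tokens bounds the length of the generated markup.
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048, verbose=True)
```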
Model size: 33B params (Safetensors, BF16 / U32 tensors)

Model tree for nightmedia/UIGEN-FX-Agentic-32B-qx86-hi-mlx

Base model: Qwen/Qwen3-32B