Content

This model area holds the public parts of GGUF models converted with Skipper (T3) or Mate (M8) technology. Future modes will also follow the nautical theme.

Demo Spaces

  • Granite4: all Granite4 models (Small, Tiny, Micro, Nano 1B, and Nano 350M)
  • tbd: add PremiumZero, AdvancedZero, FrontierZero
  • tbd: all OSS models with Apache 2.0 and MIT licenses
  • tbd: add larger models using advanced compression (REAP, M8, ...)

ELO rankings: https://lmarena.ai/leaderboard/text

Versions

Version  Codename  File prefix  Typical bpw range  New feature
1.0      Skipper   T3 and T2    0.8 .. 2.2         introduces new compression method
1.5      Mate      M8           0.4 .. 2.0         compression improvements
2.0      Cheng     Cx           0.3 .. 2.0         speed improvements
2.5      Cheng++   Cy           0.1 .. 2.0         reduced compute requirements

V1 reduces model size significantly at the same subjective quality, but leaves compute requirements high.

V2 will scale down compute requirements and add support for cheap NPUs.

Expected bpw (bits per weight)

Actual bpw values are higher for small models and lower for larger ones. As with JPEG and video encoding, higher input quality opens more opportunity for compression. A back-of-the-envelope size calculation is sketched after the table below.

Base   Mode   Quality (%)  bpw @ 30B
Q5_K   T3UD   95           2.0 .. 2.2
Q4_K   T2UD   90           1.4 .. 1.6
Q2_K   T2UD2  75           1.0 .. 1.2
Q2_K   T2UD1  60           0.8
Q2_K   M8HQ   75           0.8
Q2_K   M8LQ   60           0.4 .. 0.6
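
To put the bpw figures in perspective, here is a minimal size sketch. The gguf_size_gb helper is hypothetical (not part of any tooling here), and it ignores GGUF metadata and mixed-precision tensors, so real files run somewhat larger:

```python
def gguf_size_gb(params_billion: float, bpw: float) -> float:
    """Rough GGUF file size in GB for a model with params_billion
    parameters (in billions) stored at bpw bits per weight.
    Assumption: ignores metadata and higher-precision tensors,
    so actual files are somewhat larger."""
    total_bits = params_billion * 1e9 * bpw
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

# A 30B model needs ~60 GB at FP16 (16 bpw); at these bpw ranges
# it shrinks to a few GB.
for mode, bpw in [("T3UD", 2.1), ("T2UD", 1.5), ("M8LQ", 0.5)]:
    print(f"{mode}: ~{gguf_size_gb(30, bpw):.1f} GB")  # 7.9, 5.6, 1.9 GB
```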