To run this model, check out OpenArc.
Muse-12B-int4_asym-ov
This model was converted to OpenVINO IR using weight-only compression to int4_asym.
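For reference, here is a minimal sketch of this kind of conversion using optimum-intel; the source repo and export settings below are assumptions, not the exact recipe used for this model.

```python
# Hypothetical conversion sketch; the source repo and settings below
# are assumptions, not the exact recipe used for this model.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# bits=4 with sym=False gives asymmetric int4 weight-only compression
quant_config = OVWeightQuantizationConfig(bits=4, sym=False)

model = OVModelForCausalLM.from_pretrained(
    "LatitudeGames/Muse-12B",        # assumed source repo
    export=True,                     # convert to OpenVINO IR on load
    quantization_config=quant_config,
)
model.save_pretrained("Muse-12B-int4_asym-ov")
```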
Muse-12B has been exceptional so far.
It doesn't shy away from intense topics, and refusals haven't been a problem. At this time we don't have access to all of the samplers recommended by LatitudeGames, but I haven't seen major degradation. Long-context performance remains strong, and with some scaffolding it could be a reliable workhorse, though it's sometimes a bit verbose.
Another interesting use case has been tinkering inside talk_to_llm.py. This demo hooks Muse-12B up with Whisper and Kokoro through the OpenArc server.
It's a very interesting way to experience a text adventure.
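As a rough sketch of driving the text side of that loop yourself, assuming OpenArc exposes an OpenAI-compatible chat endpoint (the base URL, port, and served model name below are assumptions):

```python
# Rough sketch of a text-adventure turn against an OpenAI-compatible
# server; base_url, api_key handling, and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Muse-12B-int4_asym-ov",  # assumed served model name
    messages=[
        {"role": "system", "content": "You are the narrator of a text adventure."},
        {"role": "user", "content": "I push open the rusted door."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```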
Performance on A770
Results were captured using openarc bench.
Very nice.
openarc bench selects input tokens by sampling the entire vocabulary using a similar approach to llama-bench.
input tokens: [512]
max tokens: [128]
runs: 5
Muse-12B-int4_asym-ov
| run | p   | n   | ttft (s) | tpot (ms) | prefill (t/s) | decode (t/s) | duration (s) |
|-----|-----|-----|----------|-----------|---------------|--------------|--------------|
| 1   | 512 | 128 | 0.20     | 28.92     | 2623.5        | 34.6         | 3.87         |
| 2   | 512 | 128 | 0.16     | 28.92     | 3134.4        | 34.6         | 3.84         |
| 3   | 512 | 128 | 0.17     | 28.89     | 3007.7        | 34.6         | 3.84         |
| 4   | 512 | 128 | 0.17     | 28.88     | 3045.6        | 34.6         | 3.84         |
| 5   | 512 | 128 | 0.17     | 28.91     | 2998.3        | 34.6         | 3.84         |
Total: 5 runs
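To sanity-check numbers like these outside of OpenArc, openvino-genai reports per-generation performance metrics. A minimal sketch, assuming the IR folder sits in the current directory:

```python
# Minimal sketch of collecting similar latency metrics with openvino-genai;
# the model path and prompt are placeholders.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("Muse-12B-int4_asym-ov", "GPU")
result = pipe.generate("Once upon a time", max_new_tokens=128)

metrics = result.perf_metrics
print(f"ttft:       {metrics.get_ttft().mean / 1000:.2f} s")  # time to first token
print(f"tpot:       {metrics.get_tpot().mean:.2f} ms")        # time per output token
print(f"throughput: {metrics.get_throughput().mean:.1f} t/s") # decode throughput
```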
System:
- Xeon W-2255
- 128GB DDR4 ECC
- ASRock A770
- Ubuntu 24.04 (kernel 6.14.4-061404-generic)
- openvino 2025.3.0
- openvino-genai 2025.3.0.0
Base model: mistralai/Mistral-Nemo-Base-2407