Qwen3-Next-80B-A3B-Instruct-qx64-mlx

Test prompt:

mlx_lm.generate --model release/Qwen3-Next-80B-A3B-Instruct-qx64-mlx --max-tokens 20000 \
--prompt "If you were to write an Agnostic Agent as a CLI for a user to interact with Ollama API, and want to use FP, what language would you prefer"

If I were to write an Agnostic Agent as a CLI to interact with the Ollama API using Functional Programming (FP) principles, I would prefer:

Haskell


✅ Why Haskell?

1. Pure Functional Programming First-Class

Haskell enforces immutability, pure functions, and explicit side-effect handling via IO and monads — perfect for modeling an agent that:

  • Receives user input (side effect)
  • Makes HTTP requests to Ollama (side effect)
  • Processes responses (pure logic)
  • Maintains state (e.g., conversation history) via pure state transformers (StateT, ReaderT)

This aligns with the “Agnostic Agent” ideal: logic is decoupled from I/O, making it testable, composable, and reusable.

2. Strong Type System + Algebraic Data Types (ADTs)

You can model the Ollama API interactions with precise types:

data Model = Model { name :: String, description :: Maybe String }

data Request = Request
  { model    :: Model
  , prompt   :: String
  , stream   :: Bool
  , options  :: Maybe Options
  }

data Response = Response
  { completion :: String
  , done       :: Bool
  , context    :: [Int]
  }

data Error = ApiError String | JsonParseError String | TimeoutError

This makes invalid states unrepresentable — a core FP strength.

3. Excellent HTTP & JSON Libraries

  • http-client — robust, streaming, async-capable
  • aeson — fast, composable JSON encoding/decoding
  • swagger (optional) — if you want to generate clients from OpenAPI specs

You can write clean, type-safe wrappers around Ollama’s /api/generate, /api/tags, etc.

4. CLI Tooling: optparse-applicative

A mature, composable, FP-style CLI parser:

main :: IO ()
main = do
  opts <- execParser optsParser
  result <- runAgent opts
  either (putStrLn . show) putStrLn result

No mutable state. All logic flows through pure functions.

5. Agnostic by Design

  • The core agent logic (runAgent :: AgentConfig -> IO AgentResult) is pure and testable.
  • I/O is isolated at the edges (CLI input, HTTP calls).
  • You could swap Ollama for Llama.cpp, vLLM, or OpenAI with a different Backend module — no logic changes.

6. Concurrency & Streaming

Ollama supports streaming responses (stream: true). Haskell’s lazy evaluation and conduit/pipes libraries make streaming response processing elegant and memory-safe.

7. No Runtime Overhead / Native Binaries

stack build --executable produces a single static binary — perfect for CLI tools.


🆚 Alternatives (and why they’re secondary)

Language Pros Cons
Rust Fast, safe, great CLI tools (clap) Not purely functional; FP possible but not enforced
Elm Pure FP, great for UI No CLI/HTTP support; not suited for backend/CLI
PureScript Haskell-like, compiles to JS Weak CLI tooling; Ollama API needs HTTP, not browser
F# Great FP + .NET Less mature for CLI/HTTP; ecosystem smaller than Haskell
Scala FP + JVM Heavy, slow startup, overkill for CLI
Clojure Functional, Lisp Dynamic typing, less type safety; harder to reason about API contracts

🧠 Example Flow (Haskell-style)

main :: IO ()
main = do
  input <- getLine
  let request = buildRequest "llama3" input
  response <- sendToOllama request
  putStrLn $ formatResponse response
  • buildRequest — pure function
  • sendToOllamaIO Response
  • formatResponse — pure function

All logic is composable. You can unit test buildRequest and formatResponse without touching the network.


✅ Conclusion

Haskell is the ideal language to build an Agnostic Agent CLI for Ollama using Functional Programming because it:

  • Enforces purity and immutability
  • Provides strong typing for API contracts
  • Offers excellent tooling for HTTP, JSON, and CLI
  • Enables true separation of concerns
  • Produces fast, reliable, testable binaries

It turns the “Agnostic Agent” from a buzzword into a practically elegant, mathematically sound system.

If you want to go even further: add ZIO-style effect systems via freer-simple or extensible-effects — but even vanilla Haskell is more than sufficient.

Prompt: 42 tokens, 163.629 tokens-per-sec
Generation: 1125 tokens, 54.763 tokens-per-sec
Peak memory: 53.458 GB

This model Qwen3-Next-80B-A3B-Instruct-qx64-mlx was converted to MLX format from Qwen/Qwen3-Next-80B-A3B-Instruct using mlx-lm version 0.27.1.

Use with mlx

pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-Next-80B-A3B-Instruct-qx64-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
Downloads last month
123
Safetensors
Model size
80B params
Tensor type
BF16
·
U32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for nightmedia/Qwen3-Next-80B-A3B-Instruct-qx64-mlx

Quantized
(49)
this model

Collections including nightmedia/Qwen3-Next-80B-A3B-Instruct-qx64-mlx