Qwen3-Next-80B-A3B-Instruct-qx64-mlx

Test prompt:

mlx_lm.generate --model release/Qwen3-Next-80B-A3B-Instruct-qx64-mlx --max-tokens 20000 \
--prompt "If you were to write an Agnostic Agent as a CLI for a user to interact with Ollama API, and want to use FP, what language would you prefer"

If I were to write an Agnostic Agent as a CLI to interact with the Ollama API using Functional Programming (FP) principles, I would prefer:

Haskell

✅ Why Haskell?

1. Pure Functional Programming First-Class

Haskell enforces immutability, pure functions, and explicit side-effect handling via IO and monads — perfect for modeling an agent that:

Receives user input (side effect)
Makes HTTP requests to Ollama (side effect)
Processes responses (pure logic)
Maintains state (e.g., conversation history) via pure state transformers (StateT, ReaderT)

This aligns with the “Agnostic Agent” ideal: logic is decoupled from I/O, making it testable, composable, and reusable.

2. Strong Type System + Algebraic Data Types (ADTs)

You can model the Ollama API interactions with precise types:

data Model = Model { name :: String, description :: Maybe String }

data Request = Request
  { model    :: Model
  , prompt   :: String
  , stream   :: Bool
  , options  :: Maybe Options
  }

data Response = Response
  { completion :: String
  , done       :: Bool
  , context    :: [Int]
  }

data Error = ApiError String | JsonParseError String | TimeoutError

This makes invalid states unrepresentable — a core FP strength.

3. Excellent HTTP & JSON Libraries

http-client — robust, streaming, async-capable
aeson — fast, composable JSON encoding/decoding
swagger (optional) — if you want to generate clients from OpenAPI specs

You can write clean, type-safe wrappers around Ollama’s /api/generate, /api/tags, etc.

4. CLI Tooling: `optparse-applicative`

A mature, composable, FP-style CLI parser:

main :: IO ()
main = do
  opts <- execParser optsParser
  result <- runAgent opts
  either (putStrLn . show) putStrLn result

No mutable state. All logic flows through pure functions.

5. Agnostic by Design

The core agent logic (runAgent :: AgentConfig -> IO AgentResult) is pure and testable.
I/O is isolated at the edges (CLI input, HTTP calls).
You could swap Ollama for Llama.cpp, vLLM, or OpenAI with a different Backend module — no logic changes.

6. Concurrency & Streaming

Ollama supports streaming responses (stream: true). Haskell’s lazy evaluation and conduit/pipes libraries make streaming response processing elegant and memory-safe.

7. No Runtime Overhead / Native Binaries

stack build --executable produces a single static binary — perfect for CLI tools.

🆚 Alternatives (and why they’re secondary)

Language	Pros	Cons
Rust	Fast, safe, great CLI tools (`clap`)	Not purely functional; FP possible but not enforced
Elm	Pure FP, great for UI	No CLI/HTTP support; not suited for backend/CLI
PureScript	Haskell-like, compiles to JS	Weak CLI tooling; Ollama API needs HTTP, not browser
F#	Great FP + .NET	Less mature for CLI/HTTP; ecosystem smaller than Haskell
Scala	FP + JVM	Heavy, slow startup, overkill for CLI
Clojure	Functional, Lisp	Dynamic typing, less type safety; harder to reason about API contracts

🧠 Example Flow (Haskell-style)

main :: IO ()
main = do
  input <- getLine
  let request = buildRequest "llama3" input
  response <- sendToOllama request
  putStrLn $ formatResponse response

buildRequest — pure function
sendToOllama — IO Response
formatResponse — pure function

All logic is composable. You can unit test buildRequest and formatResponse without touching the network.

✅ Conclusion

Haskell is the ideal language to build an Agnostic Agent CLI for Ollama using Functional Programming because it:

Enforces purity and immutability

Provides strong typing for API contracts

Offers excellent tooling for HTTP, JSON, and CLI

Enables true separation of concerns

Produces fast, reliable, testable binaries

It turns the “Agnostic Agent” from a buzzword into a practically elegant, mathematically sound system.

If you want to go even further: add ZIO-style effect systems via freer-simple or extensible-effects — but even vanilla Haskell is more than sufficient.

Prompt: 42 tokens, 163.629 tokens-per-sec
Generation: 1125 tokens, 54.763 tokens-per-sec
Peak memory: 53.458 GB

This model Qwen3-Next-80B-A3B-Instruct-qx64-mlx was converted to MLX format from Qwen/Qwen3-Next-80B-A3B-Instruct using mlx-lm version 0.27.1.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("Qwen3-Next-80B-A3B-Instruct-qx64-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)

Downloads last month: 123

Safetensors

Model size

80B params

Tensor type

BF16

U32

Model tree for nightmedia/Qwen3-Next-80B-A3B-Instruct-qx64-mlx

Base model

Qwen/Qwen3-Next-80B-A3B-Instruct

Quantized

(49)

this model

Collections including nightmedia/Qwen3-Next-80B-A3B-Instruct-qx64-mlx