Qwen3-Next-80B-A3B-Instruct-qx64-mlx
Test prompt:
```bash
mlx_lm.generate --model release/Qwen3-Next-80B-A3B-Instruct-qx64-mlx --max-tokens 20000 \
  --prompt "If you were to write an Agnostic Agent as a CLI for a user to interact with Ollama API, and want to use FP, what language would you prefer"
```
If I were to write an Agnostic Agent as a CLI to interact with the Ollama API using Functional Programming (FP) principles, I would prefer:
Haskell
✅ Why Haskell?
1. Pure Functional Programming First-Class
Haskell enforces immutability, pure functions, and explicit side-effect handling via IO and monads — perfect for modeling an agent that:
- Receives user input (side effect)
- Makes HTTP requests to Ollama (side effect)
- Processes responses (pure logic)
- Maintains state (e.g., conversation history) via pure state transformers (`StateT`, `ReaderT`)
This aligns with the “Agnostic Agent” ideal: logic is decoupled from I/O, making it testable, composable, and reusable.
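As a concrete illustration, here is a minimal sketch of that separation, keeping conversation history in a `StateT` layer while only the outer loop touches `IO`. All names (`History`, `recordTurn`, `agentLoop`) are illustrative, not part of any real Ollama client.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.State (StateT, evalStateT, modify)

type History = [(String, String)]   -- (user, assistant) turns, newest first

-- Pure state transition: record one exchange in the history.
recordTurn :: Monad m => String -> String -> StateT History m ()
recordTurn user assistant = modify ((user, assistant) :)

-- The loop: IO only at the edges, history handled by the transformer.
agentLoop :: (String -> IO String) -> StateT History IO ()
agentLoop askModel = do
  input  <- liftIO getLine
  answer <- liftIO (askModel input)   -- e.g. an Ollama call, injected from outside
  recordTurn input answer
  liftIO (putStrLn answer)
  agentLoop askModel

main :: IO ()
main = evalStateT (agentLoop (\p -> pure ("echo: " ++ p))) []
```

Because `askModel` is passed in, the same loop can be driven by a real HTTP client or by a pure stub in tests.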
2. Strong Type System + Algebraic Data Types (ADTs)
You can model the Ollama API interactions with precise types:
```haskell
data Model = Model
  { name        :: String
  , description :: Maybe String
  }

data Request = Request
  { model   :: Model
  , prompt  :: String
  , stream  :: Bool
  , options :: Maybe Options
  }

data Response = Response
  { completion :: String
  , done       :: Bool
  , context    :: [Int]
  }

data Error = ApiError String | JsonParseError String | TimeoutError
```
This makes invalid states unrepresentable — a core FP strength.
3. Excellent HTTP & JSON Libraries
- `http-client` — robust, streaming, async-capable
- `aeson` — fast, composable JSON encoding/decoding
- `swagger` (optional) — if you want to generate clients from OpenAPI specs
You can write clean, type-safe wrappers around Ollama’s /api/generate, /api/tags, etc.
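For instance, a minimal sketch of such a wrapper with `http-client` and `aeson` might look like the following; the type and function names are illustrative, and the JSON field names are my assumption of the `/api/generate` wire format, so check Ollama's API docs before relying on them.

```haskell
{-# LANGUAGE DeriveGeneric, OverloadedStrings #-}
import Data.Aeson (FromJSON, ToJSON, eitherDecode, encode)
import GHC.Generics (Generic)
import Network.HTTP.Client
  ( RequestBody (RequestBodyLBS), defaultManagerSettings, httpLbs, method
  , newManager, parseRequest, requestBody, requestHeaders, responseBody )

-- Hypothetical wire types for /api/generate.
data GenerateRequest = GenerateRequest
  { model  :: String
  , prompt :: String
  , stream :: Bool
  } deriving (Generic)
instance ToJSON GenerateRequest

data GenerateResponse = GenerateResponse
  { response :: String
  , done     :: Bool
  } deriving (Generic)
instance FromJSON GenerateResponse

-- POST the request and decode the reply; failures surface as Left.
callGenerate :: String -> GenerateRequest -> IO (Either String GenerateResponse)
callGenerate baseUrl payload = do
  manager <- newManager defaultManagerSettings
  initial <- parseRequest (baseUrl ++ "/api/generate")
  let req = initial
        { method         = "POST"
        , requestBody    = RequestBodyLBS (encode payload)
        , requestHeaders = [("Content-Type", "application/json")]
        }
  resp <- httpLbs req manager
  pure (eitherDecode (responseBody resp))
```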
4. CLI Tooling: optparse-applicative
A mature, composable, FP-style CLI parser:
```haskell
import Options.Applicative (execParser)

main :: IO ()
main = do
  opts   <- execParser optsParser   -- optsParser is sketched below
  result <- runAgent opts           -- assuming an Either-valued result
  either (putStrLn . show) putStrLn result
```
No mutable state. All logic flows through pure functions.
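For completeness, here is a minimal sketch of what `optsParser` could look like. The flag names, defaults, and the `AgentConfig` record are assumptions, with `http://localhost:11434` standing in for Ollama's default local address.

```haskell
import Options.Applicative

-- Hypothetical configuration record filled in by the CLI parser.
data AgentConfig = AgentConfig
  { cfgModel  :: String
  , cfgHost   :: String
  , cfgStream :: Bool
  }

optsParser :: ParserInfo AgentConfig
optsParser =
  info (configParser <**> helper)
       (fullDesc <> progDesc "Agnostic agent CLI for the Ollama API")

configParser :: Parser AgentConfig
configParser = AgentConfig
  <$> strOption (long "model"  <> metavar "MODEL" <> value "llama3"
                 <> help "Model name to query")
  <*> strOption (long "host"   <> metavar "URL"   <> value "http://localhost:11434"
                 <> help "Base URL of the Ollama server")
  <*> switch    (long "stream" <> help "Stream tokens as they arrive")
```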
5. Agnostic by Design
- The core agent logic behind `runAgent :: AgentConfig -> IO AgentResult` is pure and testable.
- I/O is isolated at the edges (CLI input, HTTP calls).
- You could swap Ollama for Llama.cpp, vLLM, or OpenAI with a different `Backend` module — no logic changes (see the sketch below).
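One way to make that swap concrete is a record of functions that the core logic depends on. A minimal sketch, with all names (`Backend`, `runTurn`, `ollamaBackend`) purely illustrative and the network call stubbed out to keep the example self-contained:

```haskell
-- The abstraction the agent core sees; no HTTP details leak through it.
data Backend = Backend
  { backendName :: String
  , complete    :: String -> String -> IO String   -- model -> prompt -> reply
  }

-- Core logic depends only on Backend, so it never changes when providers do.
runTurn :: Backend -> String -> String -> IO String
runTurn backend model prompt = complete backend model prompt

-- Swapping Ollama for Llama.cpp, vLLM, or OpenAI means supplying another record.
-- A real implementation would call the HTTP wrapper sketched earlier.
ollamaBackend :: String -> Backend
ollamaBackend baseUrl = Backend
  { backendName = "ollama (" ++ baseUrl ++ ")"
  , complete    = \model prompt ->
      pure ("stubbed reply from " ++ model ++ " to: " ++ prompt)
  }
```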
6. Concurrency & Streaming
Ollama supports streaming responses (stream: true). Haskell’s lazy evaluation and conduit/pipes libraries make streaming response processing elegant and memory-safe.
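A minimal streaming sketch using `Network.HTTP.Simple` from http-conduit (which builds on http-client) together with conduit, assuming Ollama's default local endpoint and its newline-delimited JSON streaming format; a real agent would decode each line with aeson instead of printing it raw.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (object, (.=))
import qualified Data.ByteString.Char8 as BS
import Data.Conduit ((.|))
import qualified Data.Conduit.Combinators as CC
import Network.HTTP.Simple
  (httpSink, parseRequest_, setRequestBodyJSON, setRequestMethod)

-- Stream /api/generate and handle each JSON line as it arrives.
streamGenerate :: String -> String -> IO ()
streamGenerate model prompt = do
  let req = setRequestMethod "POST"
          $ setRequestBodyJSON
              (object ["model" .= model, "prompt" .= prompt, "stream" .= True])
          $ parseRequest_ "http://localhost:11434/api/generate"
  httpSink req $ \_response ->
    CC.linesUnboundedAscii      -- split the chunked body into one JSON object per line
      .| CC.mapM_ BS.putStrLn   -- hand each line to a handler (here: just print it)
```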
7. No Runtime Overhead / Native Binaries
A plain `stack build` (or `cabal build`) produces a single native executable — perfect for CLI tools.
🆚 Alternatives (and why they’re secondary)
| Language | Pros | Cons |
|---|---|---|
| Rust | Fast, safe, great CLI tools (clap) | Not purely functional; FP possible but not enforced |
| Elm | Pure FP, great for UI | No CLI/HTTP support; not suited for backend/CLI |
| PureScript | Haskell-like, compiles to JS | Weak CLI tooling; Ollama API needs HTTP, not browser |
| F# | Great FP + .NET | Less mature for CLI/HTTP; ecosystem smaller than Haskell |
| Scala | FP + JVM | Heavy, slow startup, overkill for CLI |
| Clojure | Functional, Lisp | Dynamic typing, less type safety; harder to reason about API contracts |
🧠 Example Flow (Haskell-style)
```haskell
main :: IO ()
main = do
  input <- getLine
  let request = buildRequest "llama3" input
  response <- sendToOllama request
  putStrLn $ formatResponse response
```
- `buildRequest` — pure function
- `sendToOllama` — `IO Response`
- `formatResponse` — pure function

All logic is composable. You can unit test `buildRequest` and `formatResponse` without touching the network.
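A minimal sketch of such a test with hspec; `buildRequest` and `formatResponse` here are simplified stand-ins, not the real implementations.

```haskell
import Test.Hspec (describe, hspec, it, shouldBe)

-- Simplified stand-ins for the pure halves of the agent.
buildRequest :: String -> String -> (String, String, Bool)
buildRequest model prompt = (model, prompt, False)

formatResponse :: String -> String
formatResponse completion = "assistant> " ++ completion

main :: IO ()
main = hspec $ do
  describe "buildRequest" $
    it "carries the model and prompt through unchanged" $
      buildRequest "llama3" "hi" `shouldBe` ("llama3", "hi", False)
  describe "formatResponse" $
    it "prefixes the completion for display" $
      formatResponse "hello" `shouldBe` "assistant> hello"
```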
✅ Conclusion
Haskell is the ideal language to build an Agnostic Agent CLI for Ollama using Functional Programming because it:
- Enforces purity and immutability
- Provides strong typing for API contracts
- Offers excellent tooling for HTTP, JSON, and CLI
- Enables true separation of concerns
- Produces fast, reliable, testable binaries
It turns the “Agnostic Agent” from a buzzword into a practically elegant, mathematically sound system.
If you want to go even further: add ZIO-style effect systems via freer-simple or extensible-effects — but even vanilla Haskell is more than sufficient.
Prompt: 42 tokens, 163.629 tokens-per-sec
Generation: 1125 tokens, 54.763 tokens-per-sec
Peak memory: 53.458 GB
This model Qwen3-Next-80B-A3B-Instruct-qx64-mlx was converted to MLX format from Qwen/Qwen3-Next-80B-A3B-Instruct using mlx-lm version 0.27.1.
Use with mlx
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-Next-80B-A3B-Instruct-qx64-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```