---
library_name: mlx
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- mlx
base_model: Qwen/Qwen3-Next-80B-A3B-Instruct
---

# Qwen3-Next-80B-A3B-Instruct-qx64-mlx

Test prompt:

```bash
mlx_lm.generate --model release/Qwen3-Next-80B-A3B-Instruct-qx64-mlx --max-tokens 20000 \
--prompt "If you were to write an Agnostic Agent as a CLI for a user to interact with Ollama API, and want to use FP, what language would you prefer"
```
If I were to write an **Agnostic Agent** as a CLI to interact with the **Ollama API** using **Functional Programming (FP)** principles, I would prefer:

> **Haskell**

---

### ✅ Why Haskell?

#### 1. **Pure Functional Programming First-Class**
Haskell enforces immutability, pure functions, and explicit side-effect handling via `IO` and monads — perfect for modeling an agent that:
- Receives user input (side effect)
- Makes HTTP requests to Ollama (side effect)
- Processes responses (pure logic)
- Maintains state (e.g., conversation history) via pure state transformers (`StateT`, `ReaderT`)

This aligns with the “Agnostic Agent” ideal: logic is decoupled from I/O, making it testable, composable, and reusable.

#### 2. **Strong Type System + Algebraic Data Types (ADTs)**
You can model the Ollama API interactions with precise types:

```haskell
data Model = Model { name :: String, description :: Maybe String }

data Request = Request
  { model    :: Model
  , prompt   :: String
  , stream   :: Bool
  , options  :: Maybe Options
  }

data Response = Response
  { completion :: String
  , done       :: Bool
  , context    :: [Int]
  }

data Error = ApiError String | JsonParseError String | TimeoutError
```

This makes invalid states unrepresentable — a core FP strength.

#### 3. **Excellent HTTP & JSON Libraries**
- [`http-client`](https://hackage.haskell.org/package/http-client) — robust, streaming, async-capable
- [`aeson`](https://hackage.haskell.org/package/aeson) — fast, composable JSON encoding/decoding
- [`swagger`](https://hackage.haskell.org/package/swagger) (optional) — if you want to generate clients from OpenAPI specs

You can write clean, type-safe wrappers around Ollama’s `/api/generate`, `/api/tags`, etc.

#### 4. **CLI Tooling: `optparse-applicative`**
A mature, composable, FP-style CLI parser:

```haskell
main :: IO ()
main = do
  opts <- execParser optsParser
  result <- runAgent opts
  either (putStrLn . show) putStrLn result
```

No mutable state. All logic flows through pure functions.

#### 5. **Agnostic by Design**
- The core agent logic (`runAgent :: AgentConfig -> IO AgentResult`) is pure and testable.
- I/O is isolated at the edges (CLI input, HTTP calls).
- You could swap Ollama for Llama.cpp, vLLM, or OpenAI with a different `Backend` module — no logic changes.

#### 6. **Concurrency & Streaming**
Ollama supports streaming responses (`stream: true`). Haskell’s lazy evaluation and `conduit`/`pipes` libraries make streaming response processing elegant and memory-safe.

#### 7. **No Runtime Overhead / Native Binaries**
`stack build --executable` produces a single static binary — perfect for CLI tools.

---

### 🆚 Alternatives (and why they’re secondary)

| Language | Pros | Cons |
|---------|------|------|
| **Rust** | Fast, safe, great CLI tools (`clap`) | Not purely functional; FP possible but not enforced |
| **Elm** | Pure FP, great for UI | No CLI/HTTP support; not suited for backend/CLI |
| **PureScript** | Haskell-like, compiles to JS | Weak CLI tooling; Ollama API needs HTTP, not browser |
| **F#** | Great FP + .NET | Less mature for CLI/HTTP; ecosystem smaller than Haskell |
| **Scala** | FP + JVM | Heavy, slow startup, overkill for CLI |
| **Clojure** | Functional, Lisp | Dynamic typing, less type safety; harder to reason about API contracts |

---

### 🧠 Example Flow (Haskell-style)

```haskell
main :: IO ()
main = do
  input <- getLine
  let request = buildRequest "llama3" input
  response <- sendToOllama request
  putStrLn $ formatResponse response
```

- `buildRequest` — pure function
- `sendToOllama` — `IO Response`
- `formatResponse` — pure function

All logic is composable. You can unit test `buildRequest` and `formatResponse` without touching the network.

---

### ✅ Conclusion

> **Haskell** is the ideal language to build an **Agnostic Agent CLI** for Ollama using **Functional Programming** because it:
> - Enforces purity and immutability
> - Provides strong typing for API contracts
> - Offers excellent tooling for HTTP, JSON, and CLI
> - Enables true separation of concerns
> - Produces fast, reliable, testable binaries

It turns the “Agnostic Agent” from a buzzword into a **practically elegant, mathematically sound system**.

If you want to go even further: add **ZIO**-style effect systems via `freer-simple` or `extensible-effects` — but even vanilla Haskell is more than sufficient.
```bash
Prompt: 42 tokens, 163.629 tokens-per-sec
Generation: 1125 tokens, 54.763 tokens-per-sec
Peak memory: 53.458 GB
```


This model [Qwen3-Next-80B-A3B-Instruct-qx64-mlx](https://huggingface.co/Qwen3-Next-80B-A3B-Instruct-qx64-mlx) was
converted to MLX format from [Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)
using mlx-lm version **0.27.1**.

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-Next-80B-A3B-Instruct-qx64-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```