	---
	license: apache-2.0
	library_name: mlx
	datasets:
	- DavidAU/ST-TheNextGeneration
	language:
	- en
	- fr
	- zh
	- de
	tags:
	- programming
	- code generation
	- code
	- codeqwen
	- moe
	- coding
	- coder
	- qwen2
	- chat
	- qwen
	- qwen-coder
	- Qwen3-Coder-30B-A3B-Instruct
	- Qwen3-30B-A3B
	- mixture of experts
	- 128 experts
	- 8 active experts
	- 1 million context
	- qwen3
	- finetune
	- brainstorm 20x
	- brainstorm
	- optional thinking
	- qwen3_moe
	- unsloth
	- mlx
	base_model: DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG
	pipeline_tag: text-generation
	---

	# Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-qx64x-hi-mlx

	This is a new-old-stock version of the model, with embeddings quantized to 6 bits.

	The original [Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-qx64x-hi-mlx) uses 4-bit embeddings.

	```text
	Perplexity: 4.455 ± 0.031
	Peak memory: 32.84 GB
	```
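
For readers unfamiliar with the metric: perplexity is conventionally the exponential of the mean negative log-likelihood per token, so lower is better. The sketch below assumes that standard definition; it is not the evaluation script used to produce the number above, and the `perplexity` helper and the sample log-probabilities are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical natural-log probabilities, for illustration only:
# a model that assigns probability 0.25 to every token.
example_logprobs = [math.log(0.25)] * 4
print(perplexity(example_logprobs))
```

Under this definition, a perplexity near 4.5 means the model is, on average, about as uncertain as a uniform choice among 4.5 tokens.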

	Metrics coming soon. If this proves better than the qx64-hi, it will replace it in the catalog.

	-G

	This model [Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-qx64x-hi-mlx) was
	converted to MLX format from [DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG](https://huggingface.co/DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG)
	using mlx-lm version 0.28.3.

	## Use with mlx

	```bash
	pip install mlx-lm
	```

	```python
	from mlx_lm import load, generate

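	# Load the quantized weights and tokenizer from the Hugging Face Hub
	model, tokenizer = load("nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-qx64x-hi-mlx")

	prompt = "hello"

	# Wrap the prompt in the model's chat template when one is defined
	if tokenizer.chat_template is not None:
	    messages = [{"role": "user", "content": prompt}]
	    prompt = tokenizer.apply_chat_template(
	        messages, add_generation_prompt=True
	    )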

	response = generate(model, tokenizer, prompt=prompt, verbose=True)
	```
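
Because this is a Thinking variant, the generated text will usually contain the model's reasoning before the final answer. The sketch below is one way to separate the two. It assumes the reasoning ends with a `</think>` tag, as is usual for Qwen3 thinking models; `split_thinking` is a hypothetical helper, not part of mlx-lm, and the sample string is made up.

```python
def split_thinking(text, close_tag="</think>"):
    """Return (reasoning, answer); reasoning is "" when no closing tag is found."""
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag: treat the whole output as the answer
        return "", text.strip()
    # The opening tag may be missing if the chat template already emitted it
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Hypothetical output string, for illustration only
reasoning, answer = split_thinking("<think>The user said hello.</think>\n\nHello!")
print(answer)
```

In practice you would pass the `response` string returned by `generate` instead of the sample text.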