INC4AI committed · Commit e738fb2 · verified · 1 parent: a4f27ab

Update README.md

Files changed (1): README.md (+56 −3)

README.md CHANGED
@@ -6,12 +6,65 @@ pipeline_tag: text-generation
 
 ## Model Details
 
- This model is a mixed int4 model with group_size 128 and symmetric quantization of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) generated by [intel/auto-round](https://github.com/intel/auto-round). Please follow the license of the original model.
 
 ## Generate the Model
 
 ```bash
- auto-round --model_name MiniMaxAI/MiniMax-M2.5 --scheme w4a16_mixed --iters 0 --output_dir MiniMax-M2.5-int4-mixed-AutoRound
 ```
 
 ## Ethical Considerations and Limitations
@@ -42,4 +95,4 @@ The license on this model does not constitute legal advice. We are not responsible
 }
 ```
 
- [arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
 
 ## Model Details
 
+ This model is an int4 model with group_size 128 and symmetric quantization of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) generated by [intel/auto-round](https://github.com/intel/auto-round). Please follow the license of the original model.
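The storage scheme named above (symmetric int4 with one scale per group of 128 weights) can be sketched in plain NumPy. This is an illustrative round-to-nearest sketch only — AutoRound additionally tunes the rounding and clipping of each weight, which this sketch omits — and the function names here are hypothetical, not part of any library:

```python
import numpy as np

def quantize_symmetric_int4(w: np.ndarray, group_size: int = 128):
    """Round-to-nearest symmetric int4 quantization with per-group scales.

    Illustrative sketch only: AutoRound tunes rounding/clipping per weight,
    but the resulting storage format is the same idea (int4 codes + one
    float scale per group of `group_size` weights).
    """
    groups = w.reshape(-1, group_size)                      # split into groups of 128
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7   # symmetric: map max |w| to 7
    scale = np.where(scale == 0, 1.0, scale)                # guard all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruction is just code * scale; error is at most scale / 2 per weight.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((1, 256)).astype(np.float32)
q, s = quantize_symmetric_int4(w)
w_hat = dequantize(q, s).reshape(w.shape[0], -1)
print(np.abs(w - w_hat).max())  # bounded by half the largest group scale
```

The per-group scale is what `group_size 128` controls: smaller groups track local weight magnitudes more closely at the cost of more scale storage.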
+
+ ## How to Use
+
+ ### Environment
+
+ ```bash
+ uv pip install transformers==4.57.1 torch accelerate --torch-backend=auto
+ uv pip install vllm --torch-backend=auto
+ ```
+
+ ### HF Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_PATH = "INC4AI/MiniMax-M2.5-int4-mixed-AutoRound"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_PATH,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+
+ messages = [
+     {"role": "user", "content": [{"type": "text", "text": "What is your favourite condiment?"}]},
+     {"role": "assistant", "content": [{"type": "text", "text": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"}]},
+     {"role": "user", "content": [{"type": "text", "text": "Do you have mayonnaise recipes?"}]},
+ ]
+
+ # Send inputs to the same device as the model's first parameters.
+ model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
+
+ generated_ids = model.generate(model_inputs, max_new_tokens=100)
+
+ response = tokenizer.batch_decode(generated_ids)[0]
+ print(response)
+ ```
+
+ ### vLLM Usage
+
+ ```bash
+ vllm serve INC4AI/MiniMax-M2.5-int4-mixed-AutoRound \
+     --port 7777 \
+     --host localhost \
+     --trust-remote-code \
+     --dtype bfloat16 \
+     --tensor_parallel_size 4 \
+     --enable-auto-tool-choice \
+     --tool-call-parser minimax_m2 \
+     --reasoning-parser minimax_m2_append_think
+ ```
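Once the server above is running, it exposes vLLM's standard OpenAI-compatible API on `localhost:7777`. A minimal stdlib-only sketch of querying it (the helper names are illustrative; the model name and port match the serve command, and actually sending the request requires the server to be up):

```python
import json
import urllib.request

def build_chat_request(prompt: str) -> bytes:
    # OpenAI-compatible /v1/chat/completions payload; "model" must match
    # the name passed to `vllm serve`.
    payload = {
        "model": "INC4AI/MiniMax-M2.5-int4-mixed-AutoRound",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
    }
    return json.dumps(payload).encode("utf-8")

def send(body: bytes) -> dict:
    # Only works while the `vllm serve` process above is running.
    req = urllib.request.Request(
        "http://localhost:7777/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

body = build_chat_request("Do you have mayonnaise recipes?")
print(body.decode("utf-8"))
# reply = send(body)  # reply text is in reply["choices"][0]["message"]["content"]
```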
 
 
 ## Generate the Model
 
 ```bash
+ auto-round --model_name MiniMaxAI/MiniMax-M2.5 --scheme w4a16 --ignore_layers gate --iters 0 --output_dir MiniMax-M2.5-int4-mixed-AutoRound
 ```
 
 ## Ethical Considerations and Limitations
 
 }
 ```
 
+ [arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)