INC4AI committed · Commit e738fb2 · verified · 1 parent: a4f27ab

Update README.md

Files changed (1): README.md (+56 −3)

README.md CHANGED
@@ -6,12 +6,65 @@ pipeline_tag: text-generation
 
 ## Model Details
 
- This model is a mixed int4 model with group_size 128 and symmetric quantization of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) generated by [intel/auto-round](https://github.com/intel/auto-round). Please follow the license of the original model.
 
 ## Generate the Model
 
 ```bash
- auto-round --model_name MiniMaxAI/MiniMax-M2.5 --scheme w4a16_mixed --iters 0 --output_dir MiniMax-M2.5-int4-mixed-AutoRound
 ```
 
 ## Ethical Considerations and Limitations
@@ -42,4 +95,4 @@ The license on this model does not constitute legal advice. We are not responsible
 }
 ```
 
- [arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
 
 ## Model Details
 
+ This model is an int4 model with group_size 128 and symmetric quantization of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) generated by [intel/auto-round](https://github.com/intel/auto-round). Please follow the license of the original model.
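The storage scheme named above (symmetric int4 with one scale per group of 128 weights) can be sketched in plain NumPy. This is an illustrative round-to-nearest sketch only — AutoRound additionally tunes the rounding and clipping of each weight, which this sketch omits — and the function names here are hypothetical, not part of any library:

```python
import numpy as np

def quantize_symmetric_int4(w: np.ndarray, group_size: int = 128):
    """Round-to-nearest symmetric int4 quantization with per-group scales.

    Illustrative sketch only: AutoRound tunes rounding/clipping per weight,
    but the resulting storage format is the same idea (int4 codes + one
    float scale per group of `group_size` weights).
    """
    groups = w.reshape(-1, group_size)                      # split into groups of 128
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7   # symmetric: map max |w| to 7
    scale = np.where(scale == 0, 1.0, scale)                # guard all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruction is just code * scale; error is at most scale / 2 per weight.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((1, 256)).astype(np.float32)
q, s = quantize_symmetric_int4(w)
w_hat = dequantize(q, s).reshape(w.shape[0], -1)
print(np.abs(w - w_hat).max())  # bounded by half the largest group scale
```

The per-group scale is what `group_size 128` controls: smaller groups track local weight magnitudes more closely at the cost of more scale storage.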
+
+ ## How to Use
+
+ ### Environment
+
+ ```bash
+ uv pip install transformers==4.57.1 torch accelerate --torch-backend=auto
+ uv pip install vllm --torch-backend=auto
+ ```
+
+ ### HF Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_PATH = "INC4AI/MiniMax-M2.5-int4-mixed-AutoRound"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_PATH,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+
+ messages = [
+     {"role": "user", "content": [{"type": "text", "text": "What is your favourite condiment?"}]},
+     {"role": "assistant", "content": [{"type": "text", "text": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"}]},
+     {"role": "user", "content": [{"type": "text", "text": "Do you have mayonnaise recipes?"}]},
+ ]
+
+ # Send inputs to the same device as the model's first parameters.
+ model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
+
+ generated_ids = model.generate(model_inputs, max_new_tokens=100)
+
+ response = tokenizer.batch_decode(generated_ids)[0]
+ print(response)
+ ```
+
+ ### vLLM Usage
+
+ ```bash
+ vllm serve INC4AI/MiniMax-M2.5-int4-mixed-AutoRound \
+     --port 7777 \
+     --host localhost \
+     --trust-remote-code \
+     --dtype bfloat16 \
+     --tensor_parallel_size 4 \
+     --enable-auto-tool-choice \
+     --tool-call-parser minimax_m2 \
+     --reasoning-parser minimax_m2_append_think
+ ```
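Once the server above is running, it exposes vLLM's standard OpenAI-compatible API on `localhost:7777`. A minimal stdlib-only sketch of querying it (the helper names are illustrative; the model name and port match the serve command, and actually sending the request requires the server to be up):

```python
import json
import urllib.request

def build_chat_request(prompt: str) -> bytes:
    # OpenAI-compatible /v1/chat/completions payload; "model" must match
    # the name passed to `vllm serve`.
    payload = {
        "model": "INC4AI/MiniMax-M2.5-int4-mixed-AutoRound",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
    }
    return json.dumps(payload).encode("utf-8")

def send(body: bytes) -> dict:
    # Only works while the `vllm serve` process above is running.
    req = urllib.request.Request(
        "http://localhost:7777/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

body = build_chat_request("Do you have mayonnaise recipes?")
print(body.decode("utf-8"))
# reply = send(body)  # reply text is in reply["choices"][0]["message"]["content"]
```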
 
 
 ## Generate the Model
 
 ```bash
+ auto-round --model_name MiniMaxAI/MiniMax-M2.5 --scheme w4a16 --ignore_layers gate --iters 0 --output_dir MiniMax-M2.5-int4-mixed-AutoRound
 ```
 
 ## Ethical Considerations and Limitations
 
 }
 ```
 
+ [arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)