Commit 8eb73f7 (verified) by ertghiu256 · Parent(s): 29a4f9f

Update README.md

Files changed (1): README.md (+150, −81)
---
base_model:
- ertghiu256/qwen3-4b-code-reasoning
- Qwen/Qwen3-4B
- Tesslate/UIGEN-T3-4B-Preview-MAX
- ertghiu256/qwen-3-4b-mixture-of-thought
- POLARIS-Project/Polaris-4B-Preview
- ertghiu256/qwen3-math-reasoner
- ertghiu256/qwen3-multi-reasoner
- ValiantLabs/Qwen3-4B-Esper3
- ValiantLabs/Qwen3-4B-ShiningValiant3
- prithivMLmods/Crux-Qwen3_OpenThinking-4B
library_name: transformers
tags:
- mergekit
- merge

---
# Ties merged COde MAth aNd Reasoning model (tcomanr)
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
This model aims to combine the code, math, and reasoning capabilities of multiple Qwen3 fine-tunes by merging them.

# How to run
You can run this model with any of the following interfaces:

## transformers
Following the usage recommended by the Qwen team:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ertghiu256/Qwen3-4b-tcomanr-merge"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # switches between thinking and non-thinking modes; default is True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
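The `</think>`-splitting step above can be checked in isolation. A minimal sketch, assuming 151668 is the `</think>` token id as in the snippet above (the helper name `split_thinking` is ours, not part of the transformers API):

```python
def split_thinking(output_ids, think_end_id=151668):
    """Split generated token ids into (thinking, answer) parts.

    Mirrors the rindex trick above: find the LAST occurrence of the
    </think> token and cut just after it; if it is absent, everything
    is treated as the answer."""
    try:
        index = len(output_ids) - output_ids[::-1].index(think_end_id)
    except ValueError:
        index = 0  # no </think> found
    return output_ids[:index], output_ids[index:]

# synthetic ids: [thinking tokens..., 151668, answer tokens...]
thinking, answer = split_thinking([10, 11, 151668, 20, 21])
print(thinking, answer)  # → [10, 11, 151668] [20, 21]
```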

## vLLM
Run this command:
`vllm serve ertghiu256/Qwen3-4b-tcomanr-merge --enable-reasoning --reasoning-parser deepseek_r1`

## SGLang
Run this command:
`python -m sglang.launch_server --model-path ertghiu256/Qwen3-4b-tcomanr-merge --reasoning-parser deepseek-r1`

## llama.cpp
Run this command (pulls the model from the Hugging Face repo; requires a GGUF build to be available):
`llama-server -hf ertghiu256/Qwen3-4b-tcomanr-merge`

## Ollama
Run this command:
`ollama run hf.co/ertghiu256/Qwen3-4b-tcomanr-merge`

## LM Studio
Search for `ertghiu256/Qwen3-4b-tcomanr-merge` in the LM Studio model search, then download it.

### Merge Method
This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method, with [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) as the base.

#### Models Merged
The following models were included in the merge:
* [ertghiu256/qwen3-4b-code-reasoning](https://huggingface.co/ertghiu256/qwen3-4b-code-reasoning)
* [Tesslate/UIGEN-T3-4B-Preview-MAX](https://huggingface.co/Tesslate/UIGEN-T3-4B-Preview-MAX)
* [ertghiu256/qwen-3-4b-mixture-of-thought](https://huggingface.co/ertghiu256/qwen-3-4b-mixture-of-thought)
* [POLARIS-Project/Polaris-4B-Preview](https://huggingface.co/POLARIS-Project/Polaris-4B-Preview)
* [ertghiu256/qwen3-math-reasoner](https://huggingface.co/ertghiu256/qwen3-math-reasoner)
* [ertghiu256/qwen3-multi-reasoner](https://huggingface.co/ertghiu256/qwen3-multi-reasoner)
* [ValiantLabs/Qwen3-4B-Esper3](https://huggingface.co/ValiantLabs/Qwen3-4B-Esper3)
* [ValiantLabs/Qwen3-4B-ShiningValiant3](https://huggingface.co/ValiantLabs/Qwen3-4B-ShiningValiant3)
* [prithivMLmods/Crux-Qwen3_OpenThinking-4B](https://huggingface.co/prithivMLmods/Crux-Qwen3_OpenThinking-4B)

#### Configuration
The following YAML configuration was used to produce this model:

```yaml
models:
  - model: ertghiu256/qwen3-math-reasoner
    parameters:
      weight: 0.7
  - model: ertghiu256/qwen3-4b-code-reasoning
    parameters:
      weight: 0.8
  - model: ertghiu256/qwen-3-4b-mixture-of-thought
    parameters:
      weight: 0.9
  - model: POLARIS-Project/Polaris-4B-Preview
    parameters:
      weight: 0.7
  - model: ertghiu256/qwen3-multi-reasoner
    parameters:
      weight: 0.8
  - model: ValiantLabs/Qwen3-4B-Esper3
    parameters:
      weight: 0.8
  - model: Tesslate/UIGEN-T3-4B-Preview-MAX
    parameters:
      weight: 0.8
  - model: ValiantLabs/Qwen3-4B-ShiningValiant3
    parameters:
      weight: 0.8
  - model: prithivMLmods/Crux-Qwen3_OpenThinking-4B
    parameters:
      weight: 0.9
merge_method: ties
base_model: Qwen/Qwen3-4B
parameters:
  normalize: true
  int8_mask: true
dtype: float16
```
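For intuition, the TIES procedure the configuration invokes (elect a sign per parameter, then merge only the agreeing task vectors) can be sketched per parameter in plain Python. This is a toy illustration under simplifying assumptions, not mergekit's implementation: trimming and the `int8_mask` option are omitted, and the function name is ours.

```python
def ties_merge(base, tuned, weights):
    """Toy per-parameter sketch of TIES merging.

    1) task vectors = (tuned - base), scaled by each model's weight
    2) elect a sign per parameter from the sum of the task vectors
    3) average only the deltas that agree with the elected sign
    4) add the merged delta back onto the base weights
    """
    n = len(base)
    deltas = [[w * (t[i] - base[i]) for i in range(n)]
              for t, w in zip(tuned, weights)]
    merged = list(base)
    for i in range(n):
        col = [d[i] for d in deltas]
        s = sum(col)
        elected = (s > 0) - (s < 0)  # sign vote: +1, -1, or 0
        kept = [v for v in col
                if v != 0 and ((v > 0) - (v < 0)) == elected]
        if kept:
            merged[i] += sum(kept) / len(kept)  # disjoint mean of agreeing deltas
    return merged

# two models pull parameter 0 the same way, but disagree on parameter 1,
# so the conflicting updates cancel and parameter 1 keeps its base value
print(ties_merge([0.0, 0.0], [[1.0, -1.0], [1.0, 1.0]], [1.0, 1.0]))  # → [1.0, 0.0]
```

Per-model `weight` scales each task vector before the vote, and `normalize`/`int8_mask` in the real configuration further rescale and sparsify the merged deltas.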