---
base_model:
- Qwen/Qwen2.5-3B-Instruct
library_name: transformers
tags:
- mergekit
- merge
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

The model was merged using the **passthrough merge method** with a **sliding window approach**: the source model's 36 layers are cut into overlapping four-layer slices, each slice starting two layers after the previous one, and the slices are stacked in order.

### Duplicated Layers

Because consecutive slices overlap by two layers, every layer except the first two and the last two appears in more than one slice:

- Layers 2 through 33 each appear in exactly two consecutive slices.

Layers 0–1 and 34–35 appear only once, in the first and last slice respectively; each interior slice shares two layers with both its predecessor and its successor.

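The slice layout and the set of duplicated layers can be checked with a short script. A minimal sketch, stdlib only; `NUM_LAYERS`, `WINDOW`, and `STRIDE` are names introduced here for illustration, with values read off the configuration below:

```python
# Enumerate the sliding-window slices (window of 4 layers, stride 2,
# over the 36 hidden layers of Qwen/Qwen2.5-3B-Instruct) and count
# how often each layer is reused across slices.
from collections import Counter

NUM_LAYERS = 36   # hidden layers in the source model
WINDOW = 4        # layers per slice
STRIDE = 2        # offset between consecutive slice starts

slices = [(start, start + WINDOW)
          for start in range(0, NUM_LAYERS - WINDOW + 1, STRIDE)]
usage = Counter(layer for lo, hi in slices for layer in range(lo, hi))
duplicated = sorted(layer for layer, n in usage.items() if n > 1)

print(len(slices))   # -> 17, i.e. [0, 4] through [32, 36]
print(duplicated)    # -> [2, 3, ..., 33]
```

Since the passthrough method stacks every slice, the merged model ends up with 17 × 4 = 68 layers.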
### Models Merged

The following models were included in the merge:
* [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [0, 4]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [2, 6]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [4, 8]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [6, 10]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [8, 12]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [10, 14]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [12, 16]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [14, 18]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [16, 20]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [18, 22]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [20, 24]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [22, 26]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [24, 28]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [26, 30]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [28, 32]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [30, 34]
- sources:
  - model: Qwen/Qwen2.5-3B-Instruct
    layer_range: [32, 36]
tokenizer_source: Qwen/Qwen2.5-3B-Instruct
merge_method: passthrough
dtype: bfloat16
```
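The repetitive `slices` list above follows a fixed pattern, so it can be generated rather than written by hand. A minimal stdlib-only sketch (the constants are assumptions read off the config above; the YAML is emitted with plain string formatting rather than a YAML library):

```python
# Generate the sliding-window mergekit config shown above:
# 17 four-layer slices with stride 2 over 36 layers.
MODEL = "Qwen/Qwen2.5-3B-Instruct"
NUM_LAYERS, WINDOW, STRIDE = 36, 4, 2

lines = ["slices:"]
for start in range(0, NUM_LAYERS - WINDOW + 1, STRIDE):
    lines += [
        "- sources:",
        f"  - model: {MODEL}",
        f"    layer_range: [{start}, {start + WINDOW}]",
    ]
lines += [
    f"tokenizer_source: {MODEL}",
    "merge_method: passthrough",
    "dtype: bfloat16",
]
print("\n".join(lines))
```

Writing the printed text to a file such as `config.yaml` yields a configuration equivalent to the one above, which can then be passed to mergekit.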