Fizzarolli commited on
Commit
bd0dafa
·
verified ·
1 Parent(s): bd30f39

Create README-cn.md

Browse files
Files changed (1) hide show
  1. README-cn.md +124 -0
README-cn.md ADDED
@@ -0,0 +1,124 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ base_model: Qwen/Qwen3-8B-Base
5
+ tags:
6
+ - roleplay
7
+ - conversational
8
+ - axolotl
9
+ - qwen
10
+ ---
11
+
12
+ # 残响 Qwen3 8B(第一系列)
13
+
14
+ *空气中飘浮着一缕尘埃。它仿佛来自某个逝去的时代,但你无从追溯。它落在你的舌尖,滋味奇妙。*
15
+
16
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/_ovgodU331FO4YAqFGCnk.png)
17
+
18
+ 「残响」是一系列专注于SFW与NSFW角色扮演及对话的微调大语言模型。
19
+
20
+ ## 量化版本
21
+ GGUF:
22
+ - 待补充!
23
+
24
+ EXL3:
25
+ - 待补充!
26
+
27
+ EXL2:
28
+ - 待补充!
29
+
30
+ 其他格式:
31
+ - 待补充!
32
+
33
+ ## 推荐参数
34
+ 对话模板: ChatML
35
+ 采样器设置:
36
+ - 温度值 `0.8`
37
+ - 最小概率阈值 `0.1`
38
+ - 存在惩罚 `0.5`
39
+
40
+ ## 致谢
41
+ 特别感谢Allura和ilya <3
42
+ 衷心感谢以下项目的开发者:
43
+ - Axolotl(训练框架)
44
+ - 通义千问/Qwen/阿里巴巴(基础模型)
45
+ - Prime Intellect(算力支持)
46
+ - 以及我的银行(资金支持)
47
+
48
+ ## 其他信息
49
+
50
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
51
+ <details><summary>查看Axolotl配置</summary>
52
+
53
+ axolotl版本: `0.10.0.dev0`
54
+ ```yaml
55
+ # === 模型配置 ===
56
+ base_model: Qwen/Qwen3-8B-Base
57
+ load_in_8bit: false
58
+ load_in_4bit: false
59
+
60
+ # === 训练设置 ===
61
+ num_epochs: 2
62
+ micro_batch_size: 32
63
+ gradient_accumulation_steps: 1
64
+ sequence_len: 8192
65
+ sample_packing: true
66
+ pad_to_sequence_len: true
67
+
68
+ # === 超参数配置 ===
69
+ optimizer: apollo_adamw_layerwise
70
+ # Apollo-mini配置:
71
+ optim_args: "proj=random,rank=1,scale=128.0,scale_type=tensor,update_proj_gap=200"
72
+ # 标准Apollo配置:
73
+ # optim_args:
74
+ optim_target_modules: all_linear
75
+ learning_rate: 2e-5
76
+ lr_scheduler: rex
77
+ weight_decay: 0.01
78
+ warmup_ratio: 0
79
+
80
+ # === 数据配置 ===
81
+ datasets:
82
+ - path: allura-org/inkmix-v3.0
83
+ type: chat_template
84
+ split: train
85
+ field_messages: conversations
86
+ message_field_role: from
87
+ message_field_content: value
88
+
89
+ dataset_prepared_path: last_run_prepared
90
+ chat_template: chatml
91
+
92
+ # === 插件 ===
93
+ plugins:
94
+ - axolotl.integrations.liger.LigerPlugin
95
+ - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
96
+
97
+ # === 硬件优化 ===
98
+ gradient_checkpointing: unsloth
99
+ gradient_checkpointing_kwargs:
100
+ use_reentrant: false
101
+ liger_rope: true
102
+ liger_rms_norm: true
103
+ liger_glu_activation: true
104
+ cut_cross_entropy: true
105
+
106
+ # === Wandb追踪 ===
107
+ wandb_project: qwen3-8b-inkmix-v3
108
+
109
+ # === 检查点 ===
110
+ saves_per_epoch: 2
111
+ save_total_limit: 3
112
+
113
+ # === 高级设置 ===
114
+ output_dir: /ephemeral/ckpts
115
+ bf16: auto
116
+ flash_attention: true
117
+ train_on_inputs: false
118
+ group_by_length: false
119
+ logging_steps: 1
120
+ trust_remote_code: true
121
+
122
+ ```
123
+
124
+ </details>