huihui-ai lbourdois committed
Commit 3dee99d · verified · 1 Parent(s): b9c108e

Improve language tag (#1)

- Improve language tag (bc88e97eb25c49bd87d0844a20c39ebb720926fb)


Co-authored-by: LoΓ―ck BOURDOIS <lbourdois@users.noreply.huggingface.co>

Files changed (1):
  1. README.md +156 -144
README.md CHANGED
@@ -1,144 +1,156 @@
 ---
 license: apache-2.0
 license_link: https://huggingface.co/huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v3/blob/main/LICENSE
 language:
-- en
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
 pipeline_tag: text-generation
 base_model: Qwen/Qwen2.5-0.5B-Instruct
 tags:
 - chat
 - abliterated
 - uncensored
 ---

# huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v3

This is an uncensored version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about it). It is a crude, proof-of-concept implementation that removes refusals from an LLM without using TransformerLens.

Ablation was performed using a new, faster method that yields better results, and this version used a more precise dataset. The pass rate on the 320 harmful instructions test is **100%**.
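
For intuition, abliteration is commonly implemented as directional ablation: a "refusal direction" is estimated from the difference in mean residual-stream activations on harmful versus harmless prompts, and the weight matrices that write to the residual stream are orthogonalized against that direction. The sketch below shows only this projection step under those assumptions; the names and shapes are illustrative, not this repository's actual code.

```python
import torch

# Illustrative sketch of the directional-ablation projection step (an
# assumption about the general technique, not this repo's exact method).
# v: "refusal direction" in the residual stream, shape (d_model,)
# W: a weight matrix that writes to the residual stream, shape (d_model, d_in)
def orthogonalize(W: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    v = v / v.norm()                  # normalize the direction
    return W - torch.outer(v, v) @ W  # remove W's component along v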

## Ollama

[huihui_ai/qwen2.5-abliterate:0.5b-v3](https://ollama.com/huihui_ai/qwen2.5-abliterate:0.5b-v3) is **less than 400 MB** in size and performs very well. You can run it directly:

```
ollama run huihui_ai/qwen2.5-abliterate:0.5b-v3
```
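
The served model can also be queried programmatically. Below is a minimal sketch against Ollama's local REST API, assuming the server is running on its default port 11434 and the model has been pulled as above:

```python
import requests

# Ask the locally served model one question via Ollama's /api/chat endpoint.
# Assumes `ollama run` (or `ollama serve`) has made the model available locally.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "huihui_ai/qwen2.5-abliterate:0.5b-v3",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "stream": False,  # return one complete JSON response
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```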

## Usage
You can use this model in your applications by loading it with Hugging Face's `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_name = "huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v3"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize conversation context
initial_messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."}
]
messages = initial_messages.copy()  # Copy the initial conversation context

# Enter conversation loop
while True:
    # Get user input
    user_input = input("User: ").strip()  # Strip leading and trailing spaces

    # If the user types '/exit', end the conversation
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break

    # If the user types '/clean', reset the conversation context
    if user_input.lower() == "/clean":
        messages = initial_messages.copy()  # Reset conversation context
        print("Chat history cleared. Starting a new conversation.")
        continue

    # If input is empty, prompt the user and continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    # Add user input to the conversation
    messages.append({"role": "user", "content": user_input})

    # Build the chat template
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize input and prepare it for the model
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Generate a response from the model
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=8192
    )

    # Extract the model output, dropping the prompt tokens
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    # Add the model's response to the conversation
    messages.append({"role": "assistant", "content": response})

    # Print the model's response
    print(f"Qwen: {response}")
```

## Pass Rate Description

The pass rate is the proportion of harmful instructions that did not trigger the test condition (TestPassed=False) out of the total number of instructions processed: pass rate = (total - triggered_total) / total. It is reported as a decimal (rounded to two places) and as a percentage (rounded to one place) to indicate the fraction of instructions that did not trigger the condition.
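
As a concrete illustration of that formula, here is a minimal sketch; the JSONL layout (one record per instruction with a boolean `TestPassed` field) is an assumption based on the description above, not necessarily the exact schema of the linked results file:

```python
import json

# Minimal sketch of the pass-rate computation described above. The JSONL
# schema (one record per instruction, with a boolean "TestPassed" field)
# is assumed for illustration; the repo's actual format may differ.
def pass_rate(path: str) -> tuple[int, int, float]:
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    total = len(records)
    triggered_total = sum(1 for r in records if not r["TestPassed"])
    ratio = (total - triggered_total) / total  # fraction that did not trigger
    return total - triggered_total, total, ratio

passed, total, ratio = pass_rate("TestPassed.jsonl")
print(f"Passed total: {passed}/{total}, Passed ratio: {ratio:.2f} ({ratio:.2%})")
```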

The test data come from [huihui-ai/harmbench_behaviors](https://huggingface.co/datasets/huihui-ai/harmbench_behaviors); the test script is [TestPassed.py](https://huggingface.co/huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v3/blob/main/TestPassed.py).

The test result is [100.00%](https://huggingface.co/huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v3/blob/main/TestPassed.jsonl):
```
python TestPassed.py
Load Model huihui-ai/Qwen2.5-0.5B-Instruct-abliterated-v3 ...
Processing harmful instructions: 100%|█████████████████████████████████████████████████████████████████████████████████| 320/320 [01:04<00:00,  4.99it/s]
Passed total: 320/320, Passed ratio: 1.00 (100.00%)
```

Below is a comparison of pass rates across versions:

| Model                                | Passed total | Passed ratio |
|--------------------------------------|--------------|--------------|
| Qwen2.5-0.5B-Instruct                | 201/320      | 62.8%        |
| Qwen2.5-0.5B-Instruct-abliterated    | 310/320      | 96.9%        |
| Qwen2.5-0.5B-Instruct-abliterated-v2 | 317/320      | 99.1%        |
| Qwen2.5-0.5B-Instruct-abliterated-v3 | **320/320**  | **100.0%**   |

### Donation

If you like this model, please click "like" and follow us for more updates.
You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.

##### Your donation helps us continue our development and improvement; even a cup of coffee's worth makes a difference.
- Bitcoin (BTC):
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```