Safetensors
gemma
Moyu-hrsun committed
Commit b8ac22d · verified · 1 Parent(s): 79891ae

Create README.md

Files changed (1)
  1. README.md +82 -0
README.md ADDED
@@ -0,0 +1,82 @@
+ ---
+ license: mit
+ datasets:
+ - openai/summarize_from_feedback
+ base_model:
+ - google/gemma-7b
+ ---
+ # Model Card for MA-RLHF
+ <a href="https://iclr.cc/Conferences/2025" target="_blank">
+ <img alt="ICLR 2025" src="https://img.shields.io/badge/Proceedings-ICLR2025-red" />
+ </a>
+ <a href="https://github.com/ernie-research/MA-RLHF" target="_blank">
+ <img alt="GitHub" src="https://img.shields.io/badge/Github-MA_RLHF-green" />
+ </a>
+
+ This repository contains the official checkpoint for [Reinforcement Learning From Human Feedback with Macro Actions (MA-RLHF)](https://arxiv.org/pdf/2410.02743).
+
+ ## Model Description
+
+ MA-RLHF is a novel framework that integrates macro actions into conventional RLHF. Macro actions are sequences of tokens or higher-level language constructs, which can be determined by different termination conditions, such as n-gram-based, perplexity-based, or parsing-based conditions. By introducing macro actions into RLHF, we reduce the number of decision points and shorten decision trajectories, alleviating the credit assignment problem caused by long temporal distances.
+
+
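To make the n-gram-based termination condition concrete, here is a minimal sketch that groups a token sequence into fixed-length macro actions. It is an illustration only, not the authors' implementation: the function name `fixed_ngram_macro_actions` is hypothetical, and the default `n=5` rests on the assumption that the `Fixed5` suffix in the checkpoint names denotes 5-token macro actions.

```python
from typing import List, Sequence


def fixed_ngram_macro_actions(token_ids: Sequence[int], n: int = 5) -> List[List[int]]:
    """Split a token sequence into consecutive macro actions of at most n tokens.

    Hypothetical helper: a macro action terminates after every n tokens,
    and the final macro action may be shorter if the length is not a multiple of n.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [list(token_ids[i:i + n]) for i in range(0, len(token_ids), n)]


if __name__ == "__main__":
    tokens = list(range(12))  # stand-in for 12 generated token ids
    macros = fixed_ngram_macro_actions(tokens, n=5)
    print(macros)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
    print(f"decision points: {len(tokens)} tokens -> {len(macros)} macro actions")
```

With 12 tokens and `n=5`, the policy would face 3 decision points instead of 12, which is the kind of trajectory shortening the description refers to.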
+ |Model|Checkpoint|Base Model|Dataset|
+ |-----|----------|----------|-------|
+ |TLDR-Gemma-2B-MA-PPO-Fixed5|🤗 [HF Link](https://huggingface.co/baidu/TLDR-Gemma-2B-MA-PPO-Fixed5)|[google/gemma-2b](https://huggingface.co/google/gemma-2b)|[openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)|
+ |TLDR-Gemma-7B-MA-PPO-Fixed5|🤗 [HF Link](https://huggingface.co/baidu/TLDR-Gemma-7B-MA-PPO-Fixed5)|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|[openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)|
+ |TLDR-Gemma-2-27B-MA-PPO-Fixed5|🤗 [HF Link](https://huggingface.co/baidu/TLDR-Gemma-2-27B-MA-PPO-Fixed5)|[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b)|[openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)|
+ |HH-RLHF-Gemma-2B-MA-PPO-Fixed5|🤗 [HF Link](https://huggingface.co/baidu/HH-RLHF-Gemma-2B-MA-PPO-Fixed5)|[google/gemma-2b](https://huggingface.co/google/gemma-2b)|[Dahoas/full-hh-rlhf](https://huggingface.co/datasets/Dahoas/full-hh-rlhf)|
+ |HH-RLHF-Gemma-7B-MA-PPO-Fixed5|🤗 [HF Link](https://huggingface.co/baidu/HH-RLHF-Gemma-7B-MA-PPO-Fixed5)|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|[Dahoas/full-hh-rlhf](https://huggingface.co/datasets/Dahoas/full-hh-rlhf)|
+ |APPS-Gemma-2B-MA-PPO-Fixed10|🤗 [HF Link](https://huggingface.co/baidu/APPS-Gemma-2B-MA-PPO-Fixed10)|[google/codegemma-2b](https://huggingface.co/google/codegemma-2b)|[codeparrot/apps](https://huggingface.co/datasets/codeparrot/apps)|
+ |APPS-Gemma-7B-MA-PPO-Fixed10|🤗 [HF Link](https://huggingface.co/baidu/APPS-Gemma-7B-MA-PPO-Fixed10)|[google/codegemma-7b-it](https://huggingface.co/google/codegemma-7b-it)|[codeparrot/apps](https://huggingface.co/datasets/codeparrot/apps)|
+
+
+ ## Model Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_path = "baidu/TLDR-Gemma-7B-MA-PPO-Fixed5"
+
+ # Load the tokenizer and the checkpoint from the Hugging Face Hub.
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype="auto", trust_remote_code=True)
+
+ # A Reddit post ending in "TL;DR:", which the model completes with a summary.
+ input_text = """
+ POST Subreddit: r/cats
+ Hello everyone! One of my cats is about 10 years old now, she is pretty much strictly
+ indoors save for some time she spends on our screened in porch each day. (She likes
+ to watch the birds in the yard while she suns herself by the pool, quite the princess).
+ Anyway, when she was younger she was very active and quite small, however with
+ age she has put on a pretty hefty amount of weight. I feed her indoor cat food
+ for weight control, I’ve switched brands a few times trying to find something that
+ works, I’ve cut back on feeding her by a lot (she gets very angry and demanding
+ when she wants food but I don’t give in) however, nothing really seems to work.
+ I’ve tried cat toys, and bought a harness thinking I could try to walk her but she just
+ lays down and looks at me like I’m stupid. Basically I just want to know if you all
+ have any suggestions for exercise or food. I care about her and don’t want this to
+ get any worse. I also have another cat that eats the same amount and type of food
+ as her and is a completely normal weight and only a year younger, however he is a
+ male, not sure if that makes a difference in predisposition for weight gain. They are
+ also both fixed. TL;DR:
+ """
+
+ # Tokenize the prompt and move the tensors to the model's device.
+ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+ output_ids = model.generate(**inputs, max_new_tokens=20)
+ # The decoded text contains the prompt followed by the generated summary.
+ response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+
+ print(response)
+ ```
+
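The prompt in the usage example follows a "POST Subreddit: r/... TL;DR:" layout. As a rough sketch, a small helper could assemble prompts of that shape for other posts. The helper name `build_tldr_prompt` is hypothetical, and the layout is inferred only from the single example in this README, so it may not match the exact format used in training.

```python
def build_tldr_prompt(subreddit: str, post: str) -> str:
    """Assemble a summarization prompt shaped like the README example (assumed layout)."""
    # Accept either "cats" or "r/cats" for the subreddit name.
    if subreddit.startswith("r/"):
        subreddit = subreddit[2:]
    # Collapse line breaks and repeated spaces in the post body.
    body = " ".join(post.split())
    # End with "TL;DR:" so the model continues with the summary.
    return f"POST Subreddit: r/{subreddit}\n{body} TL;DR:"


if __name__ == "__main__":
    print(build_tldr_prompt("cats", "My cat has gained weight.\nAny advice on food or exercise?"))
```

The returned string can be passed as `input_text` to the tokenizer in the usage example.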
+ ## Citation
+
+ ```
+ @inproceedings{
+ chai2025marlhf,
+ title={{MA}-{RLHF}: Reinforcement Learning from Human Feedback with Macro Actions},
+ author={Yekun Chai and Haoran Sun and Huang Fang and Shuohuan Wang and Yu Sun and Hua Wu},
+ booktitle={The Thirteenth International Conference on Learning Representations},
+ year={2025},
+ url={https://openreview.net/forum?id=WWXjMYZxfH}
+ }
+ ```