RAGEN Checkpoint - 300 (BF16)

This is the Actor checkpoint of a RAGEN model trained from Qwen/Qwen2.5-1.5B-Instruct.

Model Information

  • Training steps: 300
  • Precision: BF16
  • Framework: PyTorch + Transformers
  • Base model: Qwen/Qwen2.5-1.5B-Instruct
  • Task: text generation (Actor model after RLHF training)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "BlankZ/ragen-checkpoint-step-300-bf16"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate text
# Note: adjust the prompt format to match your training task
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

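# max_new_tokens caps only the generated tokens; max_length would also count the prompt
outputs = model.generate(**model_inputs, max_new_tokens=100)
# Decode only the newly generated tokens, dropping the prompt and special tokens
new_tokens = outputs[0][model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))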

Notes

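  • This is the Actor model after RLHF training, intended for text generation.
  • The weights are stored in BF16 to save GPU memory; a GPU with native BF16 support (such as A100 or H100) is recommended.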
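
The BF16 recommendation can be checked at runtime before loading the model. The following is a minimal sketch, not part of the original checkpoint's tooling: the helper names `pick_dtype_name`, `weight_memory_gib`, and `detect_dtype_name` are hypothetical, the FP16 fallback for older GPUs is an assumption whose output quality has not been verified for this checkpoint, and the 1.5e9 parameter count is a rough figure taken from the base model's name.

```python
def pick_dtype_name(cuda_available: bool, bf16_supported: bool) -> str:
    """Return the name of a torch dtype suited to the available hardware."""
    if cuda_available and bf16_supported:
        return "bfloat16"  # GPU with native BF16 support
    if cuda_available:
        return "float16"   # assumption: FP16 fallback on GPUs without BF16
    return "float32"       # CPU fallback


def weight_memory_gib(num_params: float, dtype_name: str) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    bytes_per_param = {"bfloat16": 2, "float16": 2, "float32": 4}[dtype_name]
    return num_params * bytes_per_param / 1024**3


def detect_dtype_name() -> str:
    """Query the local hardware through PyTorch and pick a dtype name."""
    import torch  # imported lazily so the helpers above need no dependencies

    cuda = torch.cuda.is_available()
    bf16 = cuda and torch.cuda.is_bf16_supported()
    return pick_dtype_name(cuda, bf16)
```

With PyTorch installed, `getattr(torch, detect_dtype_name())` yields a dtype that can be passed as `torch_dtype` to `from_pretrained`, and `weight_memory_gib(1.5e9, "bfloat16")` gives a rough lower bound on the memory required, since activations and the KV cache are not counted.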