	# RL-Struct: Bridging the Structure Gap

	[English Version](./README.md)

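	This repository contains the model and code for the paper "Bridging the Structure Gap: A Lightweight RL Framework for Reliable Structured Output Generation in LLMs".

	We propose RL-Struct, a lightweight reinforcement learning framework that addresses the "structure gap": the tension between probabilistic token generation and deterministic structured formats such as JSON. Using GRPO (Group Relative Policy Optimization) and a novel multi-dimensional reward function, our model achieves high structural reliability without the latency cost of constrained decoding.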

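	## 🚀 Key Features

	- Multi-dimensional reward function: decomposes the objective into Structure, Format, Validity, Correctness, and Length.
	- Efficient training: GRPO eliminates the critic network, reducing GPU memory usage by about 40% compared with PPO.
	- Emergent curriculum learning: the model spontaneously learns syntax (how to say it) before semantics (what to say).
	- High performance: reaches 89.7% structural accuracy and 92.1% JSON validity on a complex recipe generation task, outperforming LLaMA-3-8B (SFT) and GPT-3.5 (zero-shot) on both metrics.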
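
The reward decomposition in the first bullet can be sketched roughly as below. This is an illustration under stated assumptions, not the paper's implementation: the function names, the equal weights, the exact-match correctness check, and the `MAX_CHARS` budget are all hypothetical. Only the five component names and the `reasoning`/`answer` keys come from this README.

```python
import json

REQUIRED_KEYS = ("reasoning", "answer")  # keys from the README's system prompt
MAX_CHARS = 2000                         # hypothetical length budget
WEIGHTS = {                              # hypothetical equal weighting
    "structure": 0.2, "format": 0.2, "validity": 0.2,
    "correctness": 0.2, "length": 0.2,
}


def try_parse(text):
    """Return (True, value) if `text` is valid JSON, else (False, None)."""
    try:
        return True, json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return False, None


def reward_components(output, reference_answer):
    """Score one completion on the five dimensions, each in [0, 1]."""
    ok, obj = try_parse(output)
    has_keys = ok and isinstance(obj, dict) and all(k in obj for k in REQUIRED_KEYS)
    stripped = output.strip()
    return {
        # Structure: top-level object containing every required key.
        "structure": 1.0 if has_keys else 0.0,
        # Format: nothing but a brace-delimited object (no prose, no markdown).
        "format": 1.0 if stripped.startswith("{") and stripped.endswith("}") else 0.0,
        # Validity: the whole output parses as JSON.
        "validity": 1.0 if ok else 0.0,
        # Correctness: assumed here to be an exact match on the answer field.
        "correctness": 1.0 if has_keys and obj["answer"] == reference_answer else 0.0,
        # Length: full credit within budget, decaying credit beyond it.
        "length": 1.0 if len(output) <= MAX_CHARS else MAX_CHARS / len(output),
    }


def total_reward(output, reference_answer, weights=WEIGHTS):
    """Weighted sum of the component scores."""
    parts = reward_components(output, reference_answer)
    return sum(weights[name] * value for name, value in parts.items())
```

Keeping the components separate makes it possible to log each one during training, which is one way the syntax-before-semantics ordering described above could be observed.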

	## 📊 Model Details

	- Base model: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
	- Training method: GRPO (reinforcement learning) + LoRA
	- Tasks: structured output generation (JSON recipes, GSM8K-JSON, ToolUse)
	- License: Apache-2.0
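
GRPO can drop the critic because it scores each sampled completion relative to the other completions drawn for the same prompt. A minimal sketch of that group-relative advantage step follows. The function name, the `eps` value, and the use of the population standard deviation are assumptions, and the clipped policy update and KL penalty are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps), so no learned
    value network is needed to provide a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four completions sampled from one prompt.
    print(group_relative_advantages([1.0, 0.6, 0.2, 0.2]))
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no gradient signal.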

	## 🛠️ Usage

	### System Prompt
	To ensure correct JSON output, use the following system prompt:

	```text
	You are a precise recipe assistant. Always respond in the following JSON format:
	{
	"reasoning": "Your step-by-step reasoning here...",
	"answer": "{\"name\": \"Recipe Name\", \"nutrition\": \"Calories: ..., Protein: ..., Fat: ...\"}"
	}
	Do not include any other text, explanations, or markdown. Only output valid JSON.
	```
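
In the format above, `answer` is itself a JSON-encoded string, so a consumer has to decode the response twice. The sketch below shows one way to do that with the standard library. The `parse_response` helper and the sample values are hypothetical and are not part of this repository.

```python
import json

EXPECTED_KEYS = ("reasoning", "answer")  # keys from the system prompt above


def parse_response(text):
    """Decode a model response into its reasoning text and recipe object."""
    outer = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(outer, dict):
        raise ValueError("top-level JSON value must be an object")
    missing = [key for key in EXPECTED_KEYS if key not in outer]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    answer = outer["answer"]
    # The prompt shows "answer" as a string holding JSON, so decode it again;
    # tolerate a model that emits a nested object instead.
    recipe = json.loads(answer) if isinstance(answer, str) else answer
    return {"reasoning": outer["reasoning"], "recipe": recipe}


if __name__ == "__main__":
    # Hypothetical response text, built here so the escaping is correct.
    sample = json.dumps({
        "reasoning": "Pick a light soup and estimate its nutrition.",
        "answer": json.dumps({"name": "Tomato Soup", "nutrition": "Calories: 120"}),
    })
    print(parse_response(sample)["recipe"]["name"])
```

Because `json.JSONDecodeError` subclasses `ValueError`, callers can catch `ValueError` alone to handle both malformed JSON and a missing key.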

	## 📈 Performance

	| Method | Structural Accuracy | JSON Validity | Content Accuracy |
	| :--- | :---: | :---: | :---: |
	| GPT-3.5 (Zero-shot) | 45.5% | 82.1% | 88.0% |
	| LLaMA-3-8B (SFT) | 78.2% | 85.4% | 86.0% |
	| RL-Struct (Ours) | 89.7% | 92.1% | 84.5% |
