Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning Paper • 2510.27623 • Published 15 days ago • 12
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving Paper • 2510.11769 • Published Oct 13 • 25
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training Paper • 2510.04996 • Published Oct 6 • 15
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training Paper • 2509.03403 • Published Sep 3 • 21
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Paper • 2506.18945 • Published Jun 23 • 40
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 141
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning Paper • 2505.16400 • Published May 22 • 35
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL Paper • 2505.02391 • Published May 5 • 25
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Paper • 2504.11343 • Published Apr 15 • 19
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published Mar 2 • 56
RLHFlow MATH Process Reward Model Collection A collection of datasets and models for process reward modeling. • 15 items • Updated Nov 9, 2024 • 11
SFT Models Collection We train a series of SFT models on the high-quality SFT dataset of RLHFlow for research purposes. • 6 items • Updated Nov 3, 2024 • 2
Online RLHF Collection Datasets, code, and models for online RLHF (i.e., iterative DPO) • 19 items • Updated Jun 12, 2024 • 5
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint Paper • 2312.11456 • Published Dec 18, 2023 • 1
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models Paper • 2306.12420 • Published Jun 21, 2023 • 2
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment Paper • 2304.06767 • Published Apr 13, 2023 • 2