π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs
Abstract
Step-wise negative-aware fine-tuning enables efficient reinforcement learning for vision-language-action models by eliminating likelihood computation and auxiliary networks while improving generalization in complex environments.
Flow-based vision-language-action (VLA) models excel in embodied control but suffer from intractable likelihoods during multi-step sampling, hindering online reinforcement learning. We propose **π-StepNFT** (Step-wise Negative-aware Fine-Tuning), a critic-and-likelihood-free framework that requires only a single forward pass per optimization step and eliminates auxiliary value networks. We identify that wider exploration spaces necessitate finer-grained, step-wise guidance for alignment. Empirically, π-StepNFT unlocks latent potential on LIBERO with competitive few-shot robustness. Moreover, it achieves superior generalization on ManiSkill, outperforming value-based baselines in OOD scenarios by preventing overfitting to multimodal features. This property offers a scalable solution promising for complex real-world applications.
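To make the "critic-and-likelihood-free, single forward pass" idea concrete, here is a minimal sketch of what such an objective could look like. This is a hypothetical illustration, not the paper's exact loss: `predict_velocity` stands in for the flow-based VLA action head, the batch-mean reward serves as the critic-free baseline, and a standard flow-matching regression is weighted by advantage at every flow step (the step-wise, negative-aware part). All names and the interpolation path are assumptions.

```python
import numpy as np

# Hypothetical sketch (not the paper's exact objective): a critic-free,
# likelihood-free, step-wise advantage-weighted flow-matching loss.

def stepnft_loss(predict_velocity, obs, actions, rewards, n_steps=4, seed=0):
    """Advantage-weighted flow-matching loss evaluated at every flow step.

    The batch-mean reward is the baseline, so no value network is needed,
    and only forward passes of the policy are used -- never the (intractable)
    log-likelihood of the multi-step sampler.
    """
    rng = np.random.default_rng(seed)
    adv = rewards - rewards.mean()                 # critic-free advantage
    total = 0.0
    for k in range(n_steps):                       # step-wise (fine-grained) guidance
        t = (k + 0.5) / n_steps
        noise = rng.standard_normal(actions.shape)
        x_t = (1 - t) * noise + t * actions        # linear interpolation path
        v_target = actions - noise                 # flow-matching velocity target
        err = predict_velocity(obs, x_t, t) - v_target
        per_sample = (err ** 2).mean(axis=-1)
        # negative-aware weighting: successful rollouts (adv > 0) are
        # reinforced, failed ones (adv < 0) are pushed away
        total += (adv * per_sample).mean()
    return total / n_steps
```

One design point this sketch highlights: because the weighting happens per flow step rather than once per trajectory, the guidance signal reaches every stage of the multi-step sampler, which is the "finer steps for wider spaces" intuition in the title.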
Community
π-StepNFT is a critic-free, likelihood-free online RL fine-tuning method for flow-based VLA policies.
This is an automated message from the Librarian Bot: the following similar papers were recommended by the Semantic Scholar API.
- VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action Adaptation (2026)
- On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning (2026)
- Self-Correcting VLA: Online Action Refinement via Sparse World Imagination (2026)
- Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO (2026)
- SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning (2026)
- World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy (2026)
- IG-RFT: An Interaction-Guided RL Framework for VLA Models in Long-Horizon Robotic Manipulation (2026)