Chenjia Bai
Under-Review
Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning.
In Artificial Intelligence (under review)
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
Qiaosheng Zhang, Chenjia Bai, Shuyu Hu, Zhen Wang ✉, Xuelong Li ✉
Preference Aligned Diffusion Planner for Quadrupedal Locomotion Control.
under review
We develop a learning framework for legged locomotion control that combines an offline diffusion planner with online preference alignment based on weak preference labeling.
Xinyi Yuan, Zhiwei Shang, Zifan Wang, Chenkai Wang, Zhao Shan, Zhenchao Qi, Meixin Zhu ✉, Chenjia Bai ✉, Xuelong Li
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration.
under review
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
Yang Zhang, Shixin Yang, Chenjia Bai ✉, Fei Wu, Xiu Li, Xuelong Li, Zhen Wang
VLP: Vision-Language Preference Learning for Embodied Manipulation.
under review
We propose a novel Vision-Language Preference learning framework that learns a vision-language preference model to provide preference feedback for embodied manipulation tasks.
Runze Liu, Chenjia Bai ✉, Jiafei Lyu, Shengjie Sun, Yali Du, Xiu Li ✉
Humanoid Whole-Body Locomotion on Narrow Terrain via Dynamic Balance and Reinforcement Learning.
under review
We propose a novel whole-body locomotion algorithm based on dynamic balance and reinforcement learning (RL) that enables humanoid robots to traverse extreme terrains, particularly narrow pathways and unexpected obstacles, using only proprioception.
Weiji Xie, Chenjia Bai ✉, Jiyuan Shi, Junkai Yang, Yunfei Ge, Weinan Zhang ✉, Xuelong Li
Information-Theoretic Reward Decomposition for Generalizable RLHF.
under review
We decompose the reward value in RLHF into two independent components, a prompt-free reward and a prompt-related reward, and propose a new reward learning algorithm that prioritizes data samples based on their prompt-free reward values.
Liyuan Mao, Haoran Xu, Amy Zhang, Weinan Zhang ✉, Chenjia Bai ✉
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning.
under review
We propose Adversarial Locomotion and Motion Imitation (ALMI), a novel framework for humanoid loco-manipulation tasks that enables adversarial policy learning between the upper and lower body.
Jiyuan Shi, Xinzhe Liu, Dewei Wang, Ouyang Lu, Sören Schwertfeger, Fuchun Sun, Chenjia Bai ✉, Xuelong Li ✉