Chenjia Bai
Publications
Preference Aligned Diffusion Planner for Quadrupedal Locomotion Control.
under review
We develop a learning framework that combines an offline diffusion planner with online preference alignment via weak preference labeling for legged locomotion control.
Xinyi Yuan, Zhiwei Shang, Zifan Wang, Chenkai Wang, Zhao Shan, Zhenchao Qi, Meixin Zhu ✉, Chenjia Bai ✉, Xuelong Li
PDF · Cite · Project
Radiology Report Generation via Multi-objective Preference Optimization.
In AAAI Conference on Artificial Intelligence (AAAI), 2025
We propose a new radiology report generation method that aligns the pre-trained model with multiple human preferences via preference-guided multi-objective optimization with reinforcement learning.
Ting Xiao, Lei Shi, Peng Liu, Zhe Wang, Chenjia Bai ✉
PDF · Cite
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies.
In AAAI Conference on Artificial Intelligence (AAAI), 2025
We propose forward KL regularized preference optimization to align diffusion policies with human preferences, learning to match the policy output with human intents across various tasks.
Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai ✉
PDF · Cite
SelfBC: Self Behavior Cloning for Offline Reinforcement Learning.
In European Conference on Artificial Intelligence (ECAI), 2024
We propose a novel dynamic policy constraint that restricts the learned policy to samples generated by the exponential moving average of previously learned policies for offline RL.
Shirong Liu, Chenjia Bai, Zixian Guo, Hao Zhang, Gaurav Sharma, Yang Liu ✉
PDF · Cite
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration.
under review
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
Yang Zhang, Shixin Yang, Chenjia Bai ✉, Fei Wu, Xiu Li, Xuelong Li, Zhen Wang
PDF · Cite · Project · WeChat
Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning.
In International Conference on Machine Learning (ICML), 2024
We propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains.
Xiaoyu Wen, Chenjia Bai ✉, Kang Xu, Xudong Yu, Yang Zhang, Xuelong Li, Zhen Wang
PDF · Cite · Code
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation.
In International Conference on Machine Learning (ICML), 2024
We propose SAM-E, a novel architecture for robot manipulation by leveraging a vision-foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning.
Junjie Zhang, Chenjia Bai ✉, Haoran He, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li
PDF · Cite · Code · Project · WeChat
Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning.
In Artificial Intelligence (under review)
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
Qiaosheng Zhang, Chenjia Bai, Shuyu Hu, Zhen Wang ✉, Xuelong Li ✉
PDF · Cite
Constrained Ensemble Exploration for Unsupervised Skill Discovery.
In International Conference on Machine Learning (ICML), 2024
We propose a novel unsupervised RL framework via an ensemble of skills, where each skill performs partition exploration based on the state prototypes.
Chenjia Bai, Rushuai Yang, Qiaosheng Zhang, Kang Xu, Yi Chen, Ting Xiao, Xuelong Li
PDF · Cite · Code
Regularized Conditional Diffusion Model for Multi-Task Preference Alignment.
In Neural Information Processing Systems (NeurIPS), 2024
We adopt multi-task preferences as a unified condition for both single- and multi-task decision-making, and propose preference representations aligned with preference labels.
Xudong Yu, Chenjia Bai ✉, Haoran He, Changhong Wang, Xuelong Li
PDF · Cite
How Does Goal Relabeling Improve Sample Efficiency?
In International Conference on Machine Learning (ICML), 2024
We construct an example to show the information-theoretical improvement in sample efficiency achieved by goal relabeling and develop an RL algorithm called GOALIVE.
Sirui Zheng, Chenjia Bai, Zhuoran Yang, Zhaoran Wang
PDF · Cite
Cross-Domain Policy Adaptation by Capturing Representation Mismatch.
In International Conference on Machine Learning (ICML), 2024
We consider dynamics adaptation settings where a dynamics mismatch exists between the source and target domains, and one has access to sufficient source-domain data but only limited interactions with the target domain.
Jiafei Lyu, Chenjia Bai, Jing-Wen Yang, Zongqing Lu, Xiu Li
PDF · Cite · Code
Skill Matters: Dynamic Skill Learning for Multi-Agent Cooperative Reinforcement Learning.
Neural Networks, 2024
We propose a novel Dynamic Skill Learning (DSL) framework to enable more effective adaptation and collaboration in complex tasks.
Tong Li, Chenjia Bai ✉, Kang Xu, Chen Chu, Peican Zhu, Zhen Wang ✉
PDF · Cite
Robust Quadrupedal Locomotion via Risk-Averse Policy Learning.
In IEEE International Conference on Robotics and Automation (ICRA), 2024 (Oral)
We consider a novel risk-sensitive perspective to enhance the robustness of legged locomotion.
Jiyuan Shi, Chenjia Bai ✉, Haoran He, Lei Han, Dong Wang, Bin Zhao, Mingguo Zhao, Xiu Li, Xuelong Li
PDF · Cite · Project
OVD-Explorer: Optimism should not be the Sole Pursuit of Exploration in Noisy Environments.
In AAAI Conference on Artificial Intelligence (AAAI), 2024
We propose Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control.
Jinyi Liu, Zhi Wang, Yan Zheng, Jianye Hao, Chenjia Bai, Junjie Ye, Zhen Wang, et al.
PDF · Cite
Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training.
In Neural Information Processing Systems (NeurIPS), 2024
We introduce a novel framework that leverages a unified discrete diffusion to combine generative pre-training on human videos and policy fine-tuning on a small number of action-labeled robot videos.
Haoran He, Chenjia Bai ✉, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li
PDF · Cite · Project
Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness.
Journal of Artificial Intelligence Research (under review), 2023
We propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness, and to mitigate the performance drop in online adaptation.
Xiaoyu Wen, Xudong Yu, Rui Yang, Chenjia Bai ✉, Zhen Wang
PDF · Cite
On the Value of Myopic Behavior in Policy Reuse.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 (under review)
We present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks.
Kang Xu, Chenjia Bai ✉, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li
PDF · Cite
Embodied Intelligence Driven by Large Models: Development and Challenges (in Chinese).
SCIENTIA SINICA Informationis
We provide a comprehensive survey of embodied AI driven by large-scale models.
Chenjia Bai, Huazhe Xu, Xuelong Li ✉
PDF · Cite · WeChat
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning.
In Neural Information Processing Systems (NeurIPS), 2023
We aim to investigate the effectiveness of a single diffusion model in modeling large-scale multi-task offline data, which can be challenging due to diverse and multimodal data distribution.
Haoran He, Chenjia Bai ✉, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li
PDF · Cite
Cross-Domain Policy Adaptation via Value-Guided Data Filtering.
In Neural Information Processing Systems (NeurIPS), 2023
We reveal the limitations of existing cross-domain adaptation methods and explore the problem from the value-difference perspective via a novel insight on value consistency across domains.
Kang Xu, Chenjia Bai ✉, Xiaoteng Ma, Dong Wang, Bin Zhao, Zhen Wang, Xuelong Li, Wei Li
PDF · Cite
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning.
In SCIENCE CHINA Information Sciences, 2023
Our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning.
Changhong Wang, Xudong Yu, Chenjia Bai, Qiaosheng Zhang, Zhen Wang ✉
PDF · Cite
Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning.
In Artificial Intelligence (AIJ), 2023
We propose an uncertainty-based multi-task data sharing (MTDS) approach that shares the entire dataset without data selection.
Chenjia Bai, Lingxiao Wang, Jianye Hao, Zhuoran Yang, Bin Zhao, Zhen Wang ✉, Xuelong Li ✉
PDF · Cite · Code · WeChat
Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner.
under review
We develop a versatile diffusion planner that can leverage large-scale inferior data containing task-agnostic sub-optimal trajectories, with the ability to quickly adapt to specific tasks.
Chenyou Fan, Chenjia Bai ✉, Zhao Shan, Haoran He, Yang Zhang, Zhen Wang
PDF · Cite
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning.
In Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2024
We introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods, where one needs to transfer policies across different domains with dynamics mismatch.
Jiafei Lyu, Kang Xu, Jiacheng Xu, Mengbei Yan, Jing-Wen Yang, Zongzhang Zhang, Chenjia Bai ✉, Zongqing Lu ✉, Xiu Li ✉
PDF · Cite · Code
Behavior Contrastive Learning for Unsupervised Skill Discovery.
In International Conference on Machine Learning (ICML), 2023
We propose a novel unsupervised skill discovery method through contrastive learning among behaviors, which makes the agent produce similar behaviors for the same skill and diverse behaviors for different skills.
Rushuai Yang, Chenjia Bai ✉, Hongyi Guo, Siyuan Li, Bin Zhao, Zhen Wang, Peng Liu, Xuelong Li
PDF · Cite · Code
Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning.
In Information Sciences, 2023
We introduce a novel strategy employing diverse randomized value functions to estimate the posterior distribution of Q-values.
Xudong Yu, Chenjia Bai ✉, Hongyi Guo, Changhong Wang ✉, Zhen Wang
PDF · Cite
False Correlation Reduction for Offline Reinforcement Learning.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Jing Jiang
PDF · Cite
Bridging the Sim-to-Real Gap from the Information Bottleneck Perspective.
In Annual Conference on Robot Learning (CoRL), 2024 (Oral)
We propose a novel single-stage privileged knowledge distillation method called the Historical Information Bottleneck (HIB) to narrow the sim-to-real gap for legged locomotion.
Haoran He, Peilin Wu, Chenjia Bai, Hang Lai, Lingxiao Wang, Ling Pan, Xiaolin Hu, Weinan Zhang ✉
PDF · Cite
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing.
In Neural Information Processing Systems (NeurIPS), 2022 (Spotlight)
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
Rui Yang ✉, Chenjia Bai ✉, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han
PDF · Cite
Self-Supervised Imitation for Offline Reinforcement Learning with Hindsight Relabeling.
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022
We present an offline RL algorithm that combines hindsight relabeling and supervised regression to predict actions without oracle information.
Xudong Yu, Chenjia Bai, Changhong Wang, Dengxiu Yu, C. L. Philip Chen, Zhen Wang ✉
PDF · Cite
Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning.
In International Conference on Machine Learning (ICML), 2022 (Spotlight)
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions. For both models, we propose to extract the correct feature representations of the low-rank model by minimizing a contrastive loss.
Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang
PDF · Cite · Code
Monotonic Quantile Network for Worst-Case Offline Reinforcement Learning.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We propose a monotonic quantile network (MQN) with conservative quantile regression (CQR) for risk-averse policy learning.
Chenjia Bai, Ting Xiao, Zhoufan Zhu, Lingxiao Wang, Fan Zhou, Peng Liu
PDF · Cite
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We conduct a comprehensive survey on existing exploration methods for both single-agent RL and multiagent RL.
Jianye Hao, Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Zhaopeng Meng, Peng Liu, Zhen Wang
PDF · Cite
Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning.
In International Conference on Learning Representations (ICLR), 2022 (Spotlight)
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang
PDF · Cite · Code
Dynamic Bottleneck for Robust Self-Supervised Exploration.
In Neural Information Processing Systems (NeurIPS), 2021
We propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle.
Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
PDF · Cite · Code
Principled Exploration via Optimistic Bootstrapping and Backward Induction.
In International Conference on Machine Learning (ICML), 2021 (Spotlight)
We propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I).
Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang
PDF · Cite · Code
Addressing Hindsight Bias in Multi-Goal Reinforcement Learning.
IEEE Transactions on Cybernetics, 2021
We analyze the hindsight bias introduced by relabeling with hindsight goals and propose the bias-corrected HER (BHER), an efficient algorithm that corrects the hindsight bias in training.
Chenjia Bai, Lingxiao Wang, Yixin Wang, Rui Zhao, Chenyao Bai, Peng Liu
PDF · Cite · Code
Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning.
IEEE Transactions on Neural Networks and Learning Systems, 2021
We propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity of environment dynamics.
Chenjia Bai, Peng Liu, Kaiyu Liu, Lingxiao Wang, Yingnan Zhao, Lei Han
PDF · Cite · Code · Project
Generating Attentive Goals for Prioritized Hindsight Reinforcement Learning.
Knowledge-Based Systems (KBS), 2020
We propose a novel prioritized hindsight model for multi-goal RL in which the agent is provided with more valuable goals, as measured by the expected temporal-difference (TD) error.
Peng Liu, Chenjia Bai, Yingnan Zhao, Chenyao Bai, Wei Zhao, Xianglong Tang
PDF · Cite
Obtaining Accurate Estimated Action Values in Categorical Distributional Reinforcement Learning.
Knowledge-Based Systems (KBS), 2020
This paper describes a method of obtaining more accurate estimated action values for CDRL using adaptive bounds.
Yingnan Zhao, Peng Liu, Chenjia Bai, Wei Zhao, Xianglong Tang
PDF · Cite
Active Sampling for Deep Q-learning Based on TD-error Adaptive Correction.
Journal of Computer Research and Development (in Chinese), 2019
We propose an active sampling method based on TD-error adaptive correction to address the sample-efficiency problem in deep Q-learning.
Chenjia Bai, Peng Liu, Wei Zhao, Xianglong Tang
PDF · Cite
Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models.
under review
We propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with centralized representation aggregation from all agents.
Yang Zhang, Chenjia Bai ✉, Bin Zhao, Junchi Yan, Xiu Li, Xuelong Li
PDF · Cite