Chenjia Bai
Journal Articles
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning.
Changhong Wang, Xudong Yu, Chenjia Bai, Qiaosheng Zhang, Zhen Wang✉
In SCIENCE CHINA Information Sciences, 2023
Our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning.
Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness.
Xiaoyu Wen, Xudong Yu, Rui Yang, Chenjia Bai✉, Zhen Wang
In Journal of Artificial Intelligence Research (JAIR), 2023
We propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness, and to mitigate the performance drop in online adaptation.
Skill Matters: Dynamic Skill Learning for Multi-Agent Cooperative Reinforcement Learning.
Tong Li, Chenjia Bai✉, Kang Xu, Chen Chu, Peican Zhu, Zhen Wang✉
In Neural Networks, 2024
We propose a novel Dynamic Skill Learning (DSL) framework to enable more effective adaptation and collaboration in complex tasks.
Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models.
Yang Zhang, Chenjia Bai✉, Bin Zhao, Junchi Yan, Xiu Li✉, Xuelong Li
In Transactions on Machine Learning Research (TMLR), 2025
We propose a novel world model for MARL that learns decentralized local dynamics for scalability, combined with centralized representation aggregation from all agents.
Embodied AI Driven by Large Models: Development and Challenges (大模型驱动的具身智能:发展与挑战).
Chenjia Bai, Huazhe Xu, Xuelong Li✉
In SCIENCE CHINA Information Sciences
We present a comprehensive survey of embodied AI driven by large-scale models.
Distributional Off-Policy Evaluation in Reinforcement Learning.
Zhengling Qi, Chenjia Bai, Zhaoran Wang, Lan Wang✉
In Journal of the American Statistical Association (JASA), 2025
This paper proposes an offline Wasserstein-based approach to estimate the joint distribution of multivariate discounted cumulative rewards, establishes finite sample error bounds in the batch setting, and demonstrates its superior performance through extensive numerical studies.