Chenjia Bai
Conference
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning.
In Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2024.
We introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods, where policies must transfer across domains with mismatched dynamics.
Jiafei Lyu, Kang Xu, Jiacheng Xu, Mengbei Yan, Jing-Wen Yang, Zongzhang Zhang, Chenjia Bai ✉, Zongqing Lu ✉, Xiu Li ✉
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies.
In AAAI Conference on Artificial Intelligence (AAAI), 2025.
We propose Forward KL regularized Preference Optimization to align diffusion policies with human preferences, learning to match policy outputs to human intent across various tasks.
Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai ✉
Radiology Report Generation via Multi-objective Preference Optimization.
In AAAI Conference on Artificial Intelligence (AAAI), 2025.
We propose a radiology report generation method that aligns a pre-trained model with multiple human preferences via preference-guided multi-objective reinforcement learning.
Ting Xiao, Lei Shi, Peng Liu, Zhe Wang, Chenjia Bai ✉
Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning.
In International Conference on Learning Representations (ICLR), 2025.
We introduce ExpoComm, a scalable communication protocol that leverages exponential topologies for efficient information dissemination among many agents in large-scale multi-agent reinforcement learning.
Xinran Li, Xiaolu Wang, Chenjia Bai, Jun Zhang
Discriminator-Guided Embodied Planning for LLM Agent.
In International Conference on Learning Representations (ICLR), 2025.
We propose a novel framework that generalizes demonstrations to establish critic-regularized grounding and optimization in the long-term planning of LLMs.
Haofu Qian, Chenjia Bai ✉, Jiatao Zhang, Fei Wu, Wei Song, Xuelong Li
Online Preference Alignment for Language Models via Count-based Exploration.
In International Conference on Learning Representations (ICLR), 2025.
We propose count-based online preference optimization for LLM alignment that leverages coin-flip counting to encourage exploration in online RLHF.
Chenjia Bai, Yang Zhang, Shuang Qiu, Qiaosheng Zhang, Kang Xu, Xuelong Li ✉