Biography


I am a Research Scientist at the Institute of Artificial Intelligence (TeleAI), China Telecom, and the Director of the Embodied AI Research Center, specializing in the cutting-edge fields of Embodied AI and Reinforcement Learning (RL). Our group is dedicated to developing embodied technologies encompassing perception, planning, locomotion, and manipulation, and to promoting the industrial application of embodied AI. Our group thrives under the leadership of Prof. Xuelong Li, who serves as the dean of TeleAI. Previously, I was a Researcher at Shanghai AI Laboratory, affiliated with the IPEC group. My research interests include diffusion/transformer policies, LLM-driven planning, world models, preference learning, RL/MPC-based locomotion, dexterous manipulation, representation learning, sim-to-real transfer, and multi-agent collaboration, as well as real-world applications for robot arms, dexterous hands, quadruped robots, and humanoid robots.

I hold a Ph.D. in Computer Science from Harbin Institute of Technology (HIT), where I was advised by Prof. Peng Liu. I have been fortunate to collaborate with many fantastic researchers. I was a joint PhD student at the University of Toronto and the Vector Institute, working with Prof. Animesh Garg. I also interned at Huawei Noah’s Ark Lab (advised by Prof. Jianye Hao), Tencent Robotics X (advised by Dr. Lei Han), and Alibaba. I received my Bachelor’s and Master’s degrees in Computer Science from HIT.

Chenjia Bai (白辰甲), Ph.D., is a Research Scientist at the Institute of Artificial Intelligence (TeleAI), China Telecom, and the lead of the Embodied AI team. His research covers embodied AI, humanoid robots, large models for locomotion and manipulation, and reasoning alignment. He has published over 50 papers in venues including the AI Journal, TPAMI, and NeurIPS, and has authored one monograph. He has led projects funded by the National Natural Science Foundation of China and the National Key R&D Program of China. He was selected for the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology, the Shanghai Rising-Star (Sailing) Program, and the Shanghai Guangqi Young Talent Program; he received the WAIC Outstanding Paper Nomination Award and the HIT Outstanding Doctoral Dissertation Award, and serves as an area chair and reviewer for multiple top international conferences and journals.

Our team is recruiting full-time researchers, interns, and jointly supervised PhD students in embodied AI; see the link for details.

Interests
  • Embodied AI
  • Reinforcement Learning
  • Foundation Model for Decision Making
Education
  • PhD in Computer Science, 2017-2022

    Harbin Institute of Technology

  • Joint PhD Program, 2021-2022

    University of Toronto

Publications

"✉" denotes corresponding author
Online Preference Alignment for Language Models via Count-based Exploration.
In International Conference on Learning Representations (ICLR), 2025     Spotlight
We propose count-based online preference optimization for LLM alignment that leverages coin-flip counting to encourage exploration in online RLHF.
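As a rough illustration of the count-based idea, an exploration bonus can be added to the preference reward so that rarely visited prompt-response pairs are favored during online data collection. This is a minimal sketch in Python; the names (exploratory_reward, pseudo_count, beta) are illustrative, and the paper's coin-flip counting module is abstracted away as a generic pseudo-count estimator.

    import math

    def exploratory_reward(reward: float, pseudo_count: float, beta: float = 1.0) -> float:
        # UCB-style bonus: the fewer times a prompt-response pair has
        # (approximately) been seen, the larger the exploration incentive.
        return reward + beta / math.sqrt(pseudo_count + 1e-8)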
On the Value of Myopic Behavior in Policy Reuse.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
We present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are shareable across tasks.
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning.
In Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2024
We introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods where one needs to transfer policies across different domains with dynamics mismatch.
Robust Quadrupedal Locomotion via Risk-Averse Policy Learning.
In IEEE International Conference on Robotics and Automation (ICRA), 2024     Oral
We consider a novel risk-sensitive perspective to enhance the robustness of legged locomotion.
False Correlation Reduction for Offline Reinforcement Learning.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing.
In Neural Information Processing Systems (NeurIPS), 2022     Spotlight
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
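A minimal sketch of the smoothing idea, assuming a critic callable q_net(states, actions) (a hypothetical interface, not the paper's code): Q-values are regularized to change little under small state perturbations, so noise or adversarial inputs cannot easily inflate value estimates.

    import torch

    def smoothing_loss(q_net, states, actions, eps: float = 0.01):
        # Penalize sharp changes of Q around observed states; an
        # illustrative regularizer, not the paper's exact objective.
        perturbed = states + eps * torch.randn_like(states)
        return ((q_net(states, actions) - q_net(perturbed, actions)) ** 2).mean()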
Monotonic Quantile Network for Worst-Case Offline Reinforcement Learning.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We propose monotonic quantile network (MQN) with conservative quantile regression (CQR) for risk-averse policy learning.
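For intuition about the worst-case (risk-averse) objective, conditional value-at-risk (CVaR) can be computed from a quantile critic by averaging the lowest alpha-fraction of return quantiles; the sketch below is illustrative and not the paper's implementation.

    import numpy as np

    def cvar_from_quantiles(quantiles: np.ndarray, alpha: float = 0.25) -> float:
        # Rank actions by their worst-case outcomes rather than the mean return.
        sorted_q = np.sort(quantiles)
        k = max(1, int(alpha * len(sorted_q)))
        return float(sorted_q[:k].mean())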
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We conduct a comprehensive survey on existing exploration methods for both single-agent RL and multiagent RL.
Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning.
In International Conference on Learning Representations (ICLR), 2022     Spotlight
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
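The core pessimism mechanism can be summarized in a few lines: with several bootstrapped Q-estimates, out-of-distribution actions tend to show high disagreement, and subtracting the ensemble standard deviation penalizes them. A minimal sketch, with hypothetical names:

    import numpy as np

    def pessimistic_value(q_ensemble: np.ndarray, beta: float = 1.0) -> float:
        # Ensemble disagreement serves as an uncertainty estimate;
        # beta controls the strength of the pessimistic penalty.
        return float(q_ensemble.mean() - beta * q_ensemble.std())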
Dynamic Bottleneck for Robust Self-Supervised Exploration.
In Neural Information Processing Systems (NeurIPS), 2021
We propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle.
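As background, a generic information-bottleneck objective for dynamics-relevant representation learning (standard notation, not necessarily the paper's exact formulation) is

    \max_{\phi} \; I\left(Z_t;\, S_{t+1}\right) \;-\; \beta\, I\left(Z_t;\, [S_t, A_t]\right), \qquad Z_t = \phi(S_t, A_t),

where the first term keeps the representation predictive of the next state and the second compresses away dynamics-irrelevant information.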
Principled Exploration via Optimistic Bootstrapping and Backward Induction.
In International Conference on Machine Learning (ICML), 2021     Spotlight
We propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I).
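A minimal sketch of how a UCB-style bonus can be combined with backward induction (illustrative names, not the paper's code): bonuses collected along an episode are propagated backward, so uncertainty about future states also raises the learning targets of earlier ones.

    def backward_targets(rewards, bonuses, gamma: float = 0.99):
        # Walk the episode from the last step to the first, accumulating
        # reward plus exploration bonus into discounted return-style targets.
        target, targets = 0.0, []
        for r, b in zip(reversed(rewards), reversed(bonuses)):
            target = r + b + gamma * target
            targets.append(target)
        return targets[::-1]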

Talks

Service

  • Senior Program Committee Member (SPC) / Area Chair (AC) of AAMAS (2024 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of RSS (2024 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of NeurIPS (2021 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of ICLR (2021 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of ICML (2022 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of AAAI (2021 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of ICRA (2024 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of ECAI (2023 - 2025)
  • Journal Reviewer: IEEE Transactions on Cybernetics, IEEE Transactions on Neural Networks and Learning Systems (TNNLS), IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), IEEE Transactions on Intelligent Vehicles, Pattern Recognition.

Experience

Research Scientist
TeleAI, China Telecom
2024 – Present, China

Researcher
Shanghai AI Laboratory
2022 – 2024, China

Joint PhD Student
University of Toronto
2021 – 2022, Canada

PhD Student
Harbin Institute of Technology
2017 – 2022, China