Biography


I am a Research Scientist at the Institute of Artificial Intelligence (TeleAI), China Telecom, specializing in the cutting-edge fields of Embodied AI and Reinforcement Learning (RL). Our group is dedicated to developing embodied technologies encompassing perception, planning, locomotion, and manipulation, and to promoting the industrial application of embodied AI. The group thrives under the leadership of Prof. Xuelong Li, who serves as the dean of TeleAI. Previously, I was a Researcher at Shanghai AI Laboratory, affiliated with the IPEC group. My research interests include diffusion/transformer policies, LLM-driven planning, world models, preference learning, RL/MPC-based locomotion, dexterous manipulation, representation learning, sim-to-real transfer, and multi-agent collaboration, as well as real-world applications for robot arms, dexterous hands, quadruped robots, and humanoid robots.

I hold a Ph.D. in Computer Science from Harbin Institute of Technology (HIT), where I was advised by Prof. Peng Liu. I am fortunate to have collaborated with many fantastic researchers. I was a joint Ph.D. student at the University of Toronto and the Vector Institute, working with Prof. Animesh Garg. I also interned at Huawei Noah’s Ark Lab (advised by Prof. Jianye Hao), Tencent Robotics X (advised by Dr. Lei Han), and Alibaba. I received my Bachelor’s and Master’s degrees in Computer Science from HIT.

Our team is recruiting full-time researchers, interns, and jointly supervised Ph.D. students in embodied AI; see the link for details.

Interests
  • Deep Reinforcement Learning
  • Embodied AI
  • Foundation Model for Decision Making
Education
  • PhD in Computer Science, 2017-2022

    Harbin Institute of Technology

  • Joint PhD Program, 2021-2022

    University of Toronto

Publications

"✉" denotes corresponding author
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning.
In Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2024
We introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods where one needs to transfer policies across different domains with dynamics mismatch.
Robust Quadrupedal Locomotion via Risk-Averse Policy Learning.
In IEEE International Conference on Robotics and Automation (ICRA), 2024     Oral
We consider a novel risk-sensitive perspective to enhance the robustness of legged locomotion.
On the Value of Myopic Behavior in Policy Reuse.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023 (under review)
We present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks.
False Correlation Reduction for Offline Reinforcement Learning.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing.
In Neural Information Processing Systems (NeurIPS), 2022     Spotlight
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
Monotonic Quantile Network for Worst-Case Offline Reinforcement Learning.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We propose monotonic quantile network (MQN) with conservative quantile regression (CQR) for risk-averse policy learning.
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We conduct a comprehensive survey on existing exploration methods for both single-agent RL and multiagent RL.
Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning.
International Conference on Learning Representations (ICLR), 2022     Spotlight
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
Dynamic Bottleneck for Robust Self-Supervised Exploration.
In Neural Information Processing Systems (NeurIPS), 2021
We propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle.
Principled Exploration via Optimistic Bootstrapping and Backward Induction.
In International Conference on Machine Learning (ICML), 2021     Spotlight
We propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I).

Talks

Service

  • Senior Program Committee Member (SPC) / Area Chair (AC) of AAMAS (2024 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of NeurIPS (2021 - 2024)
  • Program Committee Member (PC) / Conference Reviewer of ICLR (2021 - 2024)
  • Program Committee Member (PC) / Conference Reviewer of ICML (2022 - 2024)
  • Program Committee Member (PC) / Conference Reviewer of AAAI (2021 - 2024)
  • Program Committee Member (PC) / Conference Reviewer of ECAI (2023 - 2024)
  • Journal Reviewer: IEEE Transactions on Cybernetics, IEEE TNNLS, IEEE TETCI, IEEE Transactions on Intelligent Vehicles

Experience

Research Scientist
TeleAI, China Telecom
2024 – Present China
Researcher
Shanghai AI Laboratory
2022 – 2024 China
Joint PhD Student
University of Toronto
2021 – 2022 Canada
PhD Student
Harbin Institute of Technology
2017 – 2022 China