Biography


I am a Researcher at Shanghai AI Laboratory, working on AI-driven embodied agents and large-scale decision-making systems under the direction of Prof. Xuelong Li. My research focuses on deep Reinforcement Learning (RL) and Embodied AI, including diffusion/transformer-based embodied systems, preference learning, offline RL, robust RL, efficient exploration, representation learning, and multi-agent systems. I hold a Ph.D. in Computer Science from Harbin Institute of Technology (HIT), where I was advised by Prof. Peng Liu.

I am fortunate to have collaborated with many fantastic researchers. I was a visiting student at the University of Toronto and the Vector Institute, working with Prof. Animesh Garg. I have also interned at Huawei Noah’s Ark Lab (advised by Prof. Jianye Hao), Tencent Robotics X (advised by Dr. Lei Han), and Alibaba. I received my Bachelor’s and Master’s degrees in Computer Science from HIT.

Interests
  • Deep Reinforcement Learning
  • Embodied AI
  • Foundation Model for Decision Making
Education
  • PhD in Computer Science, 2022

    Harbin Institute of Technology

  • MEng in Computer Science, 2017

    Harbin Institute of Technology

  • BSc in Computer Science, 2015

    Harbin Institute of Technology

Publications

"✉" denotes corresponding author
Robust Quadrupedal Locomotion via Risk-Averse Policy Learning.
In IEEE International Conference on Robotics and Automation (ICRA), 2024
We take a novel risk-sensitive perspective to enhance the robustness of legged locomotion.
On the Value of Myopic Behavior in Policy Reuse.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023 (under review)
We present a framework called Selective Myopic bEhavior Control (SMEC), built on the insight that the short-term behaviors of prior policies are sharable across tasks.
False Correlation Reduction for Offline Reinforcement Learning.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing.
In Neural Information Processing Systems (NeurIPS), 2022 (Spotlight)
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
Monotonic Quantile Network for Worst-Case Offline Reinforcement Learning.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We propose monotonic quantile network (MQN) with conservative quantile regression (CQR) for risk-averse policy learning.
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We conduct a comprehensive survey on existing exploration methods for both single-agent RL and multiagent RL.
Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning.
In International Conference on Learning Representations (ICLR), 2022 (Spotlight)
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
Dynamic Bottleneck for Robust Self-Supervised Exploration.
In Neural Information Processing Systems (NeurIPS), 2021
We propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle.
Principled Exploration via Optimistic Bootstrapping and Backward Induction.
In International Conference on Machine Learning (ICML), 2021 (Spotlight)
We propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I).

Service

  • Senior Program Committee Member (SPC) / Area Chair (AC) of AAMAS (2024 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of NeurIPS (2021 - 2024)
  • Program Committee Member (PC) / Conference Reviewer of ICLR (2021 - 2024)
  • Program Committee Member (PC) / Conference Reviewer of ICML (2022 - 2024)
  • Program Committee Member (PC) / Conference Reviewer of AAAI (2021 - 2024)
  • Program Committee Member (PC) / Conference Reviewer of ECAI (2023 - 2024)
  • Journal Reviewer: IEEE Transactions on Cybernetics, IEEE Transactions on Neural Networks and Learning Systems (TNNLS), IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), IEEE Transactions on Intelligent Vehicles

Experience

Researcher
Shanghai AI Laboratory
2022 – Present, China

Joint PhD Student
University of Toronto
2021 – 2022, Canada

PhD Student
Harbin Institute of Technology
2017 – 2022, China