TY - GEN
T1 - Diffusion policies for risk-averse behavior modeling in offline reinforcement learning
AU - Chen, Xiaocong
AU - Wang, Siyu
AU - Yu, Tong
AU - Yao, Lina
PY - 2025
Y1 - 2025
N2 - Offline reinforcement learning (RL) presents distinct challenges, as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method to simultaneously address both epistemic uncertainty and environmental stochasticity. Specifically, we develop a model-free offline RL algorithm capable of learning risk-averse policies and characterizing the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through comprehensive experiments on both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.
AB - Offline reinforcement learning (RL) presents distinct challenges, as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method to simultaneously address both epistemic uncertainty and environmental stochasticity. Specifically, we develop a model-free offline RL algorithm capable of learning risk-averse policies and characterizing the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through comprehensive experiments on both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.
UR - https://www.scopus.com/pages/publications/105029968345
U2 - 10.1109/IROS60139.2025.11245819
DO - 10.1109/IROS60139.2025.11245819
M3 - Conference proceeding contribution
AN - SCOPUS:105029968345
SN - 9798331543945
SP - 567
EP - 574
BT - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
A2 - Laugier, Christian
A2 - Renzaglia, Alessandro
A2 - Atanasov, Nikolay
A2 - Birchfield, Stan
A2 - Cielniak, Grzegorz
A2 - De Mattos, Leonardo
A2 - Fiorini, Laura
A2 - Giguère, Philippe
A2 - Hashimoto, Kenji
A2 - Ibanez-Guzman, Javier
A2 - Kamegawa, Tetsushi
A2 - Lee, Jinoh
A2 - Loianno, Giuseppe
A2 - Luck, Kevin
A2 - Maruyama, Hisataka
A2 - Martinet, Philippe
A2 - Moradi, Hadi
A2 - Nunes, Urbano
A2 - Pettre, Julien
A2 - Pretto, Alberto
A2 - Ranzani, Tommaso
A2 - Rönnau, Arne
A2 - Rossi, Silvia
A2 - Rouse, Elliott
A2 - Ruggiero, Fabio
A2 - Simonin, Olivier
A2 - Wang, Danwei
A2 - Yang, Ming
A2 - Yoshida, Eiichi
A2 - Zhao, Huijing
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Piscataway, NJ
T2 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025
Y2 - 19 October 2025 through 25 October 2025
ER -