TY - JOUR
T1 - Generative adversarial reward learning for generalized behavior tendency inference
AU - Chen, Xiaocong
AU - Yao, Lina
AU - Wang, Xianzhi
AU - Sun, Aixin
AU - Sheng, Quan Z.
PY - 2023/10
Y1 - 2023/10
AB - Recent advances in reinforcement learning have inspired increasing interest in adaptively learning user models through dynamic interactions, e.g., in reinforcement learning-based recommender systems. In most reinforcement learning applications, the reward function provides the critical guidance for optimization. However, current reinforcement learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic, noisy environments. Moreover, they generally use task-specific reward functions that sacrifice generalization ability. We propose a generative inverse reinforcement learning approach for user behavioral preference modeling to address these issues. Instead of relying on a predefined reward function, our model automatically learns rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN. Our model provides a general approach to characterizing and explaining underlying behavioral tendencies. Our experiments show that our method outperforms state-of-the-art methods in several scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.
UR - http://www.scopus.com/inward/record.url?scp=85135736282&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2022.3186920
DO - 10.1109/TKDE.2022.3186920
M3 - Article
AN - SCOPUS:85135736282
SN - 1041-4347
VL - 35
SP - 9878
EP - 9889
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 10
ER -