Generative adversarial reward learning for generalized behavior tendency inference

Xiaocong Chen, Lina Yao, Xianzhi Wang, Aixin Sun, Quan Z. Sheng

Research output: Contribution to journal › Article › peer-review


Abstract

Recent advances in reinforcement learning have inspired growing interest in learning user models adaptively through dynamic interactions, e.g., in reinforcement learning-based recommender systems. In most reinforcement learning applications, the reward function provides the critical guidance for optimization. However, current reinforcement learning-based methods rely on manually defined reward functions, which cannot adapt to dynamic, noisy environments. Moreover, they generally use task-specific reward functions that sacrifice generalization ability. To address these issues, we propose a generative inverse reinforcement learning approach for modeling user behavioral preferences. Instead of relying on a predefined reward function, our model automatically learns rewards from users' actions using a discriminative actor-critic network and a Wasserstein GAN. The model offers a general approach to characterizing and explaining underlying behavioral tendencies. Experiments show that our method outperforms state-of-the-art approaches in several scenarios, namely traffic signal control, online recommender systems, and scanpath prediction.
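
To make the adversarial reward-learning idea concrete, below is a minimal sketch (not the paper's implementation) of a WGAN-style critic over (state, action) pairs whose score stands in for the learned reward consumed by a downstream actor-critic learner. It assumes PyTorch; all names, dimensions, and hyperparameters (RewardCritic, STATE_DIM, the weight-clipping constant, etc.) are illustrative assumptions, not taken from the article.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions -- not taken from the paper.
STATE_DIM, ACTION_DIM, HIDDEN = 16, 4, 64

class RewardCritic(nn.Module):
    """WGAN-style critic over (state, action) pairs; its scalar output
    doubles as the learned reward signal for the actor-critic learner."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN),
            nn.ReLU(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

critic = RewardCritic()
opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

def critic_step(expert_s, expert_a, policy_s, policy_a, clip=0.01):
    """One WGAN critic update: push expert (logged user) scores up and
    policy-generated scores down, then clip weights to keep the critic
    approximately 1-Lipschitz."""
    loss = critic(policy_s, policy_a).mean() - critic(expert_s, expert_a).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    for p in critic.parameters():
        p.data.clamp_(-clip, clip)
    return loss.item()

# Toy usage with random batches standing in for logged user data.
s_e, a_e = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
s_p, a_p = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
print(critic_step(s_e, a_e, s_p, a_p))

# The policy side would then treat critic(s, a) as the reward when
# performing its actor-critic updates, replacing a hand-designed reward.
```

Weight clipping is the original WGAN device for enforcing the critic's Lipschitz constraint; a gradient-penalty variant would be a common alternative. The policy update itself (the actor-critic side) is omitted here.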

Original language: English
Pages (from-to): 9878-9889
Number of pages: 12
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 35
Issue number: 10
Early online date: 3 Aug 2022
DOIs
Publication status: Published - Oct 2023
