RePreM: representation pre-training with masked model for reinforcement learning

Yuanying Cai, Chuheng Zhang, Wei Shen, Xuyun Zhang, Wenjie Ruan, Longbo Huang*

*Corresponding author for this work

Research output: Contribution to journal › Conference paper › peer-review

1 Citation (Scopus)

Abstract

Inspired by the recent success of sequence modeling in RL and the use of masked language models for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains an encoder combined with transformer blocks to predict the masked states or actions in a trajectory. RePreM is simple but effective compared to existing representation pre-training methods in RL. By relying on sequence modeling, it avoids algorithmic sophistication (such as data augmentation or estimating multiple models) and generates a representation that captures long-term dynamics well. Empirically, we demonstrate the effectiveness of RePreM in various tasks, including dynamics prediction, transfer learning, and sample-efficient RL with both value-based and actor-critic methods. Moreover, we show that RePreM scales well with dataset size, dataset quality, and the scale of the encoder, which indicates its potential towards big RL models.
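The BERT-style objective described in the abstract can be illustrated with a minimal sketch: randomly mask positions in an interleaved state/action trajectory and score a model only on the masked positions. All names (`mask_trajectory`, the `MASK` token value, the mask ratio) are hypothetical illustrations, not the paper's actual implementation, which uses an encoder with transformer blocks.

```python
import numpy as np

MASK = -1.0  # hypothetical mask-token value for this toy 1-D setting


def mask_trajectory(tokens, mask_ratio=0.15, rng=None):
    """BERT-style masking over an interleaved (s_0, a_0, s_1, a_1, ...) sequence.

    Returns the corrupted sequence and a boolean array marking the positions
    the model must reconstruct (the pre-training targets).
    """
    rng = rng or np.random.default_rng(0)
    tokens = np.asarray(tokens, dtype=float)
    n_mask = max(1, int(round(mask_ratio * len(tokens))))
    idx = rng.choice(len(tokens), size=n_mask, replace=False)
    corrupted = tokens.copy()
    corrupted[idx] = MASK
    target_mask = np.zeros(len(tokens), dtype=bool)
    target_mask[idx] = True
    return corrupted, target_mask


def reconstruction_loss(pred, tokens, target_mask):
    """Mean squared error computed only on the masked positions."""
    pred, tokens = np.asarray(pred, float), np.asarray(tokens, float)
    return float(np.mean((pred[target_mask] - tokens[target_mask]) ** 2))


# Usage: a toy trajectory of 10 interleaved state/action tokens.
traj = np.linspace(0.0, 0.9, 10)
corrupted, targets = mask_trajectory(traj, mask_ratio=0.2)
# A perfect predictor reproduces the original tokens, giving zero loss.
perfect_loss = reconstruction_loss(traj, traj, targets)
```

In practice the masked positions would be filled in by the encoder-plus-transformer model, and the loss above would drive representation learning; this sketch only shows the shape of the objective.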
Original language: English
Pages (from-to): 6879-6887
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 37
Issue number: 6
DOIs
Publication status: Published - 27 Jun 2023
Event: 37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: 7 Feb 2023 - 14 Feb 2023
