Imitation learning to outperform demonstrators by directly extrapolating demonstrations

Yuanying Cai, Chuheng Zhang, Wei Shen, Xiaonan He, Xuyun Zhang, Longbo Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

1 Citation (Scopus)
43 Downloads (Pure)

Abstract

We consider the problem of imitation learning from suboptimal demonstrations, with the aim of learning a better policy than the demonstrators. Previous methods usually learn a reward function that encodes the underlying intention of the demonstrators and then use standard reinforcement learning to learn a policy from this reward function. Such methods can fail to control the distribution shift between the demonstrations and the learned policy, since the learned reward function may not generalize well to out-of-distribution samples and can mislead the agent into highly uncertain states, resulting in degraded performance. To address this limitation, we propose a novel algorithm called Outperforming demonstrators by Directly Extrapolating Demonstrations (ODED). Instead of learning a reward function, ODED trains an ensemble of extrapolation networks that generate extrapolated demonstrations, i.e., demonstrations that might be induced by a good agent, from the provided demonstrations. With these extrapolated demonstrations, an off-the-shelf imitation learning algorithm can be used to learn a good policy. Guided by the extrapolated demonstrations, the learned policy avoids visiting highly uncertain states and therefore controls the distribution shift. Empirically, we show that ODED outperforms suboptimal demonstrators and achieves better performance than state-of-the-art imitation learning algorithms on MuJoCo and DeepMind Control Suite tasks.
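The pipeline described in the abstract can be sketched in a few lines. The sketch below is illustrative only and is not the paper's implementation: the "extrapolation networks" are stubbed as randomly initialized linear scalings of demonstrated states, and all names (`make_extrapolator`, `extrapolate_demos`, `max_std`) are hypothetical. It shows the key idea of the method: an ensemble generates candidate extrapolated demonstrations, ensemble disagreement serves as an uncertainty estimate, and only low-uncertainty extrapolations are kept before handing them to an off-the-shelf imitation learner.

```python
import random
import statistics

def make_extrapolator(seed):
    """Stub for one extrapolation network: scales states up slightly,
    with each ensemble member extrapolating a bit differently."""
    rng = random.Random(seed)
    scale = 1.0 + rng.uniform(0.05, 0.15)
    def extrapolate(trajectory):
        # trajectory: list of (state, action) pairs
        return [(s * scale, a) for s, a in trajectory]
    return extrapolate

def extrapolate_demos(demos, ensemble, max_std=0.5):
    """Generate extrapolated demonstrations, discarding trajectories on
    which the ensemble disagrees too much (high uncertainty)."""
    kept = []
    for traj in demos:
        candidates = [f(traj) for f in ensemble]
        # Uncertainty proxy: ensemble disagreement on the first state
        first_states = [c[0][0] for c in candidates]
        if statistics.pstdev(first_states) <= max_std:
            # Aggregate the ensemble's outputs by element-wise averaging
            kept.append([
                (statistics.mean(c[i][0] for c in candidates), traj[i][1])
                for i in range(len(traj))
            ])
    return kept

ensemble = [make_extrapolator(seed) for seed in range(5)]
demos = [[(1.0, 0), (2.0, 1)], [(3.0, 1), (4.0, 0)]]
extrapolated = extrapolate_demos(demos, ensemble)
# `extrapolated` would then be fed to any off-the-shelf imitation
# learning algorithm (e.g., behavioral cloning) in place of the
# original suboptimal demonstrations.
```

In the actual method the extrapolators are learned networks and the states are high-dimensional, but the control flow (ensemble → disagreement filter → imitation learning) is the same.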

Original language: English
Title of host publication: CIKM '22
Subtitle of host publication: Proceedings of the 31st ACM International Conference on Information and Knowledge Management
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 128-137
Number of pages: 10
ISBN (Electronic): 9781450392365
DOIs
Publication status: Published - 17 Oct 2022
Event: 31st ACM International Conference on Information and Knowledge Management, CIKM 2022 - Atlanta, United States
Duration: 17 Oct 2022 - 21 Oct 2022

Conference

Conference: 31st ACM International Conference on Information and Knowledge Management, CIKM 2022
Country/Territory: United States
City: Atlanta
Period: 17/10/22 - 21/10/22

Bibliographical note

Copyright the Author(s) 2022. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Reinforcement Learning
  • Imitation Learning
  • Distribution Shift
