Abstract
We consider the problem of imitation learning from suboptimal demonstrations, where the goal is to learn a better policy than the demonstrators. Previous methods usually learn a reward function that encodes the underlying intention of the demonstrators and then use standard reinforcement learning to learn a policy from this reward function. Such methods can fail to control the distribution shift between the demonstrations and the learned policy, since the learned reward function may not generalize well to out-of-distribution samples and can mislead the agent into highly uncertain states, resulting in degraded performance. To address this limitation, we propose a novel algorithm called Outperforming demonstrators by Directly Extrapolating Demonstrations (ODED). Instead of learning a reward function, ODED trains an ensemble of extrapolation networks that generate extrapolated demonstrations, i.e., demonstrations that could be induced by a good agent, from the provided demonstrations. With these extrapolated demonstrations, we can use an off-the-shelf imitation learning algorithm to learn a good policy. Guided by the extrapolated demonstrations, the learned policy avoids visiting highly uncertain states and thereby controls the distribution shift. Empirically, we show that ODED outperforms suboptimal demonstrators and achieves better performance than state-of-the-art imitation learning algorithms on MuJoCo and DeepMind Control Suite tasks.
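To make the described pipeline concrete, the following is a minimal, speculative sketch in PyTorch of the idea stated in the abstract: an ensemble of extrapolation networks maps provided demonstrations to extrapolated ones, ensemble disagreement is used as an uncertainty proxy, and the result is handed to an off-the-shelf imitation learner. All names (`ExtrapolationNet`, `extrapolate_demos`), the architecture, the disagreement filter, and every other detail are assumptions for illustration, not the authors' implementation.

```python
# Speculative sketch of the ODED-style pipeline described in the abstract.
# Everything below (architecture, filtering rule, names) is a hypothetical
# illustration, not the paper's actual method.
import torch
import torch.nn as nn


class ExtrapolationNet(nn.Module):
    """Hypothetical network mapping a demonstration (state, action) pair to an
    'extrapolated' pair that a better agent might produce."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.obs_dim = obs_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim + act_dim),
        )

    def forward(self, obs, act):
        out = self.net(torch.cat([obs, act], dim=-1))
        return out[..., :self.obs_dim], out[..., self.obs_dim:]


def extrapolate_demos(ensemble, demos, disagreement_threshold=0.1):
    """Generate extrapolated demonstrations, keeping only pairs on which the
    ensemble members agree (low disagreement as a proxy for low uncertainty)."""
    extrapolated = []
    for obs, act in demos:  # demos: iterable of (obs, act) tensors
        preds = [net(obs, act) for net in ensemble]
        obs_preds = torch.stack([p[0] for p in preds])
        act_preds = torch.stack([p[1] for p in preds])
        # Ensemble disagreement as an uncertainty estimate.
        if obs_preds.std(dim=0).mean() < disagreement_threshold:
            extrapolated.append((obs_preds.mean(dim=0), act_preds.mean(dim=0)))
    return extrapolated


# The extrapolated demonstrations would then be passed to an off-the-shelf
# imitation learning algorithm (e.g., behavioral cloning or GAIL) in place of,
# or alongside, the original suboptimal demonstrations.
```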
Original language | English |
---|---|
Title of host publication | CIKM '22 |
Subtitle of host publication | Proceedings of the 31st ACM International Conference on Information and Knowledge Management |
Place of Publication | New York |
Publisher | Association for Computing Machinery |
Pages | 128-137 |
Number of pages | 10 |
ISBN (Electronic) | 9781450392365 |
DOIs | |
Publication status | Published - 17 Oct 2022 |
Event | 31st ACM International Conference on Information and Knowledge Management, CIKM 2022 - Atlanta, United States. Duration: 17 Oct 2022 → 21 Oct 2022 |
Conference
Conference | 31st ACM International Conference on Information and Knowledge Management, CIKM 2022 |
---|---|
Country/Territory | United States |
City | Atlanta |
Period | 17/10/22 → 21/10/22 |
Bibliographical note
Copyright the Author(s) 2022. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Keywords
- Reinforcement Learning
- Imitation Learning
- Distribution Shift