
Improving efficiency of unsupervised skill discovery by model resetting curriculum

Yuanjiang Cao*, Yao Liu, Ruoyu Wang, Quan Z. Sheng, Lina Yao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

Unsupervised skill discovery is a fundamental task in which an agent acquires optimal behaviours independently, without relying on external rewards or task-specific supervision. Previous research aims to distil skills from information theory-guided exploration without supervision. However, the training stage of unsupervised skill discovery still requires a large number of samples. One approach to reducing the number of samples is to inject plasticity by resetting the neural network after a constant number of iterations. Counter-intuitively, we find that naive model resetting can compromise the model’s efficacy and damage sample efficiency in unsupervised skill discovery tasks. To address this problem, we propose a new concept, Reward Difference Rate, and leverage it to construct three categories of learning curves during the unsupervised skill discovery training process. The Reward Difference Rate is able to identify the failure cases of naive resetting. Based on this identification, we propose replacing naive model resetting with a novel Model Resetting Curriculum scheme. We conduct experiments in a MuJoCo-based environment, comparing against advanced baselines and targeting two continuous skill domains, Ant and Humanoid. The experimental results demonstrate the effectiveness of our proposed method in cutting training costs in terms of the number of environment interactions.

Original language: English
Title of host publication: Neural Information Processing
Subtitle of host publication: 31st International Conference, ICONIP 2024, Auckland, New Zealand, December 2–6, 2024, proceedings, part III
Editors: Mufti Mahmud, Maryam Doborjeh, Kevin Wong, Andrew Chi Sing Leung, Zohreh Doborjeh, M. Tanveer
Place of Publication: Singapore
Publisher: Springer, Springer Nature
Pages: 1-14
Number of pages: 14
ISBN (Electronic): 9789819669547
ISBN (Print): 9789819669530
DOIs
Publication status: Published - 2025
Event: 31st International Conference on Neural Information Processing, ICONIP 2024 - Auckland, New Zealand
Duration: 2 Dec 2024 to 6 Dec 2024

Publication series

Name: Communications in Computer and Information Science
Publisher: Springer
Volume: 2284
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 31st International Conference on Neural Information Processing, ICONIP 2024
Country/Territory: New Zealand
City: Auckland
Period: 2/12/24 to 6/12/24

Keywords

  • Unsupervised Skill Discovery
  • Sample Efficiency
  • Reinforcement Learning
