Sparse gradient-based direct policy search

Nataliya Sokolovska

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution


Reinforcement learning is challenging when state and action spaces are continuous. The discretization of state and action spaces, and the real-time adaptation of that discretization, are critical issues in reinforcement learning problems. In our contribution we consider adaptive discretization and introduce a sparse gradient-based direct policy search method. We address the issue of efficient state/action selection in gradient-based direct policy search by imposing sparsity through an L1 penalty term. We propose to start learning with a fine discretization of the state space and to induce sparsity via the L1 norm. We compare the proposed approach to state-of-the-art methods, such as progressive widening Q-learning, which adaptively updates the discretization of the states, and to classic as well as sparse Q-learning with linear function approximation. Our experiments on standard reinforcement learning challenges demonstrate that the proposed approach is efficient.
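The record does not include code, but the sparsity mechanism the abstract describes, an L1 penalty that prunes weights attached to a fine state discretization, is commonly implemented as a proximal (soft-thresholding) step after each gradient update. The sketch below is illustrative only; the function names and the plain gradient-ascent update are assumptions, not the paper's actual algorithm.

```python
import numpy as np

def soft_threshold(theta, lam):
    # Proximal operator of the L1 norm: shrinks every weight toward zero
    # and sets weights with magnitude below lam exactly to zero.
    return np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)

def l1_policy_gradient_step(theta, grad, lr, lam):
    # One gradient-ascent step on the estimated return, followed by the
    # L1 proximal step. Weights on uninformative state/action features are
    # driven to exactly zero, effectively coarsening the discretization.
    return soft_threshold(theta + lr * grad, lr * lam)

# Example: a weight whose gradient signal is negligible gets pruned.
theta = np.array([0.0, 0.5])
grad = np.array([0.01, 1.0])       # hypothetical policy-gradient estimate
theta = l1_policy_gradient_step(theta, grad, lr=0.1, lam=0.5)
print(theta)                        # first component is exactly 0.0
```

Because the proximal step produces exact zeros (unlike plain L2 shrinkage), the surviving nonzero weights identify which cells of the fine discretization the policy actually needs.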
Original language: English
Title of host publication: Neural information processing
Subtitle of host publication: 19th international conference, ICONIP 2012, Doha, Qatar, November 12-15, 2012: proceedings
Editors: Tingwen Huang, Zhigang Zeng, Chuandong Li, Chi Sing Leung
Place of publication: Heidelberg, Germany
Publisher: Springer
Number of pages: 10
ISBN (print): 9783642344770
Publication status: Published - 2012
Event: International Conference on Neural Information Processing (19th: 2012) - Doha, Qatar
Duration: 12 Nov 2012 - 15 Nov 2012

Publication series

Name: Lecture notes in computer science
ISSN (print): 0302-9743


Conference: International Conference on Neural Information Processing (19th: 2012)
City: Doha, Qatar


  • Direct policy search
  • Model selection
  • Q-learning

