Sparse gradient-based direct policy search

Nataliya Sokolovska

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

Abstract

Reinforcement learning is challenging when state and action spaces are continuous. The discretization of state and action spaces, and the real-time adaptation of that discretization, are critical issues in reinforcement learning problems. In our contribution we consider adaptive discretization and introduce a sparse gradient-based direct policy search method. We address the issue of efficient state/action selection in gradient-based direct policy search by imposing sparsity through an L1 penalty term. We propose to start learning with a fine discretization of the state space and to induce sparsity via the L1 norm. We compare the proposed approach to state-of-the-art methods, such as progressive widening Q-learning, which updates the discretization of the states adaptively, and to classic as well as sparse Q-learning with linear function approximation. We demonstrate through experiments on standard reinforcement learning challenges that the proposed approach is efficient.
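
The abstract describes starting from a fine discretization of the state space and letting an L1 penalty prune uninformative cells during policy search. As a rough illustration of that idea only (not the paper's algorithm), the sketch below applies an L1 proximal (soft-thresholding) step to a REINFORCE-style gradient update for a linear-softmax policy over one-hot state-action features. The toy chain environment, feature map, function names, and all hyper-parameters are assumptions introduced purely to make the example runnable.

# Minimal sketch, assuming a toy chain environment and one-hot features;
# not the method evaluated in the paper.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def features(state, action, n_states, n_actions):
    # One-hot feature for a (state, action) cell of a fine discretization.
    phi = np.zeros(n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi

def run_episode(policy_probs_fn, n_states, n_actions, rng, horizon=50):
    # Illustrative chain environment: action 1 moves right, reward at the last state.
    state, trajectory = 0, []
    for _ in range(horizon):
        probs = policy_probs_fn(state)
        action = rng.choice(n_actions, p=probs)
        reward = 1.0 if (state == n_states - 1 and action == 1) else 0.0
        trajectory.append((state, action, reward))
        state = min(state + (1 if action == 1 else 0), n_states - 1)
    return trajectory

def train(n_states=20, n_actions=2, episodes=500, lr=0.1, l1=0.01, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_states * n_actions)   # linear policy parameters

    def probs(state):
        scores = [theta @ features(state, a, n_states, n_actions) for a in range(n_actions)]
        return softmax(np.array(scores))

    for _ in range(episodes):
        traj = run_episode(probs, n_states, n_actions, rng)
        G = sum(r for _, _, r in traj)        # undiscounted return (simplification)
        grad = np.zeros_like(theta)
        for s, a, _ in traj:
            p = probs(s)
            # grad log pi(a|s) for a linear-softmax policy:
            # phi(s,a) minus the expected feature under the current policy.
            expected_phi = sum(p[b] * features(s, b, n_states, n_actions)
                               for b in range(n_actions))
            grad += features(s, a, n_states, n_actions) - expected_phi
        theta += lr * G * grad
        # Proximal step for the L1 penalty: soft-thresholding drives weights of
        # uninformative discretization cells exactly to zero (the sparsity effect).
        theta = np.sign(theta) * np.maximum(np.abs(theta) - lr * l1, 0.0)
    return theta

if __name__ == "__main__":
    weights = train()
    print("non-zero policy weights:", np.count_nonzero(weights), "of", weights.size)

The soft-thresholding step after each gradient update is one common way to handle a non-smooth L1 term; it is used here only to show how sparsity over the discretization cells can emerge during learning.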
Original language: English
Title of host publication: Neural information processing
Subtitle of host publication: 19th international conference, ICONIP 2012, Doha, Qatar, November 12-15, 2012 : proceedings
Editors: Tingwen Huang, Zhigang Zeng, Chuandong Li, Chi Sing Leung
Place of Publication: Heidelberg, Germany
Publisher: Springer, Springer Nature
Pages: 212-221
Number of pages: 10
Volume: 4
ISBN (Print): 9783642344770
DOI: https://doi.org/10.1007/978-3-642-34478-7_27
Publication status: Published - 2012
Event: International Conference on Neural Information Processing (19th : 2012) - Doha, Qatar
Duration: 12 Nov 2012 - 15 Nov 2012

Publication series

Name: Lecture notes in computer science
Publisher: Springer
Volume: 7666
ISSN (Print): 0302-9743

Conference

Conference: International Conference on Neural Information Processing (19th : 2012)
Country: Qatar
City: Doha
Period: 12/11/12 - 15/11/12

Keywords

  • Direct policy search
  • model selection
  • Q-learning

Cite this

Sokolovska, N. (2012). Sparse gradient-based direct policy search. In T. Huang, Z. Zeng, C. Li, & C. S. Leung (Eds.), Neural information processing: 19th international conference, ICONIP 2012, Doha, Qatar, November 12-15, 2012 : proceedings (Vol. 4, pp. 212-221). (Lecture notes in computer science; Vol. 7666). Heidelberg, Germany: Springer, Springer Nature. https://doi.org/10.1007/978-3-642-34478-7_27