Reinforcement learning for real-world control applications

Mark Pendrith, Malcolm Ryan

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

If reinforcement learning (RL) techniques are to be used for “real world” dynamic system control, the problems of noise and plant disturbance will have to be addressed, along with various issues resulting from learning in non-Markovian settings. We present experimental results from three domains: a simulated noisy pole-and-cart system, an artificial non-Markovian decision problem, and a real six-legged walking robot. The results from each of these domains suggest that actual return (Monte Carlo) approaches to the credit-assignment problem may be more suited than temporal difference (TD) methods for many real-world control applications. A new algorithm we call C-Trace, a variant of the P-Trace RL algorithm, is introduced, and some possible advantages of using algorithms of this type are discussed.

Original language: English
Title of host publication: Advances in Artificial Intelligence - 11th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, AI 1996, Proceedings
Place of Publication: Berlin; Heidelberg
Publisher: Springer, Springer Nature
Pages: 257-270
Number of pages: 14
Volume: 1081
ISBN (Print): 3540612912, 9783540612919
Publication status: Published - 1996
Externally published: Yes
Event: 11th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, AI 1996 - Toronto, Canada
Duration: 21 May 1996 to 24 May 1996

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1081
ISSN (Print): 03029743
ISSN (Electronic): 16113349

Other

Other: 11th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, AI 1996
Country/Territory: Canada
City: Toronto
Period: 21/05/96 to 24/05/96
