TY - GEN
T1 - Automatic development of robot behaviour using Monte Carlo methods
AU - Brusey, James
PY - 2000
Y1 - 2000
N2 - Control systems for autonomous robots often use an architecture known as behaviour-based [1], which means that the problem of defining what the robot does is broken down into a number of competing or cooperating modules, or behaviours. Although a single behaviour might have access to all sensory information and might be able to control all effectors, it doesn't necessarily do so all of the time, or may have its control outputs adjusted by other behaviours. The behaviour-based approach has been remarkably successful because the resulting control systems are fast and robust, in comparison with the deliberative approaches used in the past, which tended to yield robots that were slow and sensitive to changes in the environment. Our experience has been that, although it is often easy to develop behaviours that work, they tend to be inefficient, in the sense that the robot takes more sense-decide-act cycles than necessary. We address this problem by developing a general method for generating near-optimal behaviours based on a reward function. The approach uses a Monte Carlo algorithm [2] for solving Markov Decision Processes to learn the behaviour. Monte Carlo algorithms are a subclass of reinforcement learning algorithms and bear similarities to Q-learning and TD(λ). This algorithm is slow to converge, and so we found it necessary to train using a simulator; the level of realism in the simulator is therefore quite important. We found that we were able to improve over hand-coded behaviours and that the improvement carried over to tests on the physical robot.
AB - Control systems for autonomous robots often use an architecture known as behaviour-based [1], which means that the problem of defining what the robot does is broken down into a number of competing or cooperating modules, or behaviours. Although a single behaviour might have access to all sensory information and might be able to control all effectors, it doesn't necessarily do so all of the time, or may have its control outputs adjusted by other behaviours. The behaviour-based approach has been remarkably successful because the resulting control systems are fast and robust, in comparison with the deliberative approaches used in the past, which tended to yield robots that were slow and sensitive to changes in the environment. Our experience has been that, although it is often easy to develop behaviours that work, they tend to be inefficient, in the sense that the robot takes more sense-decide-act cycles than necessary. We address this problem by developing a general method for generating near-optimal behaviours based on a reward function. The approach uses a Monte Carlo algorithm [2] for solving Markov Decision Processes to learn the behaviour. Monte Carlo algorithms are a subclass of reinforcement learning algorithms and bear similarities to Q-learning and TD(λ). This algorithm is slow to converge, and so we found it necessary to train using a simulator; the level of realism in the simulator is therefore quite important. We found that we were able to improve over hand-coded behaviours and that the improvement carried over to tests on the physical robot.
UR - http://www.scopus.com/inward/record.url?scp=84942883025&partnerID=8YFLogxK
M3 - Conference proceeding contribution
AN - SCOPUS:84942883025
SN - 3540679251
SN - 9783540679257
VL - 1886
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
BT - PRICAI 2000, Topics in Artificial Intelligence - 6th Pacific Rim International Conference on Artificial Intelligence, Proceedings
PB - Springer, Springer Nature
CY - Berlin ; New York
T2 - 6th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2000
Y2 - 28 August 2000 through 1 September 2000
ER -