Control systems for autonomous robots often use an architecture known as behaviour-based, in which the problem of defining what the robot does is decomposed into a number of competing or cooperating modules, or behaviours. Although a single behaviour might have access to all sensory information and might be able to control all effectors, it does not necessarily do so all of the time, and its control outputs may be adjusted by other behaviours. The behaviour-based approach has been remarkably successful because the resulting control systems are fast and robust, in contrast to the deliberative approaches used in the past, which tended to yield robots that were slow and sensitive to changes in the environment.

Our experience has been that, although it is often easy to develop behaviours that work, they tend to be inefficient, in the sense that the robot takes more sense-decide-act cycles than necessary. We address this problem with a general method for generating near-optimal behaviours from a reward function: a Monte Carlo algorithm for solving Markov Decision Processes is used to learn the behaviour. Monte Carlo algorithms are a subclass of reinforcement learning algorithms and bear similarities to Q-learning and TD(λ). Because the algorithm is slow to converge, we found it necessary to train using a simulator, so the level of realism in the simulator is important. We found that we were able to improve over hand-coded behaviours and that the improvement carried over to tests on the physical robot.
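To make the learning step concrete, the following is a minimal sketch of first-visit Monte Carlo control, one member of the family of Monte Carlo algorithms for MDPs referred to above. The corridor environment, the step cost, and all parameter values here are illustrative assumptions for the sketch, not the robot task or simulator described in this work: episodes are generated with an epsilon-greedy policy, complete returns are computed at episode end, and action values are updated by incremental averaging.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP: a corridor of states 0..4; actions move left (-1) or
# right (+1); reaching state 4 earns reward +1 and ends the episode. A small
# per-step cost mirrors the paper's concern with wasted sense-decide-act cycles.
N_STATES = 5
ACTIONS = (-1, +1)
GOAL = N_STATES - 1

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == GOAL:
        return nxt, 1.0, True
    return nxt, -0.04, False  # assumed step cost penalising long episodes

def mc_control(episodes=2000, gamma=0.95, epsilon=0.1, seed=0):
    """First-visit Monte Carlo control with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    Q = defaultdict(float)     # action-value estimates Q[(state, action)]
    counts = defaultdict(int)  # visit counts for incremental averaging
    for _ in range(episodes):
        # Generate one episode from a random non-terminal start state.
        s = rng.randrange(N_STATES - 1)
        trajectory = []
        done = False
        while not done and len(trajectory) < 100:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            trajectory.append((s, a, r))
            s = s2
        # Record the first time each (state, action) pair occurs.
        first = {}
        for t, (st, at, _) in enumerate(trajectory):
            first.setdefault((st, at), t)
        # Backward pass: accumulate the discounted return at each time step.
        G = 0.0
        returns_at = {}
        for t in range(len(trajectory) - 1, -1, -1):
            G = gamma * G + trajectory[t][2]
            returns_at[t] = G
        # Update each pair towards the return from its first visit.
        for (st, at), t in first.items():
            counts[(st, at)] += 1
            Q[(st, at)] += (returns_at[t] - Q[(st, at)]) / counts[(st, at)]
    # Extract the greedy policy from the learned action values.
    policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
    return policy, Q
```

In this toy setting the learned greedy policy moves right in every state, the shortest route to the goal; the slow convergence noted above stems from needing many complete episodes before the averaged returns stabilise, which is what motivates training in a simulator rather than on the physical robot.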