TY - JOUR
T1 - Direct back propagation neural dynamic programming-based particle swarm optimisation
AU - Lu, Yongzhong
AU - Yan, Danping
AU - Zhang, Jingyu
AU - Levy, David
PY - 2014
Y1 - 2014
N2 - In this paper, we introduce direct back propagation (BP) neural dynamic programming (NDP) into particle swarm optimisation (PSO) and propose a direct BP NDP inspired PSO algorithm, which we call NDPSO. Direct BP NDP belongs to the class of heuristic dynamic programming algorithms based on model-based adaptive critic designs and often serves as an online learning control paradigm. In NDPSO, the critic BP neural network is trained to optimise a total reward-to-go objective, namely to balance Bellman's equation, while the action BP neural network is used to train the inertia weight and the cognitive and social coefficients so that the critic BP network output can approach an ultimate reward-to-go objective of success. With the collective aid of the action-critic BP neural networks, the inertia weight and the cognitive and social coefficients become more adaptive. In addition, NDPSO's mutation mechanism greatly improves the dynamic performance of the standard PSO. Empirical experiments are conducted on both unimodal and multimodal benchmark functions. The experimental results demonstrate NDPSO's effectiveness and its superiority over many other PSO variants in solving most multimodal problems.
AB - In this paper, we introduce direct back propagation (BP) neural dynamic programming (NDP) into particle swarm optimisation (PSO) and propose a direct BP NDP inspired PSO algorithm, which we call NDPSO. Direct BP NDP belongs to the class of heuristic dynamic programming algorithms based on model-based adaptive critic designs and often serves as an online learning control paradigm. In NDPSO, the critic BP neural network is trained to optimise a total reward-to-go objective, namely to balance Bellman's equation, while the action BP neural network is used to train the inertia weight and the cognitive and social coefficients so that the critic BP network output can approach an ultimate reward-to-go objective of success. With the collective aid of the action-critic BP neural networks, the inertia weight and the cognitive and social coefficients become more adaptive. In addition, NDPSO's mutation mechanism greatly improves the dynamic performance of the standard PSO. Empirical experiments are conducted on both unimodal and multimodal benchmark functions. The experimental results demonstrate NDPSO's effectiveness and its superiority over many other PSO variants in solving most multimodal problems.
KW - neural network
KW - adaptive critic designs
KW - particle swarm optimisation
KW - back propagation
KW - dynamic programming
UR - http://www.scopus.com/inward/record.url?scp=84918783696&partnerID=8YFLogxK
U2 - 10.1080/09540091.2014.931355
DO - 10.1080/09540091.2014.931355
M3 - Article
SN - 0954-0091
VL - 26
SP - 367
EP - 388
JO - Connection Science
JF - Connection Science
IS - 4
ER -