Actor-Critic reinforcement learning based on prior knowledge
Qilu University of Technology, Jinan, China
To improve the learning efficiency of the incremental Actor-Critic algorithm, we introduce experience sample data into it from the perspective of policy learning, so that the useful information contained in past experience samples is exploited during learning. Because the recursive least-squares temporal difference algorithm, RLSTD(λ), and the incremental least-squares temporal difference algorithm, iLSTD(λ), both make good use of sample data collected in the past, we apply each of them to the Critic's policy evaluation. The Critic learns the value function with RLSTD(λ) or iLSTD(λ), while the Actor updates the policy parameters with a conventional gradient method, so the more effective evaluation by the Critic helps the Actor improve its policy-learning performance. Finally, simulation studies on two control problems with continuous state spaces analyse the impact of different parameters on the performance of the learning algorithms and verify their effectiveness.
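As an illustration of the Critic described above, the following is a minimal sketch of an RLSTD(λ) update with a linear value function V(s) = θᵀφ(s). It is not the paper's implementation: the class name `RLSTDCritic`, the helper `actor_step`, the forgetting factor fixed at 1, and the initial matrix P₀ = p0·I are all assumptions made for this example.

```python
import numpy as np


class RLSTDCritic:
    """Hypothetical RLSTD(lambda) critic with a linear value function."""

    def __init__(self, n_features, gamma=0.95, lam=0.6, p0=100.0):
        self.gamma = gamma                    # discount factor
        self.lam = lam                        # eligibility-trace decay
        self.theta = np.zeros(n_features)     # value-function weights
        self.z = np.zeros(n_features)         # eligibility trace
        self.P = p0 * np.eye(n_features)      # running inverse of the LSTD matrix

    def update(self, phi, reward, phi_next, done=False):
        """Process one transition and return the TD error before the update."""
        if done:
            phi_next = np.zeros_like(phi)     # terminal state has value 0
        self.z = self.gamma * self.lam * self.z + phi
        d = phi - self.gamma * phi_next       # feature difference
        Pz = self.P @ self.z
        k = Pz / (1.0 + d @ Pz)               # gain vector (forgetting factor 1)
        td_error = reward - d @ self.theta    # r + gamma*V(s') - V(s)
        self.theta = self.theta + k * td_error
        self.P = self.P - np.outer(k, d @ self.P)  # Sherman-Morrison update
        if done:
            self.z = np.zeros_like(self.z)    # reset trace between episodes
        return td_error


def actor_step(w, grad_log_pi, td_error, alpha=0.01):
    """Assumed conventional policy-gradient step driven by the Critic's TD error."""
    return w + alpha * td_error * grad_log_pi
```

In this sketch the Critic reuses every past sample through the matrix `P`, whereas the Actor only takes a single stochastic gradient step per transition. That difference is the mechanism by which a more sample-efficient Critic can speed up policy learning.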