Funding: National Natural Science Foundation of China, Young Scientists Fund (62203461)
Research on fuzzy deterministic policy gradient control method for inverted pendulum system
LI Linxiang, LIU Kainan, BAN Xiaojun, FENG Zhichao
(Center for Control Theory and Guidance Technology, Harbin Institute of Technology, Harbin 150001, China; School of Missile Engineering, Rocket Force University of Engineering, Xi'an 710025, China)
Abstract:
As a typical non-minimum phase system, the inverted pendulum exhibits significant nonlinearity and instability, which makes its control problem challenging. To address two shortcomings of traditional deep reinforcement learning-based inverted pendulum control, namely the limited interpretability of neural networks and the difficulty of converging the state variables to their desired values, a fuzzy deterministic policy gradient (FDPG) control algorithm is proposed. The algorithm combines the deterministic policy gradient method with a Takagi-Sugeno (T-S) fuzzy model, exploiting the T-S model's strong function approximation capability to approximate the Actor in the Actor-Critic framework, so that the control policy is expressed intuitively as fuzzy rules and the practical meaning of the controller becomes clearer. In addition, taking advantage of the interpretability of the T-S fuzzy model, the optimal control law derived from the linear quadratic regulator (LQR) is incorporated into the T-S model as prior knowledge, which guarantees the local stability of the controller. Finally, comparative experiments with the traditional deep deterministic policy gradient (DDPG) algorithm and a piecewise fuzzy control method show that the proposed algorithm achieves better control performance and generalization ability on the inverted pendulum system.
Key words: Fuzzy reinforcement learning; T-S fuzzy model; Inverted pendulum control; Deterministic policy gradient; Deep deterministic policy gradient (DDPG) algorithm
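As a concrete illustration of the approach described in the abstract, the short Python sketch below shows one plausible structure for the FDPG actor: a T-S fuzzy model whose rule consequents are linear state-feedback laws, with every rule seeded by an LQR gain as prior knowledge. This is a minimal sketch, not the authors' implementation; the linearized cart-pole matrices, rule centers, and all names are illustrative assumptions.

import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical linearized cart-pole dynamics x_dot = A x + B u, with
# state x = [cart position, cart velocity, pole angle, pole rate].
A = np.array([[0.0, 1.0,  0.0, 0.0],
              [0.0, 0.0, -0.7, 0.0],
              [0.0, 0.0,  0.0, 1.0],
              [0.0, 0.0, 15.8, 0.0]])
B = np.array([[0.0], [0.98], [0.0], [-1.6]])
Q = np.diag([10.0, 1.0, 100.0, 1.0])   # state weights (illustrative)
R = np.array([[0.1]])                  # control weight (illustrative)

# LQR prior: the optimal gain for the linearized model becomes the
# consequent of the fuzzy rule centered at the upright equilibrium.
P = solve_continuous_are(A, B, Q, R)
K_lqr = np.linalg.solve(R, B.T @ P)    # u = -K_lqr x near the origin

class TSFuzzyActor:
    """T-S fuzzy policy: rule i fires with a Gaussian strength over the
    pole angle and outputs the local linear law u_i = -K_i x; the action
    is the normalized weighted sum of the rule outputs."""

    def __init__(self, centers, sigma, K_prior):
        self.centers = np.asarray(centers)   # rule centers (pole angle, rad)
        self.sigma = sigma
        # Every consequent starts from the LQR gain; in the full FDPG
        # algorithm these gains would then be adapted by the
        # deterministic policy gradient computed from the critic.
        self.K = np.tile(K_prior, (len(centers), 1, 1))

    def firing_strengths(self, theta):
        w = np.exp(-0.5 * ((theta - self.centers) / self.sigma) ** 2)
        return w / w.sum()                   # normalized memberships

    def act(self, x):
        h = self.firing_strengths(x[2])      # x[2] = pole angle
        u_rules = np.array([-(K_i @ x) for K_i in self.K])
        return float(h @ u_rules.squeeze(-1))  # fuzzy blend of local laws

actor = TSFuzzyActor(centers=[-0.3, 0.0, 0.3], sigma=0.15, K_prior=K_lqr)
print(actor.act(np.array([0.0, 0.0, 0.1, 0.0])))  # action for a small tilt

Because the rule centered at the equilibrium reduces to the LQR law when its firing strength dominates, the blended controller inherits local stability there, mirroring the role the abstract assigns to the LQR prior.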
