A Reinforcement Learning Based Computational Guidance Approach for UAVs Collision Avoidance
ZHAO Yu, GUO Ji-feng, ZHENG Hong-xing, BAI Cheng-chao
(School of Astronautics, Harbin Institute of Technology, Harbin 150001, China)
Abstract:
To address cooperative collision avoidance for a large number of fixed-wing UAVs in limited airspace, a computational guidance method based on multi-agent deep reinforcement learning is proposed. First, the collision avoidance guidance process is formulated as a sequential decision problem and described mathematically using Markov game theory. Then, an autonomous collision avoidance guidance decision method based on a multilayer neural network is proposed; the network is trained with an improved Actor-Critic model. A machine learning architecture that implements the method is designed, and the corresponding neural network structure and inter-UAV coordination mechanism are given. Finally, a flight scenario simulator with a variable number of entities is built, in which centralized training and distributed execution are carried out. To verify the performance of the algorithm, simulation experiments are conducted in scenarios with high route density. The results show that the proposed onboard computational guidance method effectively reduces the collision probability of multiple UAVs during flight and adapts well to scenarios with high route density.
Key words:  Multi-agent  Reinforcement learning  Computational guidance  Fixed wing  Collision avoidance
Funding: National Natural Science Foundation of China (61973101); Aeronautical Science Foundation of China (20180577005)
