To address the problems that deep-reinforcement-learning-based AUV tracking controllers must be retrained from scratch for each new task, train slowly, and lack stability, a multi-task rapid-adaptation control algorithm based on meta-reinforcement learning, R-SAC (Reptile-Soft Actor-Critic), is designed. R-SAC combines meta-learning with reinforcement learning and models the tracking task using the kinematic and dynamic equations of the underwater vehicle. During the training phase, R-SAC obtains a set of optimal initial model parameters for the AUV tracking controller, so that when the model faces a different task, training from these parameters converges quickly, achieving rapid adaptation across tasks. Simulation results show that, compared with a randomly initialized reinforcement-learning controller, the proposed method improves convergence speed by at least a factor of 1.6 while keeping the tracking error within 2.8%.
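The two-level structure described above — an inner loop that trains on a single task and an outer Reptile step that pulls the meta-parameters toward the adapted ones — can be sketched as follows. This is a minimal illustration only: the quadratic per-task loss stands in for the actual SAC actor-critic update, and all function names, dimensions, and task definitions here are hypothetical, not the paper's implementation.

```python
import numpy as np

def inner_update(theta, task_target, steps=10, lr=0.1):
    """Stand-in for per-task SAC training: gradient descent on a
    task-specific quadratic loss (a placeholder for the real
    actor-critic update on that tracking task)."""
    phi = theta.copy()
    for _ in range(steps):
        grad = 2.0 * (phi - task_target)  # gradient of ||phi - target||^2
        phi -= lr * grad
    return phi

def reptile_meta_train(task_targets, dim=4, meta_iters=200, meta_lr=0.5, seed=0):
    """Reptile outer loop: sample a task, adapt to it, then move the
    meta-parameters a fraction of the way toward the adapted ones."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=dim)  # random initialisation, as in plain RL
    for _ in range(meta_iters):
        target = task_targets[rng.integers(len(task_targets))]
        phi = inner_update(theta, target)        # inner-loop adaptation
        theta = theta + meta_lr * (phi - theta)  # Reptile meta-update
    return theta

# Four hypothetical tracking tasks whose optimal parameters cluster
# around a common point (e.g. nearby reference trajectories).
base = np.array([1.0, -1.0, 0.5, 0.0])
tasks = [base + 0.1 * e for e in np.eye(4)]
theta0 = reptile_meta_train(tasks)
```

After meta-training, `theta0` lies near the centre of the task cluster, so a few inner-loop steps suffice to adapt it to any one task — the mechanism behind the faster convergence reported in the abstract.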
2025, 47(5): 89-96. Received: 2024-08-05
DOI: 10.3404/j.issn.1672-7649.2025.05.014
CLC number: TP242.6
Funding: National Key R&D Program of China (2022YFC2806000)
Author: XU Chunhui (1982-), male, master's degree, associate research fellow; research interest: underwater vehicle software control