To address the problems that deep-reinforcement-learning-based AUV tracking controllers must be retrained from scratch for each new task, train slowly, and lack stability, a multi-task rapid-adaptation control algorithm based on meta-reinforcement learning, R-SAC (Reptile-Soft Actor-Critic), is designed. R-SAC combines meta-learning with reinforcement learning and models the tracking task using the kinematic and dynamic equations of the underwater vehicle. During the training phase, R-SAC obtains a set of optimal initial model parameters for the AUV tracking controller, so that when the model faces a different task, training from these parameters converges quickly, achieving rapid adaptation across tasks. Simulation results show that, compared with a randomly initialized reinforcement-learning controller, the proposed method improves convergence speed by at least a factor of 1.6 while keeping the tracking error within 2.8%.
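The two-level structure described above — an inner loop that trains on a single task and an outer Reptile step that pulls the meta-parameters toward the adapted ones — can be sketched as follows. This is a minimal illustration only: the quadratic per-task loss stands in for the actual SAC actor-critic update, and all function names, dimensions, and task definitions here are hypothetical, not the paper's implementation.

```python
import numpy as np

def inner_update(theta, task_target, steps=10, lr=0.1):
    """Stand-in for per-task SAC training: gradient descent on a
    task-specific quadratic loss (a placeholder for the real
    actor-critic update on that tracking task)."""
    phi = theta.copy()
    for _ in range(steps):
        grad = 2.0 * (phi - task_target)  # gradient of ||phi - target||^2
        phi -= lr * grad
    return phi

def reptile_meta_train(task_targets, dim=4, meta_iters=200, meta_lr=0.5, seed=0):
    """Reptile outer loop: sample a task, adapt to it, then move the
    meta-parameters a fraction of the way toward the adapted ones."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=dim)  # random initialisation, as in plain RL
    for _ in range(meta_iters):
        target = task_targets[rng.integers(len(task_targets))]
        phi = inner_update(theta, target)        # inner-loop adaptation
        theta = theta + meta_lr * (phi - theta)  # Reptile meta-update
    return theta

# Four hypothetical tracking tasks whose optimal parameters cluster
# around a common point (e.g. nearby reference trajectories).
base = np.array([1.0, -1.0, 0.5, 0.0])
tasks = [base + 0.1 * e for e in np.eye(4)]
theta0 = reptile_meta_train(tasks)
```

After meta-training, `theta0` lies near the centre of the task cluster, so a few inner-loop steps suffice to adapt it to any one task — the mechanism behind the faster convergence reported in the abstract.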
2025, 47(5): 89-96. Received: 2024-08-05
DOI: 10.3404/j.issn.1672-7649.2025.05.014
CLC number: TP242.6
Funding: National Key R&D Program of China (2022YFC2806000)
Author: XU Chunhui (1982-), male, master's degree, associate research fellow; research interest: underwater vehicle software control