SCIENTIA SINICA Informationis, Volume 50, Issue 12: 1798 (2020). https://doi.org/10.1360/SSI-2020-0065

A selected review of reinforcement learning-based control for autonomous underwater vehicles

  • Received: Mar 19, 2020
  • Accepted: Apr 23, 2020
  • Published: Oct 20, 2020

Funded by

National Key Research and Development Program of China (2016YFC0300801)

National Natural Science Foundation of China (41576101, 41427806)


References

[1] Li Z, You K, Song S. AUV Based Source Seeking with Estimated Gradients. J Syst Sci Complex, 2018, 31: 262-275.

[2] Xiang X, Jouvencel B, Parodi O. Coordinated Formation Control of Multiple Autonomous Underwater Vehicles for Pipeline Inspection. Int J Adv Robotic Syst, 2010, 7: 3.

[3] Ribas D, Palomeras N, Ridao P. Girona 500 AUV: From Survey to Intervention. IEEE/ASME Trans Mechatron, 2012, 17: 46-53.

[4] Kiumarsi B, Vamvoudakis K G, Modares H. Optimal and Autonomous Control Using Reinforcement Learning: A Survey. IEEE Trans Neural Netw Learning Syst, 2018, 29: 2042-2062.

[5] Kim H J, Jordan M I, Sastry S, et al. Autonomous helicopter flight via reinforcement learning. In: Proceedings of the 16th International Conference on Neural Information Processing Systems, Vancouver, 2003. 799--806.

[6] Bagnell J A, Schneider J G. Autonomous helicopter control using reinforcement learning policy search methods. In: Proceedings of IEEE International Conference on Robotics and Automation, Seoul, 2001. 1615--1620.

[7] Waslander S L, Hoffmann G M, Jang J S, et al. Multi-agent quadrotor testbed control design: Integral sliding mode vs. reinforcement learning. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, 2005. 3712--3717.

[8] Abbeel P, Coates A, Ng A Y. Autonomous Helicopter Aerobatics through Apprenticeship Learning. Int J Robotics Res, 2010, 29: 1608-1639.

[9] Hester T, Quinlan M, Stone P. A real-time model-based reinforcement learning architecture for robot control. 2011. arXiv.

[10] Kendall A, Hawke J, Janz D, et al. Learning to drive in a day. In: Proceedings of International Conference on Robotics and Automation, Montreal, 2019. 8248--8254.

[11] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. 2nd ed. Cambridge: MIT Press, 2018.

[12] Bertsekas D P. Dynamic Programming and Optimal Control. Belmont: Athena Scientific, 1995.

[13] Sutton R S, McAllester D A, Singh S P, et al. Policy gradient methods for reinforcement learning with function approximation. In: Proceedings of the 12th International Conference on Neural Information Processing Systems, Denver, 1999. 1057--1063.

[14] Williams R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn, 1992, 8: 229-256.

[15] Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Proceedings of the 31st International Conference on Machine Learning, Beijing, 2014. 387--395.

[16] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning. 2015. arXiv.

[17] Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. In: Proceedings of the 35th International Conference on Machine Learning, Stockholm, 2018. 1582--1591.

[18] Amari S. Natural Gradient Works Efficiently in Learning. Neural Computation, 1998, 10: 251-276.

[19] Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, 2015. 1889--1897.

[20] Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms. 2017. arXiv.

[21] Polydoros A S, Nalpantidis L. Survey of Model-Based Reinforcement Learning: Applications on Robotics. J Intell Robot Syst, 2017, 86: 153-173.

[22] Anthony T, Tian Z, Barber D. Thinking fast and slow with deep learning and tree search. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 2017. 5360--5370.

[23] Racanière S, Weber T, Reichert D, et al. Imagination-augmented agents for deep reinforcement learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 2017. 5690--5701.

[24] Feinberg V, Wan A, Stoica I, et al. Model-based value estimation for efficient model-free reinforcement learning. 2018. arXiv.

[25] Khodayari M H, Balochian S. Modeling and control of autonomous underwater vehicle (AUV) in heading and depth attitude via self-adaptive fuzzy PID controller. J Mar Sci Technol, 2015, 20: 559-578.

[26] Narasimhan M, Singh S N. Adaptive optimal control of an autonomous underwater vehicle in the dive plane using dorsal fins. Ocean Eng, 2006, 33: 404-416.

[27] Lapierre L, Soetanto D. Nonlinear path-following control of an AUV. Ocean Eng, 2007, 34: 1734-1744.

[28] Elmokadem T, Zribi M, Youcef-Toumi K. Trajectory tracking sliding mode control of underactuated AUVs. Nonlinear Dyn, 2016, 84: 1079-1091.

[29] Antonelli G. Underwater Robots. 4th ed. Berlin: Springer, 2018.

[30] Kober J, Bagnell J A, Peters J. Reinforcement learning in robotics: A survey. Int J Robotics Res, 2013, 32: 1238-1274.

[31] El-Fakdi A, Carreras M. Two-step gradient-based reinforcement learning for underwater robotics behavior learning. Robotics Autonomous Syst, 2013, 61: 271-282.

[32] Ferreira F, Machado D, Ferri G, et al. Underwater optical and acoustic imaging: a time for fusion? A brief overview of the state-of-the-art. In: Proceedings of Oceans 2016 MTS/IEEE, Monterey, 2016.

[33] Lu H, Li Y, Zhang L. Contrast enhancement for images in turbid water. J Opt Soc Am A, 2015, 32: 886-893.

[34] Heidemann J, Stojanovic M, Zorzi M. Underwater sensor networks: applications, advances and challenges. Phil Trans R Soc A, 2012, 370: 158-175.

[35] Yoo B, Kim J. Path optimization for marine vehicles in ocean currents using reinforcement learning. J Mar Sci Technol, 2016, 21: 334-343.

[36] Wang C, Wei L, Wang Z. Reinforcement Learning-Based Multi-AUV Adaptive Trajectory Planning for Under-Ice Field Estimation. Sensors, 2018, 18: 3859.

[37] Hu H, Song S, Chen C L P. Plume Tracing via Model-Free Reinforcement Learning Method. IEEE Trans Neural Netw Learning Syst, 2019, 30: 2515-2527.

[38] Wu H, Song S, You K. Depth Control of Model-Free AUVs via Reinforcement Learning. IEEE Trans Syst Man Cybern Syst, 2019, 49: 2499-2510.

[39] Carlucho I, De Paula M, Wang S. Adaptive low-level control of autonomous underwater vehicles using deep reinforcement learning. Robotics Autonomous Syst, 2018, 107: 71-86.

[40] Carlucho I, De Paula M, Wang S, et al. AUV position tracking control using end-to-end deep reinforcement learning. In: Proceedings of Oceans 2018 MTS/IEEE, Charleston, 2018.

[41] Ahmadzadeh S R, Kormushev P, Caldwell D G. Multi-objective reinforcement learning for AUV thruster failure recovery. In: Proceedings of IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Orlando, 2014.

[42] Stokey R P, Roup A, von Alt C, et al. Development of the REMUS 600 autonomous underwater vehicle. In: Proceedings of Oceans 2005 MTS/IEEE, Washington, 2005. 1301--1304.

[43] Fernandez-Gauna B, Osa J L, Graña M. Effect of initial conditioning of reinforcement learning agents on feedback control tasks over continuous state and action spaces. In: Proceedings of International Joint Conference SOCO'14-CISIS'14-ICEUTE'14, Bilbao, 2014. 125--133.

[44] Walters P, Kamalapurkar R, Voight F. Online Approximate Optimal Station Keeping of a Marine Craft in the Presence of an Irrotational Current. IEEE Trans Robot, 2018, 34: 486-496.

[45] Knudsen K B, Nielsen M C, Schjølberg I. Deep learning for station keeping of AUVs. In: Proceedings of Oceans 2019 MTS/IEEE, Seattle, 2019.

[46] Frost G, Lane D M. Evaluation of Q-learning for search and inspect missions using underwater vehicles. In: Proceedings of Oceans 2014 MTS/IEEE, St. John's, 2014.

[47] Jamali N, Kormushev P, Ahmadzadeh S R, et al. Covariance analysis as a measure of policy robustness. In: Proceedings of Oceans 2014 MTS/IEEE, Taipei, 2014.

[48] Leonetti M, Ahmadzadeh S R, Kormushev P. On-line learning to recover from thruster failures on autonomous underwater vehicles. In: Proceedings of Oceans 2013 MTS/IEEE, San Diego, 2013.

[49] Ahmadzadeh S R, Leonetti M, Carrera A, et al. Online discovery of AUV control policies to overcome thruster failures. In: Proceedings of IEEE International Conference on Robotics and Automation, Hong Kong, 2014. 6522--6528.

[50] Palomeras N, El-Fakdi A, Carreras M. COLA2: A Control Architecture for AUVs. IEEE J Ocean Eng, 2012, 37: 695-716.

[51] Sun T T, He B, Nian R, et al. Target following for an autonomous underwater vehicle using regularized ELM-based reinforcement learning. In: Proceedings of Oceans 2015 MTS/IEEE, Washington, 2015.

[52] Shi W J, Song S J, Wu C. High-level tracking of autonomous underwater vehicles based on pseudo averaged Q-learning. In: Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Miyazaki, 2018. 4138--4143.

[53] Shi W, Song S, Wu C. Multi Pseudo Q-Learning-Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles. IEEE Trans Neural Netw Learning Syst, 2019, 30: 3534-3546.

[54] Zhang Q, Lin J, Sha Q. Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle. IEEE Access, 2020, 8: 24258-24268.

[55] Yu R S, Shi Z Y, Huang C X, et al. Deep reinforcement learning based optimal trajectory tracking control of autonomous underwater vehicle. In: Proceedings of the 36th Chinese Control Conference, Dalian, 2017. 4958--4965.

[56] Cui R X, Yang C G, Li Y, et al. Neural network based reinforcement learning control of autonomous underwater vehicles with control input saturation. In: Proceedings of UKACC International Conference on Control, Loughborough, 2014. 50--55.

[57] Cui R, Yang C, Li Y. Adaptive Neural Network Control of AUVs With Control Input Nonlinearities Using Reinforcement Learning. IEEE Trans Syst Man Cybern Syst, 2017, 47: 1019-1029.

[58] Guo X, Yan W, Cui R. Integral Reinforcement Learning-Based Adaptive NN Control for Continuous-Time Nonlinear MIMO Systems With Unknown Control Directions. IEEE Trans Syst Man Cybern Syst, 2019: 1-10.

[59] Guo X, Yan W, Cui R. Event-Triggered Reinforcement Learning-Based Adaptive Tracking Control for Completely Unknown Continuous-Time Nonlinear Systems. IEEE Trans Cybern, 2020, 50: 3231-3242.

[60] Lin L X, Xie H B, Shen L C. Application of reinforcement learning to autonomous heading control for bionic underwater robots. In: Proceedings of IEEE International Conference on Robotics and Biomimetics, Bangkok, 2009. 2486--2490.

[61] Lin L, Xie H, Zhang D. Supervised Neural Q_learning based Motion Control for Bionic Underwater Robots. J Bionic Eng, 2010, 7: S177-S184.

[62] Wang J, Kim J. Optimization of fish-like locomotion using hierarchical reinforcement learning. In: Proceedings of the 12th International Conference on Ubiquitous Robots and Ambient Intelligence, Goyang, 2015. 465--469.

[63] Yan S Z, Wang J, Wu Z X, et al. Motion optimization for a robotic fish based on adversarial structured control. In: Proceedings of IEEE International Conference on Robotics and Biomimetics, Dali, 2019. 346--351.

[64] Prahacs C, Saunders A, Smith M K, et al. Towards legged amphibious mobile robotics. In: Proceedings of Canadian Design Engineering Network Conference, Montreal, 2004.

[65] Meger D, Higuera J C G, Xu A, et al. Learning legged swimming gaits from experience. In: Proceedings of IEEE International Conference on Robotics and Automation, Seattle, 2015. 2332--2338.

[66] Higuera J C G, Meger D, Dudek G. Synthesizing neural network controllers with probabilistic model-based reinforcement learning. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, 2018. 2538--2544.

[67] Zhang X L, Li B, Chang J, et al. Gliding control of underwater gliding snake-like robot based on reinforcement learning. In: Proceedings of the 8th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, Tianjin, 2018. 323--328.

[68] Zang W C, Nie Y L, Song D L, et al. Research on constraining strategies of underwater glider's trajectory under the influence of ocean currents based on DQN algorithm. In: Proceedings of Oceans 2019 MTS/IEEE, Seattle, 2019.

[69] Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay. 2015. arXiv.

[70] Wu H, Song S J, Hsu Y C, et al. End-to-end sensorimotor control problems of AUVs with deep reinforcement learning. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, Macau, 2019. 5869--5874.