SCIENCE CHINA Information Sciences, Volume 62, Issue 12: 222201 (2019) https://doi.org/10.1007/s11432-018-9865-9

Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games

  • Received Dec 19, 2018
  • Accepted Mar 29, 2019
  • Published Nov 12, 2019

Abstract


References

[1] Basar T, Olsder G J. Dynamic Noncooperative Game Theory (Classics in Applied Mathematics). 2nd ed. Philadelphia: SIAM, 1999

[2] Falugi P, Kountouriotis P A, Vinter R B. Differential Games Controllers That Confine a System to a Safe Region in the State Space, With Applications to Surge Tank Control. IEEE Trans Automat Contr, 2012, 57: 2778-2788

[3] Lin F H, Liu Q, Zhou X W. Towards green for relay in InterPlaNetary Internet based on differential game model. Sci China Inf Sci, 2014, 57: 042306

[4] Luo B, Wu H N, Huang T W. Off-policy reinforcement learning for $H_\infty$ control design. IEEE Trans Cybern, 2015, 45: 65-76

[5] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998

[6] Xia R S, Wu Q X, Chen M. Disturbance observer-based optimal longitudinal trajectory control of near space vehicle. Sci China Inf Sci, 2019, 62: 050212

[7] Wang D, Mu C X. Developing nonlinear adaptive optimal regulators through an improved neural learning mechanism. Sci China Inf Sci, 2017, 60: 058201

[8] Yan X H, Zhu J H, Kuang M C. Missile aerodynamic design using reinforcement learning and transfer learning. Sci China Inf Sci, 2018, 61: 119204

[9] Watkins C, Dayan P. Q-Learning. Mach Learn, 1992, 8: 279-292

[10] Bradtke S J, Ydstie B E, Barto A G. Adaptive linear quadratic control using policy iteration. In: Proceedings of American Control Conference, Baltimore, 1994. 3475-3479

[11] Chen C L, Dong D Y, Li H X. Hybrid MDP based integrated hierarchical Q-learning. Sci China Inf Sci, 2011, 54: 2279-2294

[12] Wei Q L, Liu D R. A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems. Sci China Inf Sci, 2015, 58: 122203

[13] Wei Q L, Lewis F L, Sun Q Y. Discrete-Time Deterministic $Q$-Learning: A Novel Convergence Analysis. IEEE Trans Cybern, 2017, 47: 1224-1237

[14] Luo B, Liu D R, Huang T W. Model-Free Optimal Tracking Control via Critic-Only Q-Learning. IEEE Trans Neural Netw Learning Syst, 2016, 27: 2134-2144

[15] Vamvoudakis K G. Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach. Syst Control Lett, 2017, 100: 14-20

[16] Vrabie D, Lewis F L. Adaptive dynamic programming for online solution of a zero-sum differential game. J Control Theor Appl, 2011, 9: 353-360

[17] Zhu Y H, Zhao D B, Li X G. Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data. IEEE Trans Neural Netw Learning Syst, 2017, 28: 714-725

[18] Vamvoudakis K G, Lewis F L. Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica, 2011, 47: 1556-1569

[19] Zhang H G, Cui L L, Luo Y H. Near-Optimal Control for Nonzero-Sum Differential Games of Continuous-Time Nonlinear Systems Using Single-Network ADP. IEEE Trans Cybern, 2013, 43: 206-216

[20] Liu D R, Li H L, Wang D. Online Synchronous Approximate Optimal Learning Algorithm for Multi-Player Non-Zero-Sum Games With Unknown Dynamics. IEEE Trans Syst Man Cybern Syst, 2014, 44: 1015-1027

[21] Vamvoudakis K G. Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems. Automatica, 2015, 61: 274-281

[22] Zhao D B, Zhang Q C, Wang D. Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics. IEEE Trans Cybern, 2016, 46: 854-865

[23] Song R Z, Lewis F L, Wei Q L. Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games. IEEE Trans Neural Netw Learning Syst, 2017, 28: 704-713

[24] Mehraeen S, Dierks T, Jagannathan S. Zero-sum two-player game theoretic formulation of affine nonlinear discrete-time systems using neural networks. IEEE Trans Cybern, 2013, 43: 1641-1655

[25] Zhang H G, Jiang H, Luo C M. Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms. IEEE Trans Cybern, 2017, 47: 3331-3340

[26] Zhang H G, Jiang H, Luo Y H. Data-Driven Optimal Consensus Control for Discrete-Time Multi-Agent Systems With Unknown Dynamics Using Reinforcement Learning Method. IEEE Trans Ind Electron, 2017, 64: 4091-4100

[27] Kiumarsi B, Lewis F L, Jiang Z P. $H_\infty$ control of linear discrete-time systems: Off-policy reinforcement learning. Automatica, 2017, 78: 144-152

[28] Vamvoudakis K G, Modares H, Kiumarsi B. Game theory-based control system algorithms with real-time reinforcement learning: how to solve multiplayer games online. IEEE Control Syst, 2017, 37: 33-52

[29] Al-Tamimi A, Lewis F L, Abu-Khalaf M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control. Automatica, 2007, 43: 473-481

[30] Rizvi S A A, Lin Z L. Output feedback Q-learning for discrete-time linear zero-sum games with application to the H-infinity control. Automatica, 2018, 95: 213-221

[31] Li J N, Chai T Y, Lewis F L. Off-Policy Q-Learning: Set-Point Design for Optimizing Dual-Rate Rougher Flotation Operational Processes. IEEE Trans Ind Electron, 2018, 65: 4092-4102

[32] Leake R J, Liu R W. Construction of Suboptimal Control Sequences. SIAM J Control, 1967, 5: 54-63

[33] Ioannou P, Fidan B. Adaptive Control Tutorial. Philadelphia: SIAM, 2006