References
[1]
Basar T, Olsder G J. Dynamic Noncooperative Game Theory (Classics in Applied Mathematics). 2nd ed. Philadelphia: SIAM, 1999.
[2]
Falugi P, Kountouriotis P A, Vinter R B. Differential games controllers that confine a system to a safe region in the state space, with applications to surge tank control. IEEE Trans Automat Contr, 2012, 57: 2778-2788.
[3]
Zha W, Chen J, Peng Z. Construction of barrier in a fishing game with point capture. IEEE Trans Cybern, 2017, 47: 1409-1422.
[4]
Lin F H, Liu Q, Zhou X W, et al. Towards green for relay in InterPlaNetary Internet based on differential game model. Sci China Inf Sci, 2014, 57: 042306.
[5]
Luo B, Wu H N, Huang T. Off-policy reinforcement learning for $H_\infty$ control design. IEEE Trans Cybern, 2015, 45: 65-76.
[6]
Beard R W. Successive Galerkin approximation algorithms for nonlinear optimal and robust control. Int J Control, 1998, 71: 717-743.
[7]
Abu-Khalaf M, Lewis F L, Huang J. Neurodynamic programming and zero-sum games for constrained control systems. IEEE Trans Neural Netw, 2008, 19: 1243-1252.
[8]
Freiling G, Jank G, Abou-Kandil H. On global existence of solutions to coupled matrix Riccati equations in closed-loop Nash games. IEEE Trans Automat Contr, 1996, 41: 264-269.
[9]
Li T Y, Gajic Z. Lyapunov iterations for solving coupled algebraic Riccati equations of Nash differential games and algebraic Riccati equations of zero-sum games. In: New Trends in Dynamic Games and Applications. Boston: Birkhäuser, 1995. 333-351.
[10]
Possieri C, Sassano M. An algebraic geometry approach for the computation of all linear feedback Nash equilibria in LQ differential games. In: Proceedings of the 54th IEEE Conference on Decision and Control, Osaka, 2015. 5197-5202.
[11]
Engwerda J C. LQ Dynamic Optimization and Differential Games. New York: Wiley, 2005.
[12]
Mylvaganam T, Sassano M, Astolfi A. Constructive $\epsilon$-Nash equilibria for nonzero-sum differential games. IEEE Trans Automat Contr, 2015, 60: 950-965.
[13]
Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
[14]
Werbos P J. Approximate dynamic programming for real-time control and neural modeling. In: Handbook of Intelligent Control. New York: Van Nostrand, 1992.
[15]
Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming. Belmont: Athena Scientific, 1996.
[16]
Werbos P J. Elements of intelligence. Cybernetica, 1968, 11: 131.
[17]
Doya K. Reinforcement learning in continuous time and space. Neural Computation, 2000, 12: 219-245.
[18]
Wei Q L, Lewis F L, Sun Q Y, et al. Discrete-time deterministic Q-learning: a novel convergence analysis. IEEE Trans Cybern, 2016, 47: 1-14.
[19]
Wang D, Mu C. Developing nonlinear adaptive optimal regulators through an improved neural learning mechanism. Sci China Inf Sci, 2017, 60: 058201.
[20]
Vrabie D, Pastravanu O, Abu-Khalaf M. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 2009, 45: 477-484.
[21]
Jiang Y, Jiang Z P. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica, 2012, 48: 2699-2704.
[22]
Luo B, Wu H N, Huang T. Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica, 2014, 50: 3281-3290.
[23]
Zhang H, Wei Q, Liu D. An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica, 2011, 47: 207-214.
[24]
Vrabie D, Lewis F. Adaptive dynamic programming for online solution of a zero-sum differential game. J Control Theor Appl, 2011, 9: 353-360.
[25]
Zhu Y, Zhao D, Li X. Iterative adaptive dynamic programming for solving unknown nonlinear zero-sum game based on online data. IEEE Trans Neural Netw Learning Syst, 2017, 28: 714-725.
[26]
Modares H, Lewis F L, Jiang Z P. $H_\infty$ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE Trans Neural Netw Learning Syst, 2015, 26: 2550-2562.
[27]
Kiumarsi B, Lewis F L, Jiang Z P. $H_\infty$ control of linear discrete-time systems: off-policy reinforcement learning. Automatica, 2017, 78: 144-152.
[28]
Vamvoudakis K G, Lewis F L, Hudas G R. Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality. Automatica, 2012, 48: 1598-1611.
[29]
Zhang H, Cui L, Luo Y. Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybern, 2013, 43: 206-216.
[30]
Zhang H, Jiang H, Luo C. Discrete-time nonzero-sum games for multiplayer using policy-iteration-based adaptive dynamic programming algorithms. IEEE Trans Cybern, 2017, 47: 3331-3340.
[31]
Vamvoudakis K G. Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems. Automatica, 2015, 61: 274-281.
[32]
Zhao D, Zhang Q, Wang D. Experience replay for optimal control of nonzero-sum game systems with unknown dynamics. IEEE Trans Cybern, 2016, 46: 854-865.
[33]
Johnson M, Kamalapurkar R, Bhasin S. Approximate N-player nonzero-sum game solution for an uncertain continuous nonlinear system. IEEE Trans Neural Netw Learning Syst, 2015, 26: 1645-1658.
[34]
Liu D, Li H, Wang D. Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics. IEEE Trans Syst Man Cybern Syst, 2014, 44: 1015-1027.
[35]
Song R, Lewis F L, Wei Q. Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero-sum games. IEEE Trans Neural Netw Learning Syst, 2017, 28: 704-713.
[36]
Vrabie D, Lewis F L. Integral reinforcement learning for online computation of feedback Nash strategies of nonzero-sum differential games. In: Proceedings of the 49th IEEE Conference on Decision and Control, Atlanta, 2010. 3066-3071.
[37]
Vamvoudakis K G, Modares H, Kiumarsi B. Game theory-based control system algorithms with real-time reinforcement learning: how to solve multiplayer games online. IEEE Control Syst, 2017, 37: 33-52.
[38]
Leake R J, Liu R W. Construction of suboptimal control sequences. SIAM J Control, 1967, 5: 54-63.
[39]
Vamvoudakis K G, Lewis F L. Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica, 2011, 47: 1556-1569.
[40]
Watkins C, Dayan P. Q-learning. Mach Learn, 1992, 8: 279-292.
[41]
Bradtke S J, Ydstie B E, Barto A G. Adaptive linear quadratic control using policy iteration. In: Proceedings of American Control Conference, Baltimore, 1994. 3475-3479.
[42]
Chen C L, Dong D Y, Li H X. Hybrid MDP based integrated hierarchical Q-learning. Sci China Inf Sci, 2011, 54: 2279-2294.
[43]
Wei Q L, Liu D R. A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems. Sci China Inf Sci, 2015, 58: 122203.
[44]
Palanisamy M, Modares H, Lewis F L. Continuous-time Q-learning for infinite-horizon discounted cost linear quadratic regulator problems. IEEE Trans Cybern, 2015, 45: 165-176.
[45]
Yan P, Wang D, Li H. Error bound analysis of $Q$-function for discounted optimal control problems with policy iteration. IEEE Trans Syst Man Cybern Syst, 2017, 47: 1207-1216.
[46]
Luo B, Liu D, Wu H N. Policy gradient adaptive dynamic programming for data-based optimal control. IEEE Trans Cybern, 2017, 47: 3341-3354.
[47]
Vamvoudakis K G. Q-learning for continuous-time linear systems: a model-free infinite horizon optimal control approach. Syst Control Lett, 2017, 100: 14-20.
[48]
Vamvoudakis K G, Hespanha J P. Cooperative Q-learning for rejection of persistent adversarial inputs in networked linear quadratic systems. IEEE Trans Automat Contr, 2018, 63: 1018-1031.
[49]
Rizvi S A A, Lin Z. Output feedback Q-learning for discrete-time linear zero-sum games with application to the $H_\infty$ control. Automatica, 2018, 95: 213-221.
[50]
Li J, Chai T, Lewis F L. Off-policy Q-learning: set-point design for optimizing dual-rate rougher flotation operational processes. IEEE Trans Ind Electron, 2018, 65: 4092-4102.
[51]
Kleinman D. On an iterative technique for Riccati equation computations. IEEE Trans Automat Contr, 1968, 13: 114-115.