
SCIENTIA SINICA Informationis, Volume 48, Issue 10: 1430-1449 (2018) https://doi.org/10.1360/N112018-00072

Smart generation control based on deep reinforcement learning with the ability of action self-optimization

  • Received May 11, 2018
  • Accepted Jun 11, 2018
  • Published Oct 16, 2018

Abstract


Funded by

National Natural Science Foundation of China (51707102)

National Natural Science Foundation of China (61603212)


Supplement

Appendix

Table A1 (continued)



  • Figure 1

    (Color online) The schematic diagram of action discovery

  • Figure 2

    (Color online) Control schematic diagram of SGC system based on DDRQN-AD

  • Figure 3

    (Color online) Active power of wind power, photovoltaic and electric vehicles

  • Figure 4

    (Color online) The two-area micro-grid LFC power system model

  • Figure 5

    (Color online) The pre-learning effect of DDRQN-AD in area-A

  • Figure 6

    (Color online) Control performance of different algorithms under pulse disturbance. (a) Output pulse of various algorithms; (b) the $\vert\Delta f\vert$ and $\vert$ACE$\vert$ of various algorithms

  • Figure 7

    (Color online) Control performance of different algorithms under random pulse disturbance. (a) Output results under random pulse disturbance; (b) the $\vert\Delta f\vert$ and $\vert$ACE$\vert$ of various algorithms

  • Figure 8

    (Color online) The adjusted active power of wind power, photovoltaic and electric vehicles

  • Figure 9

    (Color online) Effect of DDRQN-AD under white noise disturbance

  • Figure 10

    (Color online) Performance statistics of the three algorithms under white noise disturbance

  • Figure 11

    (Color online) Guangdong power grid model

  • Table 1   SGC parameter settings
    Case $\delta$ $\alpha$ $\alpha^{-}$ $\gamma$
    Ideal environment 0.1 0.5 0.5 0.95
    Nonideal environment 0.3 0.1 0.1 0.9
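  • Algorithm 1   DDRQN-AD algorithm

    Require: For every agent $m$, initialize the reward function $R(1)$, the action set $A(1)$, and the weights $\theta_{1}$ and $\theta^-_{1}$.

    Set the algorithm parameters $\delta$, $\gamma$, $\alpha$, and $\alpha^-$.

    Set the initial state $s_{1}$, the initial internal state $h_{1}=0$, and $\nabla\theta=0$.

    Start

    Step 1: Select and execute an exploratory action $a^m_t$ according to the action probability distribution.

    Step 2: Observe the next state $s_{t+1}$.

    Step 3: Record the observation $o^m_{t+1}$ and the internal state $h^m_t$.

    Step 4: Obtain a short-term reward signal $R(t)$ from Eq. (8).

    Step 5: Compute the target Q-value $y^m_t$ according to Eq. (6).

    Step 6: Compute the loss $L^m_t$ according to Eq. (1).

    Step 7: Update the weights $\theta_{i+1}$ and $\theta^-_{i+1}$ according to Eqs. (3) and (4).

    Step 8: Search for and evaluate new actions according to Eq. (7).

    Step 9: Update the action set $A(t)$ to $A(t+1)$.

    Step 10: Set $t=t+1$ and return to Step 1.

    End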
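The control flow of Algorithm 1 can be illustrated with a minimal sketch. Eqs. (1), (3), (4), (6), (7), and (8) are not reproduced in this section, so everything below is a stand-in assumption rather than the authors' method: a single-state Q-table replaces the deep recurrent Q-network, `reward` is a hypothetical penalty on the residual deviation, `discover` approximates action discovery by adding midpoints next to the greedy action, and all parameter values are arbitrary.

```python
import random


def reward(disturbance, action):
    """Hypothetical stand-in for Eq. (8): penalize the residual deviation."""
    return -abs(disturbance + action)


def discover(q):
    """Hypothetical stand-in for Eq. (7): refine the action set near the greedy action."""
    best = max(q, key=q.get)
    ordered = sorted(q)
    i = ordered.index(best)
    for j in (i - 1, i + 1):
        if 0 <= j < len(ordered):
            candidate = (best + ordered[j]) / 2.0
            if candidate not in q:
                q[candidate] = q[best]  # neutral initial estimate for the new action


def run(steps=1000, alpha=0.5, gamma=0.9, epsilon=0.2,
        discover_every=200, disturbance=0.3, seed=0):
    rng = random.Random(seed)
    q = {-1.0: 0.0, 0.0: 0.0, 1.0: 0.0}  # initial action set A(1) with Q-values
    for t in range(1, steps + 1):
        # Step 1: choose an exploratory (epsilon-greedy) action.
        if rng.random() < epsilon:
            a = rng.choice(list(q))
        else:
            a = max(q, key=q.get)
        # Steps 2-4: the toy plant has one state, so only the reward is observed.
        r = reward(disturbance, a)
        # Step 5: target value (tabular stand-in for the target network).
        y = r + gamma * max(q.values())
        # Steps 6-7: move the estimate toward the target (stand-in for the gradient step).
        q[a] += alpha * (y - q[a])
        # Steps 8-9: periodically search for new actions and enlarge the action set.
        if t % discover_every == 0:
            discover(q)
    return q


if __name__ == "__main__":
    q = run()
    print(len(q), max(q, key=q.get))
```

In this toy setting the action set starts with three coarse actions and is refined around the action that best cancels the assumed disturbance, which is the behaviour Steps 8 and 9 describe.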

  • Table A1   Unit parameter statistics of Guangdong power grid
    Area Unit type Unit $\Delta P_k^{\rm max}$ (MW) $\Delta P_k^{\rm min}$ (MW) $B_{k}$ (kg/kWh) Unit state (Summer) Unit state (Winter)
    Yuebei Coal-fired units G1 120 $-$120 0.99 Starting up Maintenance
    G2 120 $-$120 0.99 Starting up Starting up
    G3 120 $-$120 0.99 Closing down Starting up
    G4 135 $-$135 0.99 Starting up Starting up
    G5 135 $-$135 0.99 Starting up Starting up
    G6 300 $-$300 0.99 Starting up Starting up
    G7 300 $-$300 0.99 Starting up Starting up
    G8 320 $-$320 0.89 Starting up Starting up
    Gas-fired unit G9 188 $-$188 0.5 Starting up Starting up
    Hydropower unit G10 180 0 0 Starting up 50% capacity
    Yuexi Coal-fired units G11 500 $-$500 0.89 Starting up Starting up
    G12 330 $-$330 0.89 Starting up Starting up
    G13 125 $-$125 0.99 Starting up Maintenance
    G14 125 $-$125 0.99 Closing down Starting up
    G15 150 $-$150 0.99 Starting up Starting up
    G16 150 $-$150 0.99 Starting up Starting up
    G17 150 $-$150 0.99 Starting up Starting up
    G18 220 $-$220 0.99 Starting up Starting up
    G19 220 $-$220 0.99 Starting up Starting up
    G20 220 $-$220 0.99 Starting up Starting up
    G21 660 $-$660 0.87 Starting up Starting up
    G22 180 $-$180 0.99 Starting up Starting up
    G23 180 $-$180 0.99 Starting up Starting up
    Gas-fired units G24 280 $-$280 0.5 Starting up Starting up
    G25 200 $-$200 0.5 Starting up Starting up
    G26 200 $-$200 0.5 Starting up Starting up
    G27 200 $-$200 0.5 Starting up Starting up
    Oil-fuel units G28 120 $-$120 0.7 Starting up Maintenance
    G29 120 $-$120 0.7 Closing down Starting up
    Zhusanjiao Coal-fired units G30 600 $-$600 0.89 Starting up Starting up
    G31 100 $-$100 0.99 Starting up Maintenance
    G32 100 $-$100 0.99 Starting up Maintenance
    G33 200 $-$200 0.99 Starting up Starting up
    G34 200 $-$200 0.99 Starting up Starting up
    G35 200 $-$200 0.99 Starting up Starting up
    G36 210 $-$210 0.99 Starting up Starting up
    G37 240 $-$240 0.99 Starting up Starting up
    G38 240 $-$240 0.99 Starting up Starting up
    G39 280 $-$280 0.99 Starting up Starting up
    G40 280 $-$280 0.99 Starting up Starting up
    G41 280 $-$280 0.99 Starting up Starting up
    G42 250 $-$250 0.99 Starting up Starting up
    G43 250 $-$250 0.99 Starting up Starting up
    G44 360 $-$360 0.89 Starting up Starting up
    G45 360 $-$360 0.89 Starting up Starting up
    G46 400 $-$400 0.89 Starting up Starting up
    G47 400 $-$400 0.89 Starting up Starting up
    Gas-fired units G48 180 $-$180 0.5 Starting up Starting up
    G49 180 $-$180 0.5 Starting up Starting up
    G50 180 $-$180 0.5 Starting up Starting up
    Oil-fuel units G51 150 $-$150 0.7 Closing down Starting up
    G52 150 $-$150 0.7 Closing down Starting up
    G53 180 $-$180 0.7 Starting up Starting up
    G54 180 $-$180 0.7 Starting up Starting up
    G55 180 $-$180 0.7 Starting up Starting up
    Hydropower units G56 300 0 0 Starting up 50% capacity
    G57 300 0 0 Starting up 50% capacity
    G58 400 0 0 Starting up 50% capacity
    Yuedong Coal-fired units G59 100 $-$100 0.99 Starting up Maintenance
    G60 196 $-$196 0.99 Starting up Starting up
    G61 296 $-$296 0.99 Starting up Starting up
    G62 180 $-$180 0.99 Closing down Starting up
    G63 220 $-$220 0.99 Starting up Starting up
    G64 180 $-$180 0.99 Closing down Starting up
    G65 220 $-$220 0.99 Starting up Starting up
    G66 180 $-$180 0.99 Starting up Starting up
    G67 100 $-$100 0.99 Starting up Maintenance
    G68 168 $-$168 0.99 Closing down Starting up
    G69 60 $-$60 0.99 Starting up Maintenance
    G70 210 $-$210 0.99 Starting up Starting up
    G71 350 $-$350 0.89 Starting up Starting up
    G72 240 $-$240 0.99 Starting up Starting up
    G73 240 $-$240 0.99 Starting up Starting up
    G74 240 $-$240 0.99 Starting up Starting up
    G75 240 $-$240 0.99 Starting up Starting up
    G76 200 $-$200 0.99 Starting up Starting up
  • Table 2   Control performance of different algorithms under step disturbance
    Algorithm Overshoot (%) Steady-state error (%) Rise time (s)
    DDRQN-AD 7.08 0.57 138
    DDRQN 7.34 5.83 202
    PDWoLF-PHC($\lambda$) 7.38 3.98 190
    SARSA-AD 7.38 7.24 186
    DWoLF-PHC($\lambda$) 7.54 7.57 318
    WoLF-PHC 7.40 12.42 198
    R($\lambda$) 7.36 20.84 222
    Q($\lambda$) 8.24 12.28 254
    Q 8.13 13.69 534
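The three step-response indices reported in Table 2 can be computed from a sampled response as sketched below. The exact definitions used for the table are not given in this section, so the sketch assumes one common convention: overshoot and steady-state error are expressed relative to a positive reference `ref`, the final value is the mean of the last 10% of samples, and rise time is the interval between first reaching 10% and 90% of the reference. The function name and its parameters are hypothetical.

```python
def step_metrics(t, y, ref, tail_fraction=0.1):
    """Return (overshoot %, steady-state error %, rise time) for a step response.

    Assumed conventions (not taken from the source): ref > 0, the final value
    is the mean of the last `tail_fraction` of samples, and rise time is
    measured from 10% to 90% of the reference without interpolation.
    """
    if len(t) != len(y) or not y:
        raise ValueError("t and y must be non-empty and of equal length")
    if ref <= 0:
        raise ValueError("this sketch assumes a positive reference")

    n_tail = max(1, int(len(y) * tail_fraction))
    final = sum(y[-n_tail:]) / n_tail

    overshoot = max(0.0, (max(y) - ref) / ref * 100.0)
    steady_state_error = abs(ref - final) / ref * 100.0

    # First sample times at which the response reaches 10% and 90% of ref.
    t10 = next((ti for ti, yi in zip(t, y) if yi >= 0.1 * ref), None)
    t90 = next((ti for ti, yi in zip(t, y) if yi >= 0.9 * ref), None)
    rise_time = None if t10 is None or t90 is None else t90 - t10

    return overshoot, steady_state_error, rise_time


if __name__ == "__main__":
    t = list(range(10))
    y = [0.0, 0.05, 0.5, 0.95, 1.1, 1.0, 0.99, 0.99, 0.99, 0.99]
    print(step_metrics(t, y, ref=1.0))
```

The synthetic response in the usage example is illustrative only and is not data from the paper.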
  • Table A1 (continued)   Unit parameter statistics of Guangdong power grid
    Area Unit type Unit $\Delta P_k^{\rm max}$ (MW) $\Delta P_k^{\rm min}$ (MW) $B_{k}$ (kg/kWh) Unit state (Summer) Unit state (Winter)
    G77 200 $-$200 0.99 Starting up Starting up
    G78 220 $-$220 0.99 Starting up Starting up
    G79 220 $-$220 0.99 Starting up Starting up
    G80 220 $-$220 0.99 Starting up Starting up
    G81 350 $-$350 0.89 Starting up Starting up
    G82 350 $-$350 0.89 Starting up Starting up
    Gas-fired units G83 250 $-$250 0.5 Starting up Starting up
    G84 250 $-$250 0.5 Starting up Starting up
    G85 250 $-$250 0.5 Starting up Starting up
    G86 250 $-$250 0.5 Starting up Starting up
    G87 288 $-$288 0.5 Starting up Starting up
    G88 360 $-$360 0.5 Starting up Starting up
    G89 100 $-$100 0.5 Starting up Maintenance
    Oil-fuel units G90 240 $-$240 0.7 Starting up Starting up
    G91 240 $-$240 0.7 Starting up Starting up
    G92 120 $-$120 0.7 Closing down Starting up
    Hydropower unit G93 244 0 0 Starting up 50% capacity
  • Table 3   Statistics of Guangdong power grid in summer under different algorithms
    Area Algorithm $\vert$ACE$\vert$ (MW) CPS1 (%) $\vert\Delta f\vert$ (Hz) CE
    Yuebei DDRQN-AD 4.8465 199.9606 0.0036 637.8061
    Yuebei DDRQN 13.2599 199.5169 0.0066 653.1144
    Yuebei PDWoLF-PHC($\lambda$) 30.8923 197.6574 0.0096 687.4272
    Yuebei DWoLF-PHC($\lambda$) 62.0124 194.6082 0.0140 689.4484
    Yuebei Q($\lambda$) 82.0249 188.5264 0.0154 694.0980
    Yuexi DDRQN-AD 9.3702 199.9714 0.0063 671.2834
    Yuexi DDRQN 19.5084 198.1847 0.0096 688.7363
    Yuexi PDWoLF-PHC($\lambda$) 45.5341 195.3937 0.0122 692.6057
    Yuexi DWoLF-PHC($\lambda$) 72.5696 189.6666 0.0141 693.1040
    Yuexi Q($\lambda$) 105.0339 178.8297 0.0155 699.8637
    Zhusanjiao DDRQN-AD 9.3545 199.5875 0.0054 633.6197
    Zhusanjiao DDRQN 18.1722 198.8693 0.0068 652.1616
    Zhusanjiao PDWoLF-PHC($\lambda$) 45.7089 195.0776 0.0098 683.5096
    Zhusanjiao DWoLF-PHC($\lambda$) 80.8745 191.5694 0.0142 687.8286
    Zhusanjiao Q($\lambda$) 139.9966 173.0605 0.0157 694.8414
    Yuedong DDRQN-AD 2.9283 199.8459 0.0065 635.2505
    Yuedong DDRQN 9.7626 199.1897 0.0069 657.7666
    Yuedong PDWoLF-PHC($\lambda$) 22.7016 197.2192 0.0100 671.7122
    Yuedong DWoLF-PHC($\lambda$) 61.9700 194.6535 0.0144 675.5289
    Yuedong Q($\lambda$) 102.4672 190.5930 0.0157 698.7767
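For readers unfamiliar with the CPS1 column, the sketch below shows how the index is commonly computed under the NERC control performance standard: CPS1 = (2 - CF) x 100%, where CF is the period average of (ACE / (-10B)) x Δf divided by ε1². The formula actually used for the table is not stated in this section, and the function name, the frequency bias `bias_mw_per_0p1hz`, and the target bound `epsilon1_hz` are assumptions introduced for illustration.

```python
def cps1(ace_mw, delta_f_hz, bias_mw_per_0p1hz, epsilon1_hz):
    """Compute CPS1 (%) from sampled ACE and frequency deviation.

    Assumed inputs (not taken from the source):
      ace_mw            -- sequence of ACE averages in MW
      delta_f_hz        -- matching sequence of frequency deviations in Hz
      bias_mw_per_0p1hz -- frequency bias B in MW/0.1 Hz (negative by convention)
      epsilon1_hz       -- target bound on the frequency error in Hz
    """
    if len(ace_mw) != len(delta_f_hz) or not ace_mw:
        raise ValueError("ace_mw and delta_f_hz must be non-empty and of equal length")
    if bias_mw_per_0p1hz == 0 or epsilon1_hz == 0:
        raise ValueError("bias and epsilon1 must be non-zero")

    # -10 * B converts the bias from MW/0.1 Hz to MW/Hz.
    products = [
        (ace / (-10.0 * bias_mw_per_0p1hz)) * df
        for ace, df in zip(ace_mw, delta_f_hz)
    ]
    compliance_factor = (sum(products) / len(products)) / (epsilon1_hz * epsilon1_hz)
    return (2.0 - compliance_factor) * 100.0


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    print(cps1([10.0], [0.01], bias_mw_per_0p1hz=-100.0, epsilon1_hz=0.01))
```

Under this definition a value close to 200% means that ACE and the frequency deviation are, on average, almost uncorrelated or very small, which matches the direction of the comparison in the table: smaller $\vert$ACE$\vert$ and $\vert\Delta f\vert$ go with CPS1 values nearer 200%.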
  • Table 4   Statistics of Guangdong power grid in winter under different algorithms
    Area Algorithm $\vert$ACE$\vert$ (MW) CPS1 (%) $\vert\Delta f\vert$ (Hz) CE
    Yuebei DDRQN-AD 4.8690 199.8558 0.0030 704.9624
    Yuebei DDRQN 11.5191 198.6730 0.0050 719.7761
    Yuebei PDWoLF-PHC($\lambda$) 38.7529 195.3053 0.0101 721.6102
    Yuebei DWoLF-PHC($\lambda$) 70.2284 194.1612 0.0110 723.7792
    Yuebei Q($\lambda$) 135.8834 190.3734 0.0137 737.4928
    Yuexi DDRQN-AD 3.8699 199.6796 0.0029 681.6042
    Yuexi DDRQN 10.7540 198.4822 0.0052 698.4800
    Yuexi PDWoLF-PHC($\lambda$) 30.8557 194.6218 0.0101 707.6413
    Yuexi DWoLF-PHC($\lambda$) 71.3868 193.3833 0.0111 717.5449
    Yuexi Q($\lambda$) 133.4363 192.6690 0.0138 725.5942
    Zhusanjiao DDRQN-AD 4.6580 199.0981 0.0027 648.7999
    Zhusanjiao DDRQN 15.4619 198.3565 0.0052 670.1390
    Zhusanjiao PDWoLF-PHC($\lambda$) 32.3917 194.2996 0.0116 672.3745
    Zhusanjiao DWoLF-PHC($\lambda$) 77.1797 192.6169 0.0112 674.3519
    Zhusanjiao Q($\lambda$) 139.7009 179.0764 0.0143 696.9169
    Yuedong DDRQN-AD 4.8149 199.9593 0.0031 644.2032
    Yuedong DDRQN 10.5419 198.5774 0.0056 659.5584
    Yuedong PDWoLF-PHC($\lambda$) 32.3602 195.2387 0.0099 680.9719
    Yuedong DWoLF-PHC($\lambda$) 72.2484 193.1077 0.0115 689.6905
    Yuedong Q($\lambda$) 147.6071 191.5688 0.0137 702.1414