SCIENTIA SINICA Informationis, Volume 51, Issue 2: 247 (2021) https://doi.org/10.1360/SSI-2020-0316

## Real-time scheduling for carrier-borne aircraft support operations: a reinforcement learning approach

• Accepted: Nov 18, 2020
• Published: Jan 26, 2021

### References

[1] Zhou X G, Feng B S, Chi Z Y, et al. Analysis of carrier-borne aircraft movement rate based on closed line network. Ordn Ind Autom, 2014, 33: 79--83.

[2] Ryan J C, Banerjee A G, Cummings M L, et al. Comparing the performance of expert user heuristics and an integer linear program in aircraft carrier deck operations. IEEE Trans Cybern, 2014, 44: 761--773.

[3] Han W, Si W C, Ding D C, et al. Multi-routes dynamic planning on deck of carrier plane based on clustering PSO. J Beijing Univ Aeronaut Astronautics, 2013, 39: 610--614.

[4] Bian D P, Luan T T, Song Y. A layout method of carrier-based aircraft based on simulated annealing. Appl Sci Technol, 2015, 42: 20--24.

[5] Li J, Sun Z, Li M L, et al. Research on scheduling decision of carrier aircraft support operation. Ship Electron Eng, 2018, 38: 165--168.

[6] Abbeel P. Apprenticeship learning and reinforcement learning with application to robotic control. Dissertation. Stanford: Stanford University, 2008.

[7] Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning. 2013. arXiv preprint.

[8] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518: 529--533.

[9] Lin H, Zhan M F, Zhou F. Optimal scheduling algorithm and simulation of carrier-based aircraft recovery task. J Nav Univ Eng, 2008, 20: 50--54.

[10] Lv X F, Guo X W, Wang Y F. Ammunition scheduling sequence of carrier-based aircraft based on genetic algorithm. Ordn Ind Autom, 2011, 30: 10--12.

[11] Feng Q, Zeng S K, Kang R. Simulation and optimization method for dynamic dispatch of carrier-based aircraft under uncertain conditions. J Syst Simul, 2011, 23: 1497--1501.

[12] Han W, Su X C, Chen J F. Multi-aircraft integrated maintenance scheduling method for carrier-based aircraft. J Syst Eng Electron, 2015, 37: 809--816.

[13] Fan J L, Zhu X D, Gao W, et al. Shipboard aircraft dispatching again based on parallel genetic algorithm. J Ordnance Eq Eng, 2019, 40: 139--143.

[14] Wooldridge M J. An Introduction to MultiAgent Systems. 2nd ed. Hoboken: John Wiley & Sons, 2009.

[15] Bakker B, Whiteson S, Kester L, et al. Traffic light control by multiagent reinforcement learning systems. In: Interactive Collaborative Information Systems. Berlin: Springer, 2010. 475--510.

[16] Xu Z, Li Z X, Guan Q W, et al. Large-scale order dispatch in on-demand ride-hailing platforms: a learning and planning approach. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018. 905--913.

[17] Wang Y S, Tong Y X, Long C, et al. Adaptive dynamic bipartite graph matching: a reinforcement learning approach. In: Proceedings of the 35th International Conference on Data Engineering (ICDE), 2019. 1478--1489.

[18] Lin K X, Zhao R Y, Xu Z, et al. Efficient large-scale fleet management via multi-agent deep reinforcement learning. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018. 1774--1783.

[19] Li Y X, Zheng Y, Yang Q. Efficient and effective express via contextual cooperative reinforcement learning. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019. 510--519.

[20] Shan C, Mamoulis N, Cheng R, et al. An end-to-end deep RL framework for task arrangement in crowdsourcing platforms. In: Proceedings of the 36th International Conference on Data Engineering (ICDE), 2020. 49--60.

[21] Zhang J, Liu Y, Zhou K, et al. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In: Proceedings of the International Conference on Management of Data (SIGMOD), 2019. 415--432.

[22] Li Y X, Zheng Y, Yang Q. Dynamic bike reposition: a spatio-temporal reinforcement learning approach. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018. 1724--1733.

[23] Wei H, Zheng G J, Yao H X. IntelliLight: a reinforcement learning approach for intelligent traffic light control. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018. 2496--2505.

[24] Chen S Y, Yu Y, Da Q, et al. Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018. 1187--1196.

[25] Hu Y J, Da Q, Zeng A X, et al. Reinforcement learning to rank in e-commerce search engine: formalization, analysis, and application. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018. 368--377.

• Figure 1   (Color online) Real-time scheduling framework for support tasks based on reinforcement learning

• Figure 2   (Color online) Common process of carrier-borne aircraft support operation


Algorithm 1 Task scheduling strategy learning algorithm

Require: Current state $s_i$, action $a_i$, reward $R$, next state $s_{i+1}$.

Output: Next action $a_{i+1}$.

Initialize the environment, the evaluation network, and the target network;

while training is not completed do

# Step 1: collect experience;

With probability $\varepsilon$, choose a random action $a_{i+1}$; otherwise set $a_{i+1}=\arg\max_a Q^*(\phi(s_{i+1}),a)$;

Compute the target-network and evaluation-network values, and store the transition $(s_i,a_i,R,s_{i+1})$ in the replay memory $M_Q$;

# Step 2: update parameters;

Sample a batch of transitions $\{s_i,a_i,R,s_{i+1}\}$ from $M_Q$;

Update the evaluation network, and periodically synchronize the target network with it;

end while
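The loop above follows the standard DQN recipe: epsilon-greedy exploration, a replay memory $M_Q$, and a periodically synchronized target network. A minimal runnable sketch of that structure, using a toy environment and tabular Q-values in place of the paper's networks and support-operation simulator (the problem sizes, reward, and hyperparameters below are our placeholder assumptions, not the paper's):

```python
import random
from collections import deque

random.seed(0)

N_STATES, N_ACTIONS = 6, 3          # toy problem size (assumption)
GAMMA, EPS, LR = 0.9, 0.2, 0.1      # discount, exploration rate, step size
SYNC_EVERY, BATCH = 20, 8           # target-network sync period, batch size

# Evaluation and target Q-tables (tabular stand-ins for the two networks).
q_eval = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
q_target = [row[:] for row in q_eval]
memory = deque(maxlen=500)          # replay memory M_Q

def toy_env_step(s, a):
    """Placeholder environment: reward 1 only for the 'correct' action."""
    r = 1.0 if a == s % N_ACTIONS else 0.0
    return (s + 1) % N_STATES, r

def choose_action(s):
    # Step 1: epsilon-greedy -- explore randomly or exploit q_eval.
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_eval[s][a])

step, s = 0, 0
while step < 2000:                   # "while training is not completed"
    a = choose_action(s)
    s_next, r = toy_env_step(s, a)
    memory.append((s, a, r, s_next)) # store the transition in M_Q
    # Step 2: sample a batch and update the evaluation network.
    if len(memory) >= BATCH:
        for si, ai, ri, sn in random.sample(memory, BATCH):
            td_target = ri + GAMMA * max(q_target[sn])
            q_eval[si][ai] += LR * (td_target - q_eval[si][ai])
    if step % SYNC_EVERY == 0:       # periodically sync the target network
        q_target = [row[:] for row in q_eval]
    s, step = s_next, step + 1

# Greedy policy learned for each state.
policy = [max(range(N_ACTIONS), key=lambda a: q_eval[s][a])
          for s in range(N_STATES)]
print(policy)
```

The separate `q_target` table plays the role of the target network: it supplies the bootstrapped value in the TD target and is refreshed only every `SYNC_EVERY` steps, which is what stabilizes the update.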

• Table 1   Details of carrier-borne aircraft support tasks

| Support task | Support time (s) | Needs resources |
| --- | --- | --- |
| W1 | 1 | No |
| W2 | 3 | Yes |
| W3 | 3 | Yes |
| W4 | 3 | Yes |
| W5 | 3 | No |
| W6 | 4 | Yes |
| W7 | 4 | No |
| W8 | 4 | Yes |
| W9 | 2 | No |
| W10 | 2 | No |
| W11 | 5 | Yes |
| W12 | 6 | No |
| W13 | 6 | Yes |
| W14 | 4 | Yes |
| W15 | 1 | No |
• Table 2   Details of aircraft carrier support stations (example 1)

| Support station | Number of resources | Resources |
| --- | --- | --- |
| P1 | 4 | W2, W3, W4, W11 |
| P2 | 5 | W2, W3, W6, W13, W14 |
| P3 | 5 | W4, W6, W8, W13, W14 |
| P4 | 8 | W2, W3, W4, W6, W8, W11, W13, W14 |
| P5 | 6 | W2, W3, W8, W11, W13, W14 |
| P6 | 6 | W2, W4, W6, W8, W13, W14 |
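For experimentation, the data in Tables 1 and 2 can be encoded directly as program data structures. A hypothetical Python encoding (the variable names and layout are our choice, not the paper's):

```python
# Table 1: task -> (support time in seconds, needs resources).
tasks = {
    "W1": (1, False),  "W2": (3, True),   "W3": (3, True),  "W4": (3, True),
    "W5": (3, False),  "W6": (4, True),   "W7": (4, False), "W8": (4, True),
    "W9": (2, False),  "W10": (2, False), "W11": (5, True), "W12": (6, False),
    "W13": (6, True),  "W14": (4, True),  "W15": (1, False),
}

# Table 2: station -> tasks whose resources it holds.
stations = {
    "P1": ["W2", "W3", "W4", "W11"],
    "P2": ["W2", "W3", "W6", "W13", "W14"],
    "P3": ["W4", "W6", "W8", "W13", "W14"],
    "P4": ["W2", "W3", "W4", "W6", "W8", "W11", "W13", "W14"],
    "P5": ["W2", "W3", "W8", "W11", "W13", "W14"],
    "P6": ["W2", "W4", "W6", "W8", "W13", "W14"],
}

# Sanity check: every task a station stocks is a resource-requiring task.
assert all(tasks[w][1] for ts in stations.values() for w in ts)
print(sum(len(ts) for ts in stations.values()))  # total resource slots
```

A quick consistency check like the assertion above confirms that station inventories only cover the tasks flagged "Yes" in Table 1 (W2, W3, W4, W6, W8, W11, W13, W14).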
• Table 3   Execution sequences of support tasks

| Sequence | Task order |
| --- | --- |
| S1 | W1, W2, W3, W4, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15 |
| S2 | W1, W2, W4, W3, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15 |
| S3 | W1, W3, W2, W4, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15 |
| S4 | W1, W2, W3, W4, W5, W6, W8, W7, W9, W10, W11, W12, W13, W14, W15 |
| S5 | W1, W2, W3, W4, W5, W6, W7, W8, W9, W10, W11, W12, W14, W13, W15 |
| S6 | W1, W2, W3, W4, W5, W6, W8, W7, W9, W10, W11, W12, W14, W13, W15 |
| S7 | W1, W4, W3, W2, W5, W6, W8, W7, W9, W10, W11, W12, W14, W13, W15 |
| S8 | W1, W3, W4, W2, W5, W6, W8, W7, W9, W10, W11, W12, W14, W13, W15 |
| S9 | W1, W3, W4, W2, W5, W6, W8, W7, W9, W10, W11, W12, W13, W14, W15 |
| S10 | W1, W3, W4, W2, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15 |
• Table 4   Details of aircraft carrier support stations (example 2)

| Support station | Number of resources | Resources |
| --- | --- | --- |
| P1 | 3 | W2, W3, W11 |
| P2 | 5 | W2, W3, W6, W13, W14 |
| P3 | 3 | W6, W13, W14 |
| P4 | 8 | W2, W3, W4, W6, W8, W11, W13, W14 |
| P5 | 8 | W2, W3, W4, W6, W8, W11, W13, W14 |
| P6 | 6 | W2, W3, W6, W8, W13, W14 |
| P7 | 8 | W2, W3, W4, W6, W8, W11, W13, W14 |
| P8 | 8 | W2, W3, W4, W6, W8, W11, W13, W14 |
| P9 | 6 | W2, W3, W4, W6, W8, W11 |
| P10 | 8 | W2, W3, W4, W6, W8, W11, W13, W14 |
| P11 | 7 | W2, W3, W6, W8, W11, W13, W14 |
| P12 | 5 | W2, W3, W4, W6, W8 |
| P13 | 2 | W3, W11 |
| P14 | 4 | W8, W11, W13, W14 |
| P15 | 8 | W2, W3, W4, W6, W8, W11, W13, W14 |
| P16 | 6 | W2, W3, W4, W11, W13, W14 |
• Table 5   Comparison of average task response time (example 1)

| Sequence | Ours | Q-learning | SA | LP | GA |
| --- | --- | --- | --- | --- | --- |
| S1 | 0.0055 | 0.0069 | 0.509 | 0.636 | 0.464 |
| S2 | 0.0048 | 0.0067 | 0.477 | 0.508 | 0.502 |
| S3 | 0.0017 | 0.0051 | 0.490 | 0.607 | 0.525 |
| S4 | 0.0083 | 0.0135 | 0.341 | 0.657 | 0.478 |
| S5 | 0.0014 | 0.0128 | 0.508 | 0.518 | 0.499 |
| S6 | 0.0055 | 0.0086 | 0.634 | 0.572 | 0.463 |
| S7 | 0.0013 | 0.0146 | 0.438 | 0.696 | 0.454 |
| S8 | 0.0014 | 0.0083 | 0.507 | 0.710 | 0.576 |
| S9 | 0.0084 | 0.0015 | 0.596 | 0.602 | 0.507 |
| S10 | 0.0061 | 0.0092 | 0.426 | 0.648 | 0.474 |
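Averaging each method's column over the ten sequences summarizes Table 5 in a single number per method. A quick transcription-and-check script (the per-method averaging is our illustration, not an analysis from the paper):

```python
# Columns transcribed from Table 5 (average task response time, example 1),
# one entry per sequence S1..S10.
results = {
    "Ours":       [0.0055, 0.0048, 0.0017, 0.0083, 0.0014,
                   0.0055, 0.0013, 0.0014, 0.0084, 0.0061],
    "Q-learning": [0.0069, 0.0067, 0.0051, 0.0135, 0.0128,
                   0.0086, 0.0146, 0.0083, 0.0015, 0.0092],
    "SA":         [0.509, 0.477, 0.490, 0.341, 0.508,
                   0.634, 0.438, 0.507, 0.596, 0.426],
    "LP":         [0.636, 0.508, 0.607, 0.657, 0.518,
                   0.572, 0.696, 0.710, 0.602, 0.648],
    "GA":         [0.464, 0.502, 0.525, 0.478, 0.499,
                   0.463, 0.454, 0.576, 0.507, 0.474],
}

# Mean response time per method across the ten sequences.
means = {m: sum(v) / len(v) for m, v in results.items()}
for method, mean in means.items():
    print(f"{method}: {mean:.4f}")
```

By this summary the learned policy averages roughly 0.0044 versus about 0.0087 for plain Q-learning, with SA, LP, and GA two orders of magnitude higher.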
• Table 6   Comparison of average task response time (example 2)

| Sequence | Ours | Q-learning | SA | LP | GA |
| --- | --- | --- | --- | --- | --- |
| S1 | 0.0275 | 0.3319 | 23.792 | 19.02 | 21.22 |
| S2 | 0.0156 | 0.3219 | 23.586 | 20.28 | 18.67 |
| S3 | 0.0152 | 0.3253 | 24.772 | 22.66 | 20.33 |
| S4 | 0.0352 | 0.2292 | 18.764 | 19.60 | 27.09 |
| S5 | 0.0121 | 0.3209 | 21.755 | 19.75 | 25.65 |
| S6 | 0.0234 | 0.3281 | 25.815 | 18.96 | 20.20 |
| S7 | 0.0128 | 0.2287 | 20.083 | 19.19 | 22.33 |
| S8 | 0.0195 | 0.3224 | 20.313 | 24.77 | 21.17 |
| S9 | 0.0115 | 0.2261 | 21.696 | 18.70 | 19.75 |
| S10 | 0.0129 | 0.3283 | 20.797 | 22.46 | 22.38 |
