SCIENCE CHINA Information Sciences, Volume 64, Issue 6: 162303 (2021) https://doi.org/10.1007/s11432-020-3125-y

Intelligent resource allocation in mobile blockchain for privacy and security transactions: a deep reinforcement learning based approach

  • Received May 1, 2020
  • Accepted Nov 19, 2020
  • Published Apr 25, 2021



This work was supported in part by the National Key R&D Program of China (Grant No. 2018YFE0206800), the National Natural Science Foundation of China (Grant Nos. 61701406, 61971084, 62001073), the National Natural Science Foundation of Chongqing (Grant Nos. cstc2019jcyjcxttX0002, cstc2019jcyj-msxmX0208), and the Chongqing Talent Program (Grant No. CQYC2020058659).



  • Figure 1

    (Color online) System model.

  • Figure 2

    (Color online) Convergence performance of different methods: (a) loss of the actor net, and (b) loss of the critic net.

  • Figure 3

    (Color online) Experimental results for different (a) bandwidths, (b) computing powers, and (c) numbers of devices.

  • Figure 4

    (Color online) Experimental results under (a) different values of $\alpha$, and (b) different values of $\zeta$.


    Algorithm 1 The pseudo-code of the DRPO algorithm

    Require: $\{T_{n}\}$, $\{\mathbb{F}_m,~\mathbb{B}_m\}$;

    Initialize the parameters in DDPG:
        Parameters of actor online net and critic online net, i.e., $\theta^\mu$ and $\theta^Q$;
        Parameters of actor target net and critic target net, i.e., $\theta^{\mu'}\leftarrow~\theta^\mu$, $\theta^{Q'}\leftarrow\theta^Q$;
        Experience replay buffer $\mathcal{L}$ with size $l$;
        Number indicator $\wp$ of samples in $\mathcal{L}$;
        Decision epoch $k$ and constant parameter $\zeta$;
    Initialize the parameters in PSO:
        Maximum number of iterations $\eth$;
        Number of particles $Z$;
    while the maximum number of repetitions is not reached do
        for $k=1;~k\leq~K;~k++$ do
            Solve P1 to obtain the optimal bandwidth allocation strategy $\{b_{n,m}(k)\}$ at decision epoch $k$;
            The dedicated controller inputs system state $S^k$ to the actor net of DDPG to obtain action $A^k$;
            Add noise to action $A^k$, i.e., $A^k~\leftarrow~A^k~+~n_0$;
            if ${\rm~PSO}_{\rm~flag}={\rm~True}$ then
                Replace $A^k$ with an improved action by leveraging the PSO algorithm, i.e., $A^k~\leftarrow~{\rm~PSO}(A^k)$;
            end if
            Execute action $A^k$ (i.e., $\{f_{n,m}(k)\}$) and get reward $R^k$ as well as the next state $S^{k+1}$;
            Store transition quadruple $(S^k,~A^k,~R^k,~S^{k+1})$ in experience replay buffer $\mathcal{L}$;
            if $\wp>l$ then
                Select $W$ mini-batch samples from $\mathcal{L}$ and update the parameters of critic online net as well as actor online net, i.e., $\theta^Q$ and $\theta^\mu$, based on (26) and (30);
                if the loss of critic net is lower than $\zeta$ then

                end if
            end if
            Regularly update the parameters of actor target net and critic target net according to rule (31);
        end for
    end while
    return $\mathcal{B}_{N,~M},~\mathcal{F}_{N,~M}$.
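Three building blocks of Algorithm 1 can be sketched in plain Python: the experience replay buffer $\mathcal{L}$, the target-net update, and the PSO step $A^k \leftarrow {\rm PSO}(A^k)$. This is a minimal sketch under stated assumptions, not the authors' implementation. Rule (31) is not reproduced in this excerpt, so `soft_update` uses the standard DDPG Polyak averaging with a hypothetical rate `tau`. `pso_refine` is a textbook global-best PSO in which `reward_fn`, the action bounds `low` and `high`, and the coefficients `inertia`, `c1`, and `c2` are illustrative names and values; its `iters` and `particles` arguments play the roles of $\eth$ and $Z$.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience replay buffer (the role of L with size l)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement (a common DDPG choice).
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def soft_update(target, online, tau=0.01):
    """Polyak averaging: target <- tau * online + (1 - tau) * target.

    ASSUMPTION: standard DDPG update; rule (31) is not shown in this excerpt.
    """
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


def pso_refine(action, reward_fn, low, high, particles=10, iters=20,
               inertia=0.7, c1=1.5, c2=1.5, rng=None):
    """Global-best PSO seeded with the actor's action.

    Returns an action whose reward_fn value is at least that of the
    (clipped) input action. All coefficients are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    dim = len(action)

    def clip(x):
        return min(max(x, low), high)

    seed = [clip(a) for a in action]
    # Particle 0 is the actor's action; the rest start at random positions.
    pos = [seed[:]] + [[rng.uniform(low, high) for _ in range(dim)]
                       for _ in range(particles - 1)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_val = [reward_fn(p) for p in pos]
    g = max(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d])
            val = reward_fn(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos
```

Seeding one particle with the actor's output means the refined action is never worse than the original under `reward_fn`, which matches the "improved action" wording of the PSO step.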

  • Table 1  

    Table 1 Summary of main notations

    Notation          Description
    $\mathcal{M}$     The set of MEC servers
    $\mathcal{N}_m$   The set of devices sending requests to MEC server $m$
    $\mathbb{F}_m$    The total computing power of MEC server $m$
    $\mathbb{B}_m$    The total bandwidth of MEC server $m$
    $T_n$             The mining task of device $n$
    $D_n$             The original data size of the mining task of device $n$
    $Y_n$             The computation intensity of the mining task of device $n$
    $G_n$             The budget of device $n$ for its mining task
    $I_n$             The data size of the mining result of device $n$
    $f_{n,m}$         The computing power allocated to device $n$ by MEC server $m$
    $b_{n,m}$         The bandwidth allocated to device $n$ by MEC server $m$
    $p_{n,m}$         The unit operating price of MEC server $m$ for device $n$
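The notation in Table 1 maps naturally onto a few small data types. The sketch below is illustrative only: the class and field names are hypothetical, and the capacity check assumes that the per-device allocations of one server must not exceed its totals, i.e., $\sum_{n\in\mathcal{N}_m} f_{n,m}\leq\mathbb{F}_m$ and $\sum_{n\in\mathcal{N}_m} b_{n,m}\leq\mathbb{B}_m$. That constraint is suggested by the word "total" in the table but is not stated in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class MiningTask:
    """Mining task T_n of device n, with the fields listed in Table 1."""
    data_size: float    # D_n, original data size
    intensity: float    # Y_n, computation intensity
    budget: float       # G_n, budget of the device
    result_size: float  # I_n, data size of the mining result


@dataclass
class MecServer:
    """Resources of one MEC server m."""
    total_compute: float    # F_m
    total_bandwidth: float  # B_m


def within_capacity(server, compute_alloc, bandwidth_alloc):
    """Check the ASSUMED capacity constraints for one server m.

    compute_alloc maps device n -> f_{n,m}; bandwidth_alloc maps n -> b_{n,m}.
    """
    return (sum(compute_alloc.values()) <= server.total_compute
            and sum(bandwidth_alloc.values()) <= server.total_bandwidth)
```

For example, a server with `total_compute=10.0` accepts allocations `{1: 4.0, 2: 5.0}` but rejects `{1: 6.0, 2: 5.0}`.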
