SCIENTIA SINICA Informationis, Volume 48 , Issue 7 : 932-946(2018) https://doi.org/10.1360/N112017-00302

Design and implementation of large-scale network propagation simulation method inspired by Pregel mechanism

More info
  • ReceivedMar 27, 2018
  • AcceptedApr 8, 2018
  • PublishedJul 18, 2018


Funded by








[1] Watts D J, Strogatz S H. Collective dynamics of `small-world' networks. Nature, 1998, 393: 440-442 CrossRef PubMed ADS Google Scholar

[2] Barabási A L, Albert R, Jeong H. Mean field theory for scale-free random networks. Physica A-Statistical Mech its Appl, 1999, 272: 173-187 CrossRef ADS Google Scholar

[3] Pastor-Satorras R, Vespignani A. Epidemic Spreading in Scale-Free Networks. Phys Rev Lett, 2001, 86: 3200-3203 CrossRef PubMed ADS Google Scholar

[4] Lloyd A L. EPIDEMIOLOGY: How Viruses Spread Among Computers and People. Science, 2001, 292: 1316-1317 CrossRef Google Scholar

[5] May R M, Lloyd A L. Infection dynamics on scale-free networks. Phys Rev E, 2001, 64: 066112 CrossRef PubMed ADS Google Scholar

[6] Granovetter M. Threshold Models of Collective Behavior. Am J Sociology, 1978, 83: 1420-1443 CrossRef Google Scholar

[7] Li Q, Braunstein L A, Wang H. Non-consensus Opinion Models on Complex Networks. J Stat Phys, 2013, 151: 92-112 CrossRef ADS arXiv Google Scholar

[8] Qu B, Li Q, Havlin S. Nonconsensus opinion model on directed networks. Phys Rev E, 2014, 90: 052811 CrossRef PubMed ADS arXiv Google Scholar

[9] Goldenberg J, Libai B. Using complex systems analysis to advance marketing theory development: modeling heterogeneity effects on new product growth through stochastic cellular automata. Acad Market Sci Rev, 2001, 9: 1--18. Google Scholar

[10] Goldenberg J, Libai B, Muller E. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Lett, 2001, 12: 211-223 CrossRef Google Scholar

[11] Hethcote H W. The Mathematics of Infectious Diseases. SIAM Rev, 2000, 42: 599-653 CrossRef ADS Google Scholar

[12] Newman M E J. Spread of epidemic disease on networks. Phys Rev E, 2002, 66: 016128 CrossRef PubMed ADS Google Scholar

[13] Chopard B, Droz M. Cellular Automata Modeling of Physical Systems. Cambridge: Cambridge University Press, 1998. Google Scholar

[14] Zhang M X, Seck M, Verbraeck A. A DEVS-based M&S method for large-scale multi-agent systems. In: Proceedings of the 2013 Summer Computer Simulation Conference, Toronto, 2013. 3. Google Scholar

[15] Collier N, Ozik J, Macal C M. Large-scale agent-based modeling with repast HPC: a case study in parallelizing an agent-based model. In: Euro-Par 2015: Parallel Processing Workshops. Berlin: Springer, 2015. 454--465. Google Scholar

[16] Seck M, Verbraeck A. Devs in dsol: adding devs operational semantics to a generic event-scheduling simulation environment. In: Proceedings of the Summer Computer Simulation Conference, Istanbul, 2009. 261--266. Google Scholar

[17] Staudt C L, Sazonovs A, Meyerhenke H. Networkit: an interactive tool suite for high-performance network analysis,. arXiv Google Scholar

[18] Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. In: Proceedings of the 3rd International AAAI Conference On Weblogs And Social Media, San Jose, 2009. Google Scholar

[19] Batagelj V, Mrvar A. Pajek---analysis and visualization of large networks. In: Graph Drawing Software. Berlin: Springer, 2004. 2265: 77--103. Google Scholar

[20] Leskovec J, Sosic R. Snap: a general-purpose network analysis and graph-mining library. ACM Trans Intell Syst Tech, 2016, 8: 1. Google Scholar

[21] Lugowski A, Alber D, Buluç A, et al. A flexible open-source toolbox for scalable complex graph analysis. In: Proceedings of the 2012 SIAM International Conference on Data Mining, Anaheim, 2012. Google Scholar

[22] Ediger D, Jiang K, Riedy E J. GraphCT: Multithreaded Algorithms for Massive Graph Analysis. IEEE Trans Parallel Distrib Syst, 2013, 24: 2220-2229 CrossRef Google Scholar

[23] Ediger D, Mccoll R, Riedy J, et al. Stinger: high performance data structure for streaming graphs. In: Proceedings of 2012 IEEE Conference on High Performance Extreme Computing, Waltham, 2013. 1--5. Google Scholar

[24] Shun J L, Blelloch G E. Ligra: a lightweight graph processing framework for shared memory. In: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Shenzhen, 2013. 135--146. Google Scholar

[25] Rutherford A R, Ramadanović B, Ahrenberg L, et al. Control of an HIV epidemic among injection drug users: simulation modeling on complex networks. In: Proceedings of Winter Simulation Conference, Arlington, 2016. 23--37. Google Scholar

[26] Capasso V, Serio G. A generalization of the Kermack-McKendrick deterministic epidemic model. Math Biosci, 1978, 42: 43-61 CrossRef Google Scholar

[27] Liu L, Qu B, Chen B, et al. Modeling of information diffusion on social networks with applications to wechat,. arXiv Google Scholar

[28] Goudreau M W, Lang K, Rao S B. Portable and efficient parallel computing using the BSP model. IEEE Trans Comput, 1999, 48: 670-689 CrossRef Google Scholar

[29] Hill J M D, McColl B, Stefanescu D C. BSPlib: The BSP programming library. Parallel Computing, 1998, 24: 1947-1980 CrossRef Google Scholar

[30] Skillicorn D B, Hill J M D, Mccoll W F. Questions and answers about BSP. Sci Programm, 1997, 6: 249--274. Google Scholar

[31] Malewicz G, Austern M H, Bik A J C, et al. Pregel: a system for large-scale graph processing. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Indianapolis, 2010. 135--146. Google Scholar

[32] Gonzalez J E, Low Y C, Gu H J, et al. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of Usenix Conference on Operating Systems Design and Implementation, Hollywood, 2012. 17--30. Google Scholar

[33] Gonzalez J E, Xin R S, Dave A, et al. GraphX: graph processing in a distributed dataflow framework. In: Proceedings of Usenix Conference on Operating Systems Design and Implementation, Broomfield, 2014. 599--613. Google Scholar

[34] Zaharia M, Chowdhury M, Franklin M J, et al. Spark: cluster computing with working sets. In: Proceedings of the 2nd Usenix Conference on Hot Topics in Cloud Computing, Berkeley, 2010. 10--10. Google Scholar

[35] Valiant L G. A bridging model for parallel computation. Commun ACM, 1990, 33: 103--111. Google Scholar

  • Figure 1

    (Color online) State transition of SIR model

  • Figure 2

    (Color online) State transition of SVFR model

  • Figure 3

    (Color online) Graph partition of GraphX

  • Figure 4

    (Color online) Data structure of GraphX

  • Figure 5

    (Color online) Find max value by Pregel, shadow node is inactive

  • Figure 6

    (Color online) State transition

  • Figure 7

    (Color online) Node attribute definition

  • Figure 8

    (Color online) Message definition

  • Figure 9

    (Color online) The impact ofdifferent nodes size on calculation time (minimumdegree 6, infection factor 0.4). (a) Network generating time; (b) propagate calculation time

  • Figure 10

    (Color online) The impact of different minimum degree on calculation time (node size is 100000, infection factor is 0.4). (a) Network generating time; (b) propagate calculation time

  • Figure 11

    (Color online) The impact of different infection factor on calculation time (node size is 100000, minimum degree is 6). (a) Network generating time; (b) propagate calculation time


    Algorithm 1 Pregel API运行流程

    for all $i~\in~{\rm~nodes}~$

    vprogFunc($i$, initialMsg);

    end for


    for all edge $\in$ edges


    end for

    for all $i~\in~{\rm~nodes}~$

    mergeMsgFunc(Msg $a$, Msg $b$);

    end for

    for all $i~\in$ nodes

    vprogFunc(value, Msg);

    end for

    until No more Msg produced;


    Algorithm 2 sendMsgFunc

    Require:dstId, srcId, dstAttr, srcAttr;


    if srcAttr(0) = 0 dstAttr(0)= 2 then

    return Iterator(dstId, Array(4, 0, 0));


    if srcAttr(0) = 2, dstAttr(0)= 0 then

    return Iterator(dstId, Array(2, srcId, srcAttr(2)+1));


    return Iterator.empty;

    end if

    end if


    Algorithm 3 vprogFunc

    Require:nodeId, Value, Msg, $\beta$, $\gamma$;


    if Msg(0) = 0 then

    if nodeId = Value(1) then

    return Array(2, 0, 0);


    return Array(0, 0, 0);

    end if


    choose a random $p\in(0,1)$;

    if $p<\beta$ then

    choose a random $q\in(0,1)$;

    if $q<\gamma$ then

    return Array(2, Msg(1), Msg(2));


    return Array(3, Msg(1), Msg(2));

    end if


    return Array(0, 0, 0);

    end if

    end if


    Algorithm 4 配置网络生成



    index = 0;

    for all $i\in(1,n)$

    for all $j~\in~(1,{\rm~degree})$



    end for

    end for

    while nodelist is not empty do

    choose two random $m,n~\in~(1$, size of nodelist$),~m~\neq~n$;

    edge=(nodelist$m$, nodelist$n$);



    delete nodelistlast$-$1;

    delete nodelistlast;

    push edge in EDGE;

    end while

    return EDGE;


    Algorithm 5 度序列生成



    sum = 0;

    for all $i~\in~({\rm~kmin},{\rm~kmax})$



    end for

    for all $j~\in~({\rm~kmin},{\rm~kmax})$


    end for

    for all $k~\in~(1,n)~$

    choose a new random $~p\in~(0,1)$;


    for all $l~\in~({\rm~kmin},{\rm~kmax})$

    if ${\rm~cumpro}[l]~<~p$ then


    end if

    end for

    end for

    return degree;