logo

SCIENTIA SINICA Informationis, Volume 49 , Issue 8 : 988-1004(2019) https://doi.org/10.1360/N112018-00125

An efficient incremental strongly connected components algorithm for evolving directed graphs

Xiaofei LIAO 1,2,3,4, Yicheng CHEN 1,2,3,4, Yu ZHANG 1,2,3,4,*, Hai JIN 1,2,3,4, Haikun LIU 1,2,3,4, Jin ZHAO 1,2,3,4
More info
  • ReceivedMay 16, 2018
  • AcceptedApr 18, 2019
  • PublishedAug 7, 2019

Abstract


Funded by

国家重点研发计划(2018YFB1003500)

国家自然科学基金(61832006,61825202,61702202)


References

[1] Leskovec J, Kleinberg J, Faloutsos C. Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, 2005. 177--187. Google Scholar

[2] Zhang Y, Liao X, Shi X. Efficient Disk-Based Directed Graph Processing: A Strongly Connected Component Approach. IEEE Trans Parallel Distrib Syst, 2018, 29: 830-842 CrossRef Google Scholar

[3] Tarjan R. Depth-First Search and Linear Graph Algorithms. SIAM J Comput, 1972, 1: 146-160 CrossRef Google Scholar

[4] Aho A V, Hopcroft J E, Ullman J. Data Structures and Algorithms. Boston: Addison-Wesley, 1983. 1--983. Google Scholar

[5] Orzan S M. On Distributed Verification and Verified Distribution. Vrije Universiteit, 2004. Google Scholar

[6] McLendon III W, Hendrickson B, Plimpton S J. Finding strongly connected components in distributed graphs. J Parallel Distributed Computing, 2005, 65: 901-910 CrossRef Google Scholar

[7] Hellings J, Fletcher G H L, Haverkort H. Efficient external-memory bisimulation on DAGs. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, Scottsdale, 2012. 553--564. Google Scholar

[8] Sarwat M, Sun Y. Answering location-aware graph reachability queries on geosocial data. In: Proceedings of the 33rd International Conference on Data Engineering, San Diego, 2017. 207--210. Google Scholar

[9] Yildirim H, Chaoji V, Zaki M J. GRAIL. Proc VLDB Endow, 2010, 3: 276-284 CrossRef Google Scholar

[10] Fan W, Li J, Ma S. Graph homomorphism revisited for graph matching. Proc VLDB Endow, 2010, 3: 1161-1172 CrossRef Google Scholar

[11] Fleischer L K, Hendrickson B, Pinar A. On identifying strongly connected components in parallel. In: Prceedings of the 14th International Parallel and Distributed Processing Symposium, Cancun, 2000. 505--511. Google Scholar

[12] Zhang Z, Yu J X, Qin L, et al. I/O efficient: computing SCCs in massive graphs. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, 2013. 181--192. Google Scholar

[13] Barnat J, Chaloupka J, van de Pol J. Improved Distributed Algorithms for SCC Decomposition. Electron Notes Theor Comput Sci, 2008, 198: 63-77 CrossRef Google Scholar

[14] Hong S, Rodia N C, Olukotun K. On fast parallel detection of strongly connected components in small-world graphs. In: Proceedings of the 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, 2013. 1--11. Google Scholar

[15] Slota G M, Rajamanickam S, Madduri K. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In: Proceedings of the 28th IEEE Parallel and Distributed Processing Symposium, Phoenix, 2014. 550--559. Google Scholar

[16] Bloemen V, Laarman A, van de Pol J. Multi-core on-the-fly SCC decomposition. In: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Barcelona, 2016. 1--12. Google Scholar

[17] Ding L L, Li Z D, Ji W T, et al. Reachability query of large scale dynamic graph based on improved Huffman coding. Acta Electron Sin, 2017, 45: 359--367. Google Scholar

[18] Ntoulas A, Cho J, Olston C. What's new on the web?: the evolution of the web from a search engine perspective. In: Proceedings of the 13th International Conference on World Wide Web, New York, 2004. 1--12. Google Scholar

[19] Broder A, Kumar R, Maghoul F. Graph structure in the Web. Comput Networks, 2000, 33: 309-320 CrossRef Google Scholar

[20] Dean J, Ghemawat S. MapReduce: Simplified Data Processing on Large Clusters. Commun ACM, 2008, 51: 107 CrossRef Google Scholar

[21] Wang J, Zhang L, Wang P Y. Memory system optimization for graph processing: a survey. Sci Sin-Inf, 2019, 49: 295-313 CrossRef Google Scholar

[22] Zhang Y, Liao X, Jin H. FBSGraph: Accelerating Asynchronous Graph Processing via Forward and Backward Sweeping. IEEE Trans Knowl Data Eng, 2018, 30: 895-907 CrossRef Google Scholar

[23] Wang L, Yang X J, Dai H D. Scratchpad memory allocation for arrays in permutation graphs. Sci China Inf Sci, 2013, 56: 1-13 CrossRef Google Scholar

[24] Jin H, Yao P C, Liao X F. Towards dataflow based graph processing. Sci China Inf Sci, 2017, 60: 126102 CrossRef Google Scholar

[25] Dai G, Huang T, Chi Y, et al. Foregraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2017. 217--226. Google Scholar

[26] Ju X, Williams D, Jamjoom H, et al. Version traveler: fast and memory-efficient version switching in graph processing systems. In: Proceedings of the 2016 USENIX Annual Technical Conference, Denver, 2016. 523--536. Google Scholar

[27] Zhang L X, Wang W P, Gao J L, et al. Pattern graph change oriented incremental graph pattern matching. J Softw, 2015, 26: 2964--2980. Google Scholar

[28] Semertzidis K, Pitoura E, Lillis K. TimeReach: historical reachability queries on evolving graphs. In: Proceedings of the 18th International Conference on Extending Database Technology, Brussels, 2015. 121--132. Google Scholar

[29] Jure L, Andrej K. SNAP Datasets: Stanford Large Network Dataset Collection. 2014. http://snap.stanford.edu/data. Google Scholar

  • Figure 1

    The first type of deleted edges: no change in SCC structure

  • Figure 2

    The second type of deleted edges: recomputation in SCC with structure changes

  • Figure 3

    The second type of deleted edges: no recomputation in SCC with structure changes

  • Figure 4

    The CSR format of an evolving graph

  • Figure 5

    Synchronous iteration

  • Figure 6

    Lock-free solution for data conflicts

  • Figure 7

    Performance curves of Inc-SCC when the ratio of changed edges of the entire graph is different on four different datasets. (a) Email-EuAll; (b) Wiki-Talk; (c) Web-NotreDame; (d) Web-Stanford

  • Figure 8

    Performance curves of Inc-SCC when the ratio of nodes in the largest SCC of the entire graph is different

  • Figure 9

    Performance curves of Inc-SCC with graphs continuing changing on two different datasets. (a) Web-NotreDame; (b) Wiki-Talk

  • Figure 10

    The scalability of Inc-SCC on six different datasets. (a) Email-EuAll;(b) Soc-Epinions1; (c) Web-Berstan; (d) Web-NotreDame; (e) Web-Stanford; (f) Wiki-Talk

  •   

    Algorithm 1 Incremental SCC algorithm

    Require:$G_{t_{i}}$, SCCMAP($G_{t_{i-1}}$), increments, $G_{t_{i-1}}$; Output: SCCMAP($G_{t_{i}}$);

    SCCMAP($G_{t_{i}})\Leftarrow$ $\emptyset$;

    $G_{\rm~new}~\Leftarrow$ Preprocess(SCCMAP($G_{t_{i-1}}$), increments, $G_{t_{i-1}}$);

    SCCMAP($G_{t_{i}}$) $\Leftarrow~$ SCCMAP($G_{t_{i}}$) $\cup$ ($G_{\rm~new}$);

    SCCMAP($G_{t_{i}}$) $\Leftarrow~$ SCCMAP($G_{t_{i}}$) $\cup$ LocalColoring($G_{\rm~new}$);

  • Table 1   Graph datasets
    Dataset Nodes Edges SCCs Nodes in the largest SCC
    Soc-epinions1 75888 508837 42185 32223 (42.5%)
    Email-euall 265214 420025 231000 34203 (12.9%)
    Web-stanford 281903 2312497 29919 150527 (53.4%)
    Web-notreDame 325729 1497134 203609 53968 (16.6%)
    Web-Berstan 685230 7600595 109407 334856 (48.9%)
    Wiki-Talk 2394385 5021410 2281879 111881 (4%)
  •   

    Algorithm 2 Preprocess

    Require:SCCMAP($G_{t_{i-1}}$); increments; $G_{t_{i-1}}$;

    Output:$G_{\rm~new}$;

    DAG $\Leftarrow$ Contract(SCCMAP($G_{t_{i-1}}$), $G_{t_{i-1}}$);

    for each edge $e~\in$ increments which is added in parallel

    if ${\rm~SCCMAP}_{e.{\rm~src}}~\neq~{\rm~SCCMAP}_{e.{\rm~dst}}$ then

    DAG $\Leftarrow$ DAG $\cup~e$;

    end if

    end for

    SCCMAP($G_{t_{i-1}}$) $\Leftarrow$ LocalColoring(DAG);

    $G_{\rm~new}~\Leftarrow~\emptyset$;

    for each edge $e~\in$ increments which is deleted in parallel

    if ${\rm~SCCMAP}_{e.{\rm~src}}={\rm~SCCMAP}_{e.{\rm~dst}}$ then

    OldSCC $\Leftarrow~{\rm~SCCMAP}_{e.{\rm~src}}$;

    $G_{\rm~new}~\Leftarrow~G_{\rm~new}~\cup~{\rm~OldSCC}$;

    end if

    end for

  • Table 2   Time cost of each part in Inc-SCC when the ratio of changed edges of the entire graph snapshot is 0.5%
    Dataset Preprocess (ms) Process (ms) LocalFBS (ms) LocalColoring (ms)
    Soc-epinions1 1.135 5.708 5.448 0.260
    Email-euall 2.628 4.502 4.261 0.241
    Web-stanford 2.628 4.502 4.261 0.241
    Web-notredame 3.117 11.438 10.380 1.058
    Web-Berstan 6.946 55.542 48.880 6.662
    Wiki-Talk 19.199 68.166 64.032 3.134
  • Table 3   Speedup of Inc-SCC over baseline
    Dataset SCO (ms) Inc-SCC (ms) Speedup
    Soc-epinions1 46.746 5.708 8.190
    Email-euall 28.100 4.502 6.242
    Web-stanford 120.951 35.729 3.385
    Web-notredame 51.147 11.438 4.472
    Web-Berstan 160.622 55.542 2.897
    Wiki-Talk 825.124 68.166 12.104
  •   

    Algorithm 3 LocalFBS

    Require:$G_{\rm~new}$;

    Output:SCCMAP($G_{t_{i}}$); $G_{\rm~new}$;

    SCCMAP($G_{t_{i}}$) $\Leftarrow$ SCCMAP($G_{t_{i-1}}$)$\setminus~G_{\rm~new}$;

    for all OldSCC in $G_{\rm~new}$ in parallel

    Select root $\in$ OldSCC;

    $D~\Leftarrow~{\rm~FS(OldSCC,~root)}~$;

    $P~\Leftarrow~{\rm~BS(OldSCC,~root)}~$;

    NewSCC(root) $\Leftarrow$ $D~\cap~P$;

    SCCMAP($G_{t_{i}}$) $\Leftarrow~$ SCCMAP($G_{t_{i}}$) $\cup$ NewSCC(root);

    $G_{\rm~new}~\Leftarrow~G_{\rm~new}~\setminus$NewSCC(root);

    end for

  •   

    Algorithm 4 LocalColoring

    Require:$G_{\rm~new}$;

    Output:${\rm~SCCMAP}(G_{t_{i}})$;

    while $G_{\rm~new}~\neq~\emptyset$ do

    for all OldSCC $\in~G_{\rm~new}$ in parallel

    Init colors;

    while colors have changed do

    for edge e $\in$ OldSCC in parallel

    if colorse.dst $>$ colorse.src then

    colorse.dst $\Leftarrow$ colorse.src;

    end if

    end for

    end while

    for all vertex root = colorsroot in parallel

    NewSCC(root)$\Leftarrow$ BS(OldSCC, root);

    SCCMAP($G_{t_{i}}$) $\Leftarrow~$ SCCMAP($G_{t_{i}}$) $\cup$ NewSCC(root);

    $G_{\rm~new}~\Leftarrow~G_{\rm~new}~\setminus$NewSCC(root);

    end for

    end for

    end while

qqqq

Contact and support