国家重点研发计划(2018YFB1003500)
国家自然科学基金(61832006,61825202,61702202)
[1] Leskovec J, Kleinberg J, Faloutsos C. Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, 2005. 177--187. Google Scholar
[2] Zhang Y, Liao X, Shi X. Efficient Disk-Based Directed Graph Processing: A Strongly Connected Component Approach. IEEE Trans Parallel Distrib Syst, 2018, 29: 830-842 CrossRef Google Scholar
[3] Tarjan R. Depth-First Search and Linear Graph Algorithms. SIAM J Comput, 1972, 1: 146-160 CrossRef Google Scholar
[4] Aho A V, Hopcroft J E, Ullman J. Data Structures and Algorithms. Boston: Addison-Wesley, 1983. 1--983. Google Scholar
[5] Orzan S M. On Distributed Verification and Verified Distribution. Vrije Universiteit, 2004. Google Scholar
[6] McLendon III W, Hendrickson B, Plimpton S J. Finding strongly connected components in distributed graphs. J Parallel Distributed Computing, 2005, 65: 901-910 CrossRef Google Scholar
[7] Hellings J, Fletcher G H L, Haverkort H. Efficient external-memory bisimulation on DAGs. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, Scottsdale, 2012. 553--564. Google Scholar
[8] Sarwat M, Sun Y. Answering location-aware graph reachability queries on geosocial data. In: Proceedings of the 33rd International Conference on Data Engineering, San Diego, 2017. 207--210. Google Scholar
[9] Yildirim H, Chaoji V, Zaki M J. GRAIL. Proc VLDB Endow, 2010, 3: 276-284 CrossRef Google Scholar
[10] Fan W, Li J, Ma S. Graph homomorphism revisited for graph matching. Proc VLDB Endow, 2010, 3: 1161-1172 CrossRef Google Scholar
[11] Fleischer L K, Hendrickson B, Pinar A. On identifying strongly connected components in parallel. In: Prceedings of the 14th International Parallel and Distributed Processing Symposium, Cancun, 2000. 505--511. Google Scholar
[12] Zhang Z, Yu J X, Qin L, et al. I/O efficient: computing SCCs in massive graphs. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, 2013. 181--192. Google Scholar
[13] Barnat J, Chaloupka J, van de Pol J. Improved Distributed Algorithms for SCC Decomposition. Electron Notes Theor Comput Sci, 2008, 198: 63-77 CrossRef Google Scholar
[14] Hong S, Rodia N C, Olukotun K. On fast parallel detection of strongly connected components in small-world graphs. In: Proceedings of the 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, 2013. 1--11. Google Scholar
[15] Slota G M, Rajamanickam S, Madduri K. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In: Proceedings of the 28th IEEE Parallel and Distributed Processing Symposium, Phoenix, 2014. 550--559. Google Scholar
[16] Bloemen V, Laarman A, van de Pol J. Multi-core on-the-fly SCC decomposition. In: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Barcelona, 2016. 1--12. Google Scholar
[17] Ding L L, Li Z D, Ji W T, et al. Reachability query of large scale dynamic graph based on improved Huffman coding. Acta Electron Sin, 2017, 45: 359--367. Google Scholar
[18] Ntoulas A, Cho J, Olston C. What's new on the web?: the evolution of the web from a search engine perspective. In: Proceedings of the 13th International Conference on World Wide Web, New York, 2004. 1--12. Google Scholar
[19] Broder A, Kumar R, Maghoul F. Graph structure in the Web. Comput Networks, 2000, 33: 309-320 CrossRef Google Scholar
[20] Dean J, Ghemawat S. MapReduce: Simplified Data Processing on Large Clusters. Commun ACM, 2008, 51: 107 CrossRef Google Scholar
[21] Wang J, Zhang L, Wang P Y. Memory system optimization for graph processing: a survey. Sci Sin-Inf, 2019, 49: 295-313 CrossRef Google Scholar
[22] Zhang Y, Liao X, Jin H. FBSGraph: Accelerating Asynchronous Graph Processing via Forward and Backward Sweeping. IEEE Trans Knowl Data Eng, 2018, 30: 895-907 CrossRef Google Scholar
[23] Wang L, Yang X J, Dai H D. Scratchpad memory allocation for arrays in permutation graphs. Sci China Inf Sci, 2013, 56: 1-13 CrossRef Google Scholar
[24] Jin H, Yao P C, Liao X F. Towards dataflow based graph processing. Sci China Inf Sci, 2017, 60: 126102 CrossRef Google Scholar
[25] Dai G, Huang T, Chi Y, et al. Foregraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2017. 217--226. Google Scholar
[26] Ju X, Williams D, Jamjoom H, et al. Version traveler: fast and memory-efficient version switching in graph processing systems. In: Proceedings of the 2016 USENIX Annual Technical Conference, Denver, 2016. 523--536. Google Scholar
[27] Zhang L X, Wang W P, Gao J L, et al. Pattern graph change oriented incremental graph pattern matching. J Softw, 2015, 26: 2964--2980. Google Scholar
[28] Semertzidis K, Pitoura E, Lillis K. TimeReach: historical reachability queries on evolving graphs. In: Proceedings of the 18th International Conference on Extending Database Technology, Brussels, 2015. 121--132. Google Scholar
[29] Jure L, Andrej K. SNAP Datasets: Stanford Large Network Dataset Collection. 2014. http://snap.stanford.edu/data. Google Scholar
Figure 1
The first type of deleted edges: no change in SCC structure
Figure 2
The second type of deleted edges: recomputation in SCC with structure changes
Figure 3
The second type of deleted edges: no recomputation in SCC with structure changes
Figure 4
The CSR format of an evolving graph
Figure 5
Synchronous iteration
Figure 6
Lock-free solution for data conflicts
Figure 7
Performance curves of Inc-SCC when the ratio of changed edges of the entire graph is different on four different datasets. (a) Email-EuAll; (b) Wiki-Talk; (c) Web-NotreDame; (d) Web-Stanford
Figure 8
Performance curves of Inc-SCC when the ratio of nodes in the largest SCC of the entire graph is different
Figure 9
Performance curves of Inc-SCC with graphs continuing changing on two different datasets. (a) Web-NotreDame; (b) Wiki-Talk
Figure 10
The scalability of Inc-SCC on six different datasets. (a) Email-EuAll;(b) Soc-Epinions1; (c) Web-Berstan; (d) Web-NotreDame; (e) Web-Stanford; (f) Wiki-Talk
SCCMAP($G_{t_{i}})\Leftarrow$ $\emptyset$; |
$G_{\rm~new}~\Leftarrow$ Preprocess(SCCMAP($G_{t_{i-1}}$), increments, $G_{t_{i-1}}$); |
SCCMAP($G_{t_{i}}$) $\Leftarrow~$ SCCMAP($G_{t_{i}}$) $\cup$ ($G_{\rm~new}$); |
SCCMAP($G_{t_{i}}$) $\Leftarrow~$ SCCMAP($G_{t_{i}}$) $\cup$ LocalColoring($G_{\rm~new}$); |
Dataset | Nodes | Edges | SCCs | Nodes in the largest SCC |
Soc-epinions1 | 75888 | 508837 | 42185 | 32223 (42.5%) |
Email-euall | 265214 | 420025 | 231000 | 34203 (12.9%) |
Web-stanford | 281903 | 2312497 | 29919 | 150527 (53.4%) |
Web-notreDame | 325729 | 1497134 | 203609 | 53968 (16.6%) |
Web-Berstan | 685230 | 7600595 | 109407 | 334856 (48.9%) |
Wiki-Talk | 2394385 | 5021410 | 2281879 | 111881 (4%) |
DAG $\Leftarrow$ Contract(SCCMAP($G_{t_{i-1}}$), $G_{t_{i-1}}$); |
|
DAG $\Leftarrow$ DAG $\cup~e$; |
|
SCCMAP($G_{t_{i-1}}$) $\Leftarrow$ LocalColoring(DAG); |
$G_{\rm~new}~\Leftarrow~\emptyset$; |
|
OldSCC $\Leftarrow~{\rm~SCCMAP}_{e.{\rm~src}}$; |
$G_{\rm~new}~\Leftarrow~G_{\rm~new}~\cup~{\rm~OldSCC}$; |
|
Dataset | Preprocess (ms) | Process (ms) | LocalFBS (ms) | LocalColoring (ms) |
Soc-epinions1 | 1.135 | 5.708 | 5.448 | 0.260 |
Email-euall | 2.628 | 4.502 | 4.261 | 0.241 |
Web-stanford | 2.628 | 4.502 | 4.261 | 0.241 |
Web-notredame | 3.117 | 11.438 | 10.380 | 1.058 |
Web-Berstan | 6.946 | 55.542 | 48.880 | 6.662 |
Wiki-Talk | 19.199 | 68.166 | 64.032 | 3.134 |
Dataset | SCO (ms) | Inc-SCC (ms) | Speedup |
Soc-epinions1 | 46.746 | 5.708 | 8.190 |
Email-euall | 28.100 | 4.502 | 6.242 |
Web-stanford | 120.951 | 35.729 | 3.385 |
Web-notredame | 51.147 | 11.438 | 4.472 |
Web-Berstan | 160.622 | 55.542 | 2.897 |
Wiki-Talk | 825.124 | 68.166 | 12.104 |
SCCMAP($G_{t_{i}}$) $\Leftarrow$ SCCMAP($G_{t_{i-1}}$)$\setminus~G_{\rm~new}$; |
Select root $\in$ OldSCC; |
$D~\Leftarrow~{\rm~FS(OldSCC,~root)}~$; |
$P~\Leftarrow~{\rm~BS(OldSCC,~root)}~$; |
NewSCC(root) $\Leftarrow$ $D~\cap~P$; |
SCCMAP($G_{t_{i}}$) $\Leftarrow~$ SCCMAP($G_{t_{i}}$) $\cup$ NewSCC(root); |
$G_{\rm~new}~\Leftarrow~G_{\rm~new}~\setminus$NewSCC(root); |
|
Init colors; |
|
|
|
colorse.dst $\Leftarrow$ colorse.src; |
|
|
|
|
NewSCC(root)$\Leftarrow$ BS(OldSCC, root); |
SCCMAP($G_{t_{i}}$) $\Leftarrow~$ SCCMAP($G_{t_{i}}$) $\cup$ NewSCC(root); |
$G_{\rm~new}~\Leftarrow~G_{\rm~new}~\setminus$NewSCC(root); |
|
|