logo

SCIENTIA SINICA Informationis, Volume 49 , Issue 3 : 295-313(2019) https://doi.org/10.1360/N112018-00281

Memory system optimization for graph processing: a survey

More info
  • ReceivedOct 18, 2018
  • AcceptedDec 12, 2018
  • PublishedMar 18, 2019

Abstract


Funded by

国家重点研发计划(2018YFB1003500)


References

[1] Malewicz G, Austern M H, Bik A J, et al. Pregel: a system for largescale graph processing. In: Proceedings of the 2010 International Conference on Management of Data, Indianapolis, 2010. 135--146. Google Scholar

[2] Ham T J, Wu L, Sundaram N, et al. Graphicionado: a high-performance and energyecient accelerator for graph analytics. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, 2016. 1--13. Google Scholar

[3] Ozdal M M, Yesil S, Kim T, et al. Energy ecient architecture for graph analyticsaccelerators. In: Proceedings of the 23rd ACM/IEEE Annual International Symposium on Computer Architecture, Seoul, 2016. 166--177. Google Scholar

[4] Dai G H, Chi Y Z, Wang Y, et al. FPGP: graph processing framework on FPGA a case study of breadth-first search. In: Proceedings of ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2016. 105--110. Google Scholar

[5] Nurvitadhi E, Weisz G, Wang Y, et al. Graphgen: an FPGA framework for vertex-centric graph computation. In: Proceedings of the 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, Boston, 2014. 25--28. Google Scholar

[6] Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles, Farmington, 2013. 472--488. Google Scholar

[7] Dai G, Huang T, Chi Y, et al. Fore-graph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2017. 217--226. Google Scholar

[8] Zhou S J, Prasanna V K. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In: Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, Campinas, 2017. 137--144. Google Scholar

[9] Song L, Zhuo Y, Qian X, et al. GraphR: acceleratinggraph processing using ReRAM. In: Proceedings of the 24th IEEE International Symposium on High-Performance Computer Architecture, Vienna, 2018. 531--543. Google Scholar

[10] Zhou S, Chelmis C, Prasanna V K. High-throughput andenergy-ecient graph processing on FPGA. In: Proceedings of the 24th International Symposium Field-Programmable Custom ComputingMachines, Washington, 2016. 103--110. Google Scholar

[11] Lakhotia K, Kannan R, Prasanna V. Accelerating PageRank using partition-centric processing. In: Proceedings of the 28th USENIX Security Symposium on Annual Technical Conference, Santa Clara, 2018. 427--440. Google Scholar

[12] Gonzalez J E, Low Y, Gu H, et al. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th Usenix Symposium on Operating Systems Design and Implementation, Hollywood, 2012. 17--30. Google Scholar

[13] Chen R, Shi J, Chen Y, et al. Powerlyra: differentiated graph computation and partitioning on skewed graphs. In: Proceedings of the 10th European Conference on Computer Systems, Bordeaux, 2015. 1--15. Google Scholar

[14] Maass S, Min C, Kashyap S, et al. Mosaic: processing a trillion-edge graph on a single machine. In: Proceedings of the 12th European Conference on Computer Systems, Belgrade, 2017. 527--543. Google Scholar

[15] Zhang J, Khoram S, Li J. Boosting the performance of FPGA-based graph processor using hybrid memory cube: a case for breadth rst search. In: Proceedings ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2017. 207--216. Google Scholar

[16] Zhu X W, Chen W G, Zheng W M, et al. Gemini: a computation-centric distributed graph processing system. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, 2016. 301--316. Google Scholar

[17] Oguntebi T, Olukotun K. GraphOps: a dataflow library for graph analytics acceleration. In: Proceedings of the 24th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2016. 111--117. Google Scholar

[18] Attia O G, Johnson T, Townsend K, et al. CyGraph: a reconfigurable architecture for parallel breadth-first search. In: Proceedings of the 28th International Parallel and Distributed Processing Symposium Workshops, Phoenix, 2014. 228--235. Google Scholar

[19] Dai G H, Huang T H, Chi Y Z, et al. GraphH: a processing-in-memory architecture for large-scale graph processing. IEEE Trans Comput-Aided Des Integr Circuits Syst, 2018. doi: 10.1109/TCAD.2018.2821565. Google Scholar

[20] Kyrola A, Blelloch G, Guestrin C. Graphchi: large-scale graph computation on just a pc. In: Proceedings of the 10th Usenix Symposium on Operating Systems Design and Implementation, Hollywood, 2012. 31--46. Google Scholar

[21] Zhang M X, Zhuo Y W, Wang C, et al. GraphP: reducing communication for PIM-based graph processing with efficient data partition. In: Proceedings of the 24th IEEE International Symposium on High-Performance Computer Architecture, Vienna, 2018. 544--557. Google Scholar

[22] Umuroglu Y, Morrison D, Jahre M. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform. In: Proceedings of the 25th International Conference on Field Programmable Logic and Applications, London, 2015. 1--8. Google Scholar

[23] Zhu X W, Han W T, Chen W G. GridGraph: large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: Proceedings of USENIX Annual Technical Conference, Santa Clara, 2015. 375--386. Google Scholar

[24] Nodehi S A H, Qiu J Q, Zhao Z J. Tigr: transforming irregular graphs for GPU-friendly graph processing. In: Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, Virginia, 2018. 622--636. Google Scholar

[25] Zhong J L, He B S. Medusa: Simplified Graph Processing on GPUs. IEEE Trans Parallel Distrib Syst, 2014, 25: 1543-1552 CrossRef Google Scholar

[26] Fu Z S, Personick M, Thompson B. MapGraph: a high level API for fast development of high performance graph analytics on GPUs. In: Proceedings of Workshop on GRAph Data management Experiences and Systems, Snowbird, 2014. 1--6. Google Scholar

[27] Shi X H, Liang J L, Di S, et al. Optimization of asynchronous graph processing on GPU with hybrid coloring model. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francicso, 2015. 271--272. Google Scholar

[28] Khorasani F, Vora K, Gupta R, et al. CuSha: vertex-centric graph processing on GPUs. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, Vancouver, 2014. 239--252. Google Scholar

[29] Kim M S, An K, Park H, et al. GTS: a fast and scalable graph processing method based on streaming topology to GPUs. In: Proceedings of the 2016 International Conference on Management of Data, San Francisco, 2016. 447--461. Google Scholar

[30] Seo H, Kim J, Kim M S. Gstream: a graph streaming processing method for large-scale graphs on gpus. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francicso, 2015. 253--254. Google Scholar

[31] Sengupta D, Song S L, Agarwal K, et al. GraphReduce: processing large-scale graphs on accelerator-based systems. In: Proceedings of the 27th International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, 2015. 1--12. Google Scholar

[32] Wang Y Z, Davidson A, Pan Y C, et al. Gunrock: a high-performance graph processing library on the GPU. In: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Barcelona, 2016. 265--266. Google Scholar

[33] Nai L F, Hadidi R, Sim J, et al. GraphPIM: Enabling instruction-level PIM offloading in graph computing frameworks. In: Proceedings of the 23rd IEEE Symposium on High Performance Computer Architecture, Austin, 2017. 457--468. Google Scholar

[34] Gao M Y, Ayers G, Kozyrakis C. Practical near-data processing for in-memory analytics frameworks. In: Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques, San Francisco, 2015. 113--124. Google Scholar

[35] Nai L, Xia Y, Tanase I G, et al. GraphBIG: understanding graph computing in the context of industrial solutions. In: Proceedings of the 27th International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, 2015. 1--12. Google Scholar

[36] Han L, Shen Z, Liu D. A Novel ReRAM-Based Processing-in-Memory Architecture for Graph Traversal. ACM Trans Storage, 2018, 14: 1-26 CrossRef Google Scholar

[37] Ahn J, Hong S, Yoo S, et al. A scalable processing-in-memory accelerator for parallel graph processing. In: Proceedings of the 42nd International Symposium on Computer Architecture, Portland, 2015. 105--117. Google Scholar

[38] Zhang J L, Li J. Degree-aware hybrid graph traversal on FPGA-HMC platform. In: Proceedings of the 26th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2018. 229--238. Google Scholar

[39] Betkaoui B, Thomas D B, Luk W, et al. A framework for FPGA acceleration of large graph problems: graphlet counting case study. In: Proceedings of International Conference on Field-Programmable Technology, Delhi, 2011. 1--8. Google Scholar

[40] Khoram S, Zhang J, Strange M, et al. Accelerating graph analytics by co-optimizing storage and access on an FPGA-HMC platform. In: Proceedings of the 26th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, 2018. 239--248. Google Scholar

[41] Wang Q B, Jiang W R, Xia Y L, et al. A messagepassing multi-softcore architecture on FPGA for breadth-first search. In: Proceedings of International Conference on Field-Programmable Technology, Beijing, 2010. 70--77. Google Scholar

[42] Windh S, Budhkar P, Najjar W A. CAMs as synchronizing caches for multithreaded irregular applications on FPGAs. In: Proceedings of the 34th IEEE/ACM International Conference on Computer-Aided Design, Austin, 2015. 331--336. Google Scholar

[43] Wang L, Yang X J, Dai H D. Scratchpad memory allocation for arrays in permutation graphs. Sci China Inf Sci, 2013, 56: 1-13 CrossRef Google Scholar

[44] Zhou J, Liu S, Guo Q, et al. Tunao: a high-performance and energy-efficient reconfigurable accelerator for graph processing. In: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing IEEE, Madrid, 2017. 731--734. Google Scholar

[45] Huang T H, Dai G H, Wang Y, et al. HyVE: hybrid vertex-edge memory hierarchy for energy-efficient graph processing. In: Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, Dresden, 2018. 973--978. Google Scholar

[46] Shao Y, Cui B, Ma L. PAGE: A Partition Aware Engine for Parallel Graph Computation. IEEE Trans Knowl Data Eng, 2015, 27: 518-530 CrossRef Google Scholar

[47] Ben-Nun T, Sutton M, Pai S, et al. Groute: an asynchronous multi-GPU programming model for irregular computations, In: Proceedings of the 23th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Austin, 2017. 235--248. Google Scholar

[48] Shi X, Cui B, Shao Y, et al. Tornado: a system for real-time iterative analysis over evolving data. In: Proceedings of the 2016 International Conference on Management of Data, San Francisco, 2016. 417--430. Google Scholar

[49] Pan P, Li C. Congra: towards efficient processing of concurrent graph queries on shared-memory machines. In: Proceedings of the 35th IEEE International Conference on Computer Design, Boston, 2017. 217--224. Google Scholar

[50] Jin H, Yao P C, Liao X F, et al. Towards dataflow-based graph accelerator. In: Proceedings of the 37th International Conference on Distributed Computing Systems, Atlanta, 2017. 1981--1992. Google Scholar

[51] Yao P C, Zheng L, Liao X F, et al. An efficient graph accelerator with parallel data conflict management. In: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, Limassol, 2018. 1--12. Google Scholar

[52] Xue J, Yang Z, Qu Z, et al. Seraph: an efficient, low-cost system for concurrent graph processing. In: Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, Vancouver, 2014. 227--238. Google Scholar

[53] Xie C N, Chen R, Guan H B, et al. Sync or async: time to fuse for distributed graph-parallel computation. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francisco, 2015. 194--204. Google Scholar

[54] Zhang K Y, Chen R, Chen H B. NUMA-aware graph-structured analytics. In: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francisco, 2015. 183--193. Google Scholar

[55] Zhang M X,Wu Y W, Chen K, et al. Exploring the hidden dimension in graph processing. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, 2016. 285--300. Google Scholar

[56] Sha M, Li Y, He B. Accelerating dynamic graph analytics on GPUs. Proc VLDB Endow, 2017, 11: 107-120 CrossRef Google Scholar

[57] Chen H Y, Sun Z G, Yi F. BufferBank storage: an economic, scalable and universally usable in-network storage model for streaming data applications. Sci China Inf Sci, 2016, 59: 012103 CrossRef Google Scholar

  • Figure 1

    Traversal-centric and computation-centric graph algorithms

  • Figure 2

    Graph topology and edge array representation. (a) Graph topology; (b) edge array

  • Figure 3

    Adjacent list representation. (a) Adjacent list(incoming edge); (b) compressed sparse column; (c) adjacent list(outgoing edge); (d) compressed sparse row

  • Table 1   Summary of graph processing systems on memory optimization
    Optimization scheme Graph processing systems [system,~architecture,~year]
    Data layout [GraphOps~\upcite{graphops},~FPGA,~2016], [CyGraph~\upcite{cygraph},~FPGA,~2014], [GraphH~\upcite{graphh},~PIM,~2018], [Graphicionado~\upcite{graphicionado},~ASIC,~2016], [ForeGraph~\upcite{foregraph},~FPGA,~2017], [FPGP~\upcite{fpgp},~FPGA,~2017], [Tigr~\upcite{tigr},~GPU,~2018], [X-Stream~\upcite{x-stream},~CPU,~2013], [Mosaic~\upcite{masaic},~CPU,~2017], etc.
    Graph partition [GraphChi~\upcite{graphchi},~CPU,~2012], [Cusha~\upcite{cusha},~GPU,~2014], [Graphicionado~\upcite{graphicionado},~ASIC,~2016], [GraphH\upcite{graphh},~PIM,~2018], [FPGP~\upcite{fpgp},~FPGA,~2017], [GraphP~\upcite{graphp},~PIM,~2018], [GStream~\upcite{gstream},~GPU,~2015], [GTS~\upcite{gts},~GPU,~2016], [GridGraph~\upcite{gridgraph},~CPU,~2015], [ForeGraph~\upcite{foregraph},~FPGA,~2017], [Gemini~\upcite{gemini},~CPU,~2016], [Frog~\upcite{frog},~GPU,~2015], [Page~\upcite{page},~CPU,~2015], [GraphReduce~\upcite{graphreduce},~GPU,~2015], etc.
    Storage media [GraphPIM~\upcite{graphpim},~PIM,~2017], [Graphicionado~\upcite{graphicionado},~ASIC,~2016], [FPGP~\upcite{fpgp},~FPGA,~2017], [ForeGraph~\upcite{foregraph},~FPGA,~2017], [FPGA-HMC~\upcite{zhangj-1},~FPGA,~2018], [Ozdal~\upcite{isca16},~ASIC,~2016], [GraphH~\upcite{graphh},~PIM,~2018], [GraphR~\upcite{graphr},~PIM,~2018], [GraphP~\upcite{graphp},~PIM,~2018], [Tesseract~\upcite{pim-isca15},~PIM,~2015], etc.
    Memory access [REBFS~\upcite{rebfs},~PIM,~2018], [Tesseract~\upcite{pim-isca15},~PIM,~2015], [GraphH\upcite{graphh},~PIM,~2018], [Graphicionado\upcite{graphicionado},~ASIC,~2016], [GraphGen~\upcite{graphgen},~FPGA,~2014], [FPGA-HMC~\upcite{zhangj-2},~FPGA,~2018], [GTS~\upcite{gts},~GPU,~2016], [Frog~\upcite{frog},~GPU,~2015], etc.
    Communication [Tesseract~\upcite{pim-isca15},~PIM,~2015], [GraphP~\upcite{graphp},~PIM,~2018], [GraphH~\upcite{graphh},~PIM,~2018], [Groute~\upcite{grout},~GPU,~2017], [FPGP~\upcite{fpgp},~FPGA,~2017], [ForeGraph~\upcite{foregraph},~FPGA,~2017], [Frog~\upcite{frog},~GPU,~2015], [Tornado~\upcite{tornado},~CPU,~2016], [GTS~\upcite{gts},~GPU,~2016], etc.
    Power [GraphBIG\upcite{graphbig},~PIM,~2015], [GraphH\upcite{graphh},~PIM,~2018], [Graphicionado~\upcite{graphicionado},~ASIC,~2016], [GraphP\upcite{graphp},~PIM,~2018], [GraphR~\upcite{graphr},~PIM,~2018], [GraphPIM~\upcite{graphpim},~PIM,~2017], [HyVE~\upcite{hyve},~CPU,~2018], [Tesseract~\upcite{pim-isca15},~PIM,~2015], [Ozdal~\upcite{isca16},~CPU,~2016], [Congra~\upcite{congraph},~CPU,~2017], [Tunao~\upcite{tunao},~ASIC,~2017], etc.