
SCIENTIA SINICA Informationis, Volume 50, Issue 9: 1303 (2020) https://doi.org/10.1360/SSI-2020-0099

Key issues in exascale computing

  • Received: Apr 21, 2020
  • Accepted: Jul 31, 2020
  • Published: Sep 23, 2020

Funded by

National Key Research and Development Program of China (2016YFB0200100)

National Natural Science Foundation of China (61732002)

