SCIENTIA SINICA Informationis, Volume 50, Issue 11: 1697 (2020). https://doi.org/10.1360/SSI-2020-0092

The impact of data flow computing thinking on the development of computer architecture

  • Received: Apr 15, 2020
  • Accepted: May 25, 2020
  • Published: Oct 21, 2020

Abstract


Funded by

National Key R&D Program of China (2018YFB1003400)

