
SCIENTIA SINICA Informationis, Volume 47, Issue 3: 310-325 (2017) https://doi.org/10.1360/N112016-00146

Identifying superword level parallelism with directed graph reachability

  • Received: Jun 9, 2016
  • Accepted: Jul 19, 2016
  • Published: Jan 13, 2017

Abstract


Funded by

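National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software ("核高基") (2009ZX01036-001-001-2)

Open Project of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2013A11)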

