
SCIENCE CHINA Information Sciences, Volume 64, Issue 6: 160407 (2021). https://doi.org/10.1007/s11432-020-3245-7

NAS4RRAM: neural network architecture search for inference on RRAM-based accelerators

  • Received: Dec 31, 2020
  • Accepted: Apr 7, 2021
  • Published: May 10, 2021

Abstract


Acknowledgment

This work was supported by the National Key Research and Development Project of China (Grant No. 2018YFB-1003304) and the National Natural Science Foundation of China (Grant Nos. 61832020, 62032001).



  • Figure 1

    Demonstration of the RRAM crossbar.

  • Figure 2

    Demonstration of matrix-vector multiplication with negative-valued weights on the RRAM crossbar (see the sketch after Figure 5).

  • Figure 3

    Overview of the NAS4RRAM framework.

  • Figure 4

    Comparison of (a) the standard residual block and (b) the modified residual block.

  • Figure 5

    Demonstration of the networks in the search space.
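Figures 1 and 2 illustrate how an RRAM crossbar performs analog matrix-vector multiplication: input voltages drive the rows, weights are stored as cell conductances, and each column current accumulates the products by Kirchhoff's current law. Signed weights are commonly realized by splitting each weight across a pair of non-negative crossbars and subtracting the resulting column currents. The following minimal sketch mimics this idealized, noise-free behavior; the function names and the linear weight-to-conductance scaling are illustrative assumptions, not the paper's exact mapping.

```python
import numpy as np

def crossbar_mvm(voltages, conductances):
    """Idealized crossbar: column current I_j = sum_i V_i * G_ij
    (Ohm's law per cell, Kirchhoff's current law per column)."""
    return voltages @ conductances

def signed_mvm(x, W, g_max=1e-4):
    """Realize a signed weight matrix with two non-negative crossbars
    (W = W_pos - W_neg) and recover y = x @ W from the current difference."""
    scale = g_max / np.abs(W).max()          # assumed linear weight-to-conductance map
    G_pos = np.clip(W, 0, None) * scale      # positive part of W
    G_neg = np.clip(-W, 0, None) * scale     # magnitude of the negative part
    return (crossbar_mvm(x, G_pos) - crossbar_mvm(x, G_neg)) / scale

x = np.random.randn(8)                       # input activations applied as row voltages
W = np.random.randn(8, 4)                    # signed weight matrix
assert np.allclose(signed_mvm(x, W), x @ W)  # matches the ideal product in this noise-free sketch
```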

  • Table 1  

    Table 1  The results on CIFAR-10/CIFAR-100 for different networks$^{\rm~a)}$

    Task        Network                  #Weight   ACC (%)   Deployable ($B=16$)   Deployable ($B=32$)   Deployable ($B=48$)
    CIFAR-10    ResNet-20 $\times$1      267k      82.4      N                     N                     Y
    CIFAR-10    ResNet-20 $\times$0.5    71k       72.6      Y                     Y                     Y
    CIFAR-10    ResNet-32 $\times$1      461k      82.9      N                     N                     N
    CIFAR-10    ResNet-32 $\times$0.5    122k      76.1      Y                     Y                     Y
    CIFAR-10    NAS4RRAM ($B=16$)        125k      78.5      Y                     Y                     Y
    CIFAR-10    NAS4RRAM ($B=32$)        261k      82.7      N                     Y                     Y
    CIFAR-10    NAS4RRAM ($B=48$)        383k      84.4      N                     N                     Y
    CIFAR-100   ResNet-20 $\times$1      267k      50.7      N                     N                     Y
    CIFAR-100   ResNet-20 $\times$0.5    71k       38.2      Y                     Y                     Y
    CIFAR-100   ResNet-32 $\times$1      461k      53.0      N                     N                     N
    CIFAR-100   ResNet-32 $\times$0.5    122k      39.3      Y                     Y                     Y
    CIFAR-100   NAS4RRAM ($B=16$)        118k      45.6      Y                     Y                     Y
    CIFAR-100   NAS4RRAM ($B=32$)        250k      50.9      N                     Y                     Y
    CIFAR-100   NAS4RRAM ($B=48$)        343k      53.1      N                     N                     Y

    a) #Weight is the number of weights in thousands. ACC is the top-1 accuracy. A network is marked Y in the corresponding column if it is deployable on an accelerator with $B$ RRAM crossbars, and N otherwise.
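
The deployability columns in Table 1 can be read as a simple resource check: a network is deployable only if the crossbars needed to hold all of its weights do not exceed the $B$ crossbars the accelerator provides. The sketch below illustrates such a check; the 128$\times$128 crossbar size and the one-cell-per-weight tiling are assumptions made for illustration, not the paper's actual configuration.

```python
import math

def crossbars_needed(layer_shapes, rows=128, cols=128):
    """Count crossbars needed to hold all weights, tiling each layer's
    (in_features, out_features) weight matrix into rows x cols blocks."""
    return sum(math.ceil(in_f / rows) * math.ceil(out_f / cols)
               for in_f, out_f in layer_shapes)

def is_deployable(layer_shapes, B, rows=128, cols=128):
    """Deployable iff the tiled weights fit in the accelerator's B crossbars."""
    return crossbars_needed(layer_shapes, rows, cols) <= B

# Toy example: three conv layers flattened to (k*k*C_in, C_out) matrices.
layers = [(3 * 3 * 16, 16), (3 * 3 * 16, 32), (3 * 3 * 32, 64)]
print(crossbars_needed(layers), is_deployable(layers, B=16))  # -> 7 True
```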
