
SCIENCE CHINA Information Sciences, Volume 64, Issue 6: 160406 (2021). https://doi.org/10.1007/s11432-020-3198-9

Array-level boosting method with spatial extended allocation to improve the accuracy of memristor based computing-in-memory chips

  • Received: Dec 31, 2020
  • Accepted: Feb 25, 2021
  • Published: Apr 21, 2021

Abstract


Acknowledgment

This work was supported in part by National Key R&D Program of China (Grant No. 2019YFB2205103), National Natural Science Foundation of China (Grant Nos. 92064001, 61851404, 61874169), Beijing Municipal Science and Technology Project (Grant No. Z191100007519008), and Beijing Innovation Center for Future Chips (ICFC).



  • Figure 1

(Color online) Array-level boosting method. (a) The architecture of ResNet-34. (b) The principle of the convolutional operation on the memristor array: the convolutional kernel is flattened from ($C_{\rm~out},C_{\rm~in},K,K$) to ($C_{\rm~out},C_{\rm~in}\times~K~\times~K$) and mapped to the memristor array with differential rows. (c) The schematic of the array-level boosting method with spatial extended allocation; a minimal code sketch of this mapping and of the spatial extension is given after Algorithm 1.

  • Figure 2

(Color online) Experimental characteristics of the fabricated memristor array. (a) Structure of the device and the 32$\times$128 array; (b) and (c) current-voltage relation and conductance read noise when the device is programmed to eight different conductance levels; (d) the cumulative distribution and corresponding standard deviation of the eight conductance levels.

  • Figure 3

(Color online) Array-level boosting of data representation. (a) The standard deviation of the eight conductance levels under different $N_s$; (b) the distribution of programmed images with different $N_s$.

  • Figure 4

(Color online) Array-level boosting for the image processing application. (a) The original discrete cosine transformation matrix; (b) the read currents of the 32$\times$128 array after mapping and programming of the original discrete cosine transformation matrix; (c) the programming error matrix of (b); (d) and (e) the trajectories of the average programming error and of the root-mean-square error of the discrete cosine transformation output as $N_s$ changes; (f) the transition of images after discrete cosine transformation and inverse discrete cosine transformation with different $N_s$.

  • Figure 5

(Color online) The results of the array-level boosting method with greedy spatial extended allocation on ResNet-34. (a) and (b) The simulated accuracy loss under different standard deviations of read and write noise and different weight precision; (c) the optimized accuracy and (d) the number of allocated arrays under different accuracy thresholds and noise standard deviations; (e) the diagram of the chip-in-loop emulation; (f) comparison of classification accuracy among the original software, simulation, and chip-in-loop emulation.

  • Figure 6

    (Color online) Estimation of overhead with array-level boosting allocation. (a) Area usage and its breakdown; (b) power consumption and its breakdown.

  • Algorithm 1

    Greedy spatial extended allocation method

    Require: Neural network, dataset, accuracy threshold $A_{\rm~th}$, maximum spatial extended allocation $N_s^m$.
    Output: Spatial extended allocation of each layer: $N_s^l$.
    Calculate the accuracy of the neural network without noise: ${\rm~Acc}$;
    Initialize $N_s^l,~l\in(1,L)$;
    Calculate the accuracy with the initial $N_s$: ${\rm~Acc}_b$;
    for $l~=~1$ to $L$ do
        while $|{\rm~Acc}_b-{\rm~Acc}|>A_{\rm~th}$ do
            $N_s^l=N_s^l+1$;
            if $N_s^l>N_s^m$ then
                $N_s^l=N_s^m$;
                break;
            end if
            Calculate the accuracy with $N_s$: ${\rm~Acc}_b$;
        end while
    end for
    return $N_s^l,~l\in(1,L)$.
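
A minimal Python sketch of Algorithm 1 follows. The function and parameter names (evaluate, num_layers, acc_threshold, n_s_max) are illustrative assumptions; evaluate stands in for the noise-injected accuracy simulation or the chip-in-loop emulation of Figure 5 and returns the classification accuracy for a given per-layer allocation, and the per-layer cap $N_s^m$ is enforced in the loop condition rather than by clamping.

    def greedy_spatial_extended_allocation(evaluate, num_layers, acc_threshold, n_s_max):
        """Greedy spatial extended allocation (a sketch of Algorithm 1)."""
        # Noise-free software baseline accuracy Acc
        acc = evaluate([1] * num_layers, noise_free=True)
        # Start with a single (non-extended) array for every layer
        n_s = [1] * num_layers
        # Accuracy of the noisy network with the initial allocation
        acc_b = evaluate(n_s)
        for l in range(num_layers):
            # Extend layer l until the accuracy loss is within the threshold
            # or the per-layer maximum N_s^m is reached
            while abs(acc_b - acc) > acc_threshold and n_s[l] < n_s_max:
                n_s[l] += 1
                acc_b = evaluate(n_s)
        return n_s

Because layers are visited in order, earlier layers are frozen before later ones are enlarged, so the returned allocation is one greedy solution rather than a guaranteed global optimum.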
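
To make the kernel mapping of Figure 1(b) and the spatial extension of Figures 1(c) and 3 concrete, the following NumPy sketch flattens a convolution kernel to differential conductance rows, replicates each row $N_s$ times, and averages the noisy read-out. The conductance range, the read-noise level, and all function names are illustrative assumptions rather than measured parameters of the fabricated array.

    import numpy as np

    G_MAX = 30e-6          # assumed maximum cell conductance (S)
    READ_NOISE_STD = 1e-6  # assumed std of conductance read noise (S)

    def map_kernel_differential(kernel, n_s=1):
        """Flatten a (C_out, C_in, K, K) kernel to (C_out, C_in*K*K) and map it
        to positive/negative conductance rows, each replicated n_s times."""
        c_out = kernel.shape[0]
        w = kernel.reshape(c_out, -1)                 # (C_out, C_in*K*K)
        scale = G_MAX / np.max(np.abs(w))             # weight-to-conductance scale
        g_pos = np.clip(w, 0, None) * scale           # positive weights
        g_neg = np.clip(-w, 0, None) * scale          # negative weights
        return np.repeat(g_pos, n_s, axis=0), np.repeat(g_neg, n_s, axis=0), scale

    def noisy_mvm(x, g_pos, g_neg, n_s, scale, rng):
        """One analog multiply-accumulate with per-device read noise;
        the n_s replicated rows are averaged before rescaling."""
        i_pos = (g_pos + rng.normal(0.0, READ_NOISE_STD, g_pos.shape)) @ x
        i_neg = (g_neg + rng.normal(0.0, READ_NOISE_STD, g_neg.shape)) @ x
        i_diff = (i_pos - i_neg).reshape(-1, n_s).mean(axis=1)  # average the copies
        return i_diff / scale

    rng = np.random.default_rng(0)
    kernel = rng.standard_normal((64, 3, 3, 3))   # (C_out, C_in, K, K)
    x = rng.standard_normal(3 * 3 * 3)            # one flattened input patch
    ideal = kernel.reshape(64, -1) @ x
    for n_s in (1, 2, 4, 8):
        g_pos, g_neg, scale = map_kernel_differential(kernel, n_s)
        err = noisy_mvm(x, g_pos, g_neg, n_s, scale, rng) - ideal
        print(f"N_s = {n_s}: output RMSE = {np.sqrt(np.mean(err**2)):.4f}")

Since the $N_s$ copies see independent read noise, the averaged output error shrinks roughly as $1/\sqrt{N_s}$, the trend reflected in Figures 3(a) and 4(e).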
