
SCIENCE CHINA Information Sciences, Volume 64, Issue 6: 160403 (2021). https://doi.org/10.1007/s11432-021-3234-0

Energy-efficient computing-in-memory architecture for AI processor: device, circuit, architecture perspective

  • Received: Jan 8, 2021
  • Accepted: Apr 5, 2021
  • Published: May 11, 2021

Abstract


Acknowledgment

This work was supported by National Key R&D Program of China (Grant No. 2019YFB2204500) and UESTC Research Start-up Funding (Grant No. Y030202059018052).


References

[1] Liu L, Qu Z, Deng L, et al. Duet: boosting deep neural network efficiency on dual-module architecture. In: Proceedings of 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020. 738--750.

[2] Chen Y, Luo T, Liu S, et al. DaDianNao: a machine-learning supercomputer. In: Proceedings of 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014. 609--622.

[3] Du Z, Fasthuber R, Chen T, et al. ShiDianNao: shifting vision processing closer to the sensor. In: Proceedings of 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), 2015. 92--104.

[4] Pham P, Jelaca D, Farabet C, et al. NeuFlow: dataflow vision processing system-on-a-chip. In: Proceedings of 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS), 2012. 1044--1047.

[5] Chen Y, Krishna T, Emer J, et al. 14.5 Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. In: Proceedings of 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016. 262--263.

[6] Jouppi N, Young C, Patil N, et al. In-datacenter performance analysis of a tensor processing unit. SIGARCH Comput Archit News, 2017, 45(2). doi: 10.1145/3079856.3080246.

[7] Li W, Xu P, Zhao Y, et al. TIMELY: pushing data movements and interfaces in PIM accelerators towards local and in time domain. In: Proceedings of 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020. 832--845.

[8] Chi P, Li S, Xu C, et al. PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In: Proceedings of 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016. 27--39.

[9] Zhao Y, Chen X, Wang Y, et al. SmartExchange: trading higher-cost memory storage/access for lower-cost computation. In: Proceedings of 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020. 954--967.

[10] Gokhale M, Holmes B, Iobst K. Processing in memory: the Terasys massively parallel PIM array. Computer, 1995, 28: 23--31.

[11] Patterson D, Anderson T, Cardwell N. A case for intelligent RAM. IEEE Micro, 1997, 17: 34--44.

[12] Hall M, Kogge P, Koller J, et al. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In: Proceedings of the 1999 ACM/IEEE Conference on Supercomputing, 1999. 57.

[13] Oskin M, Chong F T, Sherwood T. Active pages: a computation model for intelligent memory. In: Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998. 192--203.

[14] Kang Y, Huang W, Yoo S M, et al. FlexRAM: toward an advanced intelligent memory system. In: Proceedings of IEEE International Conference on Computer Design, 1999. 192--201.

[15] Patterson D, Anderson T, Cardwell N, et al. Intelligent RAM (IRAM): chips that remember and compute. In: Proceedings of 1997 IEEE International Solid-State Circuits Conference, 1997. 224--225.

[16] Li S, Xu C, Zou Q, et al. Pinatubo: a processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories. In: Proceedings of 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), 2016. 1--6.

[17] Zhuo Y W, Wang C, Zhang M X, et al. GraphQ: scalable PIM-based graph processing. In: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. New York: Association for Computing Machinery, 2019.

[18] Deng L, Wang G, Li G. Tianjic: a unified and scalable chip bridging spike-based and continuous neural computation. IEEE J Solid-State Circuits, 2020, 55: 2228--2246.

[19] Li S, Niu D, Malladi K T, et al. DRISA: a DRAM-based reconfigurable in-situ accelerator. In: Proceedings of 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2017. 288--301.

[20] Li S, Glova A O, Hu X, et al. SCOPE: a stochastic computing engine for DRAM-based in-situ accelerator. In: Proceedings of 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018. 696--709.

[21] Ahn J, Hong S, Yoo S, et al. A scalable processing-in-memory accelerator for parallel graph processing. In: Proceedings of 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), 2015. 105--117.

[22] Chang L, Ma X, Wang Z, et al. CORN: in-buffer computing for binary neural network. In: Proceedings of 2019 Design, Automation Test in Europe Conference Exhibition (DATE), 2019. 384--389.

[23] Chang L, Ma X, Wang Z. PXNOR-BNN: in/with spin-orbit torque MRAM preset-XNOR operation-based binary neural networks. IEEE Trans VLSI Syst, 2019, 27: 2668--2679.

[24] Gao M, Ayers G, Kozyrakis C. Practical near-data processing for in-memory analytics frameworks. In: Proceedings of 2015 International Conference on Parallel Architecture and Compilation (PACT), 2015. 113--124.

[25] Peng X, Liu R, Yu S. Optimizing weight mapping and data flow for convolutional neural networks on processing-in-memory architectures. IEEE Trans Circuits Syst I, 2020, 67: 1333--1343.

[26] Chen Y, Emer J, Sze V. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. In: Proceedings of 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016. 367--379.

[27] Fleischer B, Shukla S, Ziegler M, et al. A scalable multi-teraops deep learning processor core for AI training and inference. In: Proceedings of 2018 IEEE Symposium on VLSI Circuits, 2018. 35--36.

[28] Samal K, Wolf M, Mukhopadhyay S. Attention-based activation pruning to reduce data movement in real-time AI: a case-study on local motion planning in autonomous vehicles. IEEE J Emerg Sel Top Circuits Syst, 2020, 10: 306--319.

[29] Yin S, Ouyang P, Liu L. A fast and power-efficient memory-centric architecture for affine computation. IEEE Trans Circuits Syst II, 2016, 63: 668--672.

[30] JEDEC Standard. High bandwidth memory (HBM) DRAM. 2013.

[31] Hybrid Memory Cube Consortium. Hybrid memory cube specification 1.0. 2013.

[32] Koo G, Matam K K, Te I, et al. Summarizer: trading communication with computing near storage. In: Proceedings of 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2017. 219--231.

[33] Nair R, Antao S F, Bertolli C. Active memory cube: a processing-in-memory architecture for exascale systems. IBM J Res Dev, 2015, 59: 17:1--17:14.

[34] Farmahini-Farahani A, Ahn J H, Morrow K, et al. NDA: near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules. In: Proceedings of 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), 2015. 283--295.

[35] Si X, Liu R, Yu S. A twin-8T SRAM computation-in-memory unit-macro for multibit CNN-based AI edge processors. IEEE J Solid-State Circuits, 2020, 55: 189--202.

[36] Zhang M, Zhuo Y, Wang C, et al. GraphP: reducing communication for PIM-based graph processing with efficient data partition. In: Proceedings of 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2018. 544--557.

[37] Dai G, Huang T, Chi Y. GraphH: a processing-in-memory architecture for large-scale graph processing. IEEE Trans Comput-Aided Des Integr Circuits Syst, 2019, 38: 640--653.

[38] Okumura S, Yabuuchi M, Hijioka K, et al. A ternary based bit scalable, 8.80 TOPS/W CNN accelerator with many-core processing-in-memory architecture with 896K synapses/mm2. In: Proceedings of 2019 Symposium on VLSI Technology, 2019.

[39] Biswas A, Chandrakasan A P. CONV-RAM: an energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications. In: Proceedings of 2018 IEEE International Solid-State Circuits Conference, 2018. 488--490.

[40] Kang M, Gonugondla S K, Shanbhag N R. A 19.4 nJ/decision 364K decisions/s in-memory random forest classifier in 6T SRAM array. In: Proceedings of the 43rd IEEE European Solid State Circuits Conference, 2017. 263--266.

[41] Valavi H, Ramadge P J, Nestler E, et al. A mixed-signal binarized convolutional-neural-network accelerator integrating dense weight storage and multiplication for reduced data movement. In: Proceedings of 2018 IEEE Symposium on VLSI Circuits, 2018. 141--142.

[42] Gonugondla S K, Kang M, Shanbhag N. A 42 pJ/decision 3.12 TOPS/W robust in-memory machine learning classifier with on-chip training. In: Proceedings of 2018 IEEE International Solid-State Circuits Conference, 2018. 490--492.

[43] Ramanathan A K, Kalsi G S, Srinivasa S, et al. Look-up table based energy efficient processing in cache support for neural network acceleration. In: Proceedings of 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020. 88--101.

[44] Eckert C, Wang X, Wang J, et al. Neural cache: bit-serial in-cache acceleration of deep neural networks. In: Proceedings of 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018. 383--396.

[45] Sayal A, Fathima S, Nibhanupudi S S T, et al. 14.4 all-digital time-domain CNN engine using bidirectional memory delay lines for energy-efficient edge computing. In: Proceedings of 2019 IEEE International Solid-State Circuits Conference, 2019. 228--230.

[46] Sayal A, Nibhanupudi S S T, Fathima S. A 12.08-TOPS/W all-digital time-domain CNN engine using bi-directional memory delay lines for energy efficient edge computing. IEEE J Solid-State Circuits, 2020, 55: 60--75.

[47] Everson L R, Liu M, Pande N, et al. A 104.8 TOPS/W one-shot time-based neuromorphic chip employing dynamic threshold error correction in 65 nm. In: Proceedings of 2018 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2018. 273--276.

[48] Everson L R, Liu M, Pande N. An energy-efficient one-shot time-based neural network accelerator employing dynamic threshold error correction in 65 nm. IEEE J Solid-State Circuits, 2019, 54: 2777--2785.

[49] Amravati A, Nasir S B, Thangadurai S, et al. A 55 nm time-domain mixed-signal neuromorphic accelerator with stochastic synapses and embedded reinforcement learning for autonomous micro-robots. In: Proceedings of 2018 IEEE International Solid-State Circuits Conference, 2018. 124--126.

[50] Amaravati A, Nasir S B, Ting J. A 55-nm, 1.0-0.4V, 1.25-pJ/MAC time-domain mixed-signal neuromorphic accelerator with stochastic synapses for reinforcement learning in autonomous mobile robots. IEEE J Solid-State Circuits, 2019, 54: 75--87.

[51] Chen Z, Gu J. High-throughput dynamic time warping accelerator for time-series classification with pipelined mixed-signal time-domain computing. IEEE J Solid-State Circuits, 2021, 56: 624--635.

[52] Wan W, Kubendran R, Eryilmaz S B, et al. 33.1 a 74 TMACS/W CMOS-RRAM neurosynaptic core with dynamically reconfigurable dataflow and in-situ transposable weights for probabilistic graphical models. In: Proceedings of 2020 IEEE International Solid-State Circuits Conference, 2020. 498--500.

[53] Khwa W, Chang M, Wu J, et al. 7.3 a resistance-drift compensation scheme to reduce MLC PCM raw BER by over 100$\times$ for storage-class memory applications. In: Proceedings of 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016. 134--135.

[54] Wang Z, Zhou H, Wang M. Proposal of toggle spin torques magnetic RAM for ultrafast computing. IEEE Electron Device Lett, 2019, 40: 726--729.

[55] Chang L, Ma X, Wang Z. DASM: data-streaming-based computing in nonvolatile memory architecture for embedded system. IEEE Trans VLSI Syst, 2019, 27: 2046--2059.

[56] Chang T, Chiu Y, Lee C, et al. 13.4 a 22 nm 1 Mb 1024b-read and near-memory-computing dual-mode STT-MRAM macro with 42.6 GB/s read bandwidth for security-aware mobile devices. In: Proceedings of 2020 IEEE International Solid-State Circuits Conference, 2020. 224--226.

[57] Zhang S, Huang K, Shen H. A robust 8-bit non-volatile computing-in-memory core for low-power parallel MAC operations. IEEE Trans Circuits Syst I, 2020, 67: 1867--1880.

[58] Yu Z, Wang Z, Kang J. Early-stage fluctuation in low-power analog resistive memory: impacts on neural network and mitigation approach. IEEE Electron Device Lett, 2020, 41: 940--943.

[59] Yang J, Zhu J, Dang B, et al. TaOx synapse array based on ion profile engineering for high accuracy neuromorphic computing. In: Proceedings of 2020 China Semiconductor Technology International Conference (CSTIC), 2020. 1--4.

[60] Wang Z, Kang J, Bai G. Self-selective resistive device with hybrid switching mode for passive crossbar memory application. IEEE Electron Device Lett, 2020, 41: 1009--1012.

[61] Chang L, Wang Z, Zhang Y. Multi-port 1R1W transpose magnetic random access memory by hierarchical bit-line switching. IEEE Access, 2019, 7: 110463.

[62] Zhang J, Wang Z, Verma N. In-memory computation of a machine-learning classifier in a standard 6T SRAM array. IEEE J Solid-State Circuits, 2017, 52: 915--924.

[63] Khwa W, Chen J, Li J, et al. A 65 nm 4 kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3 ns and 55.8 TOPS/W fully parallel product-sum operation for binary DNN edge processors. In: Proceedings of 2018 IEEE International Solid-State Circuits Conference, 2018. 496--498.

[64] Su J, Si X, Chou Y, et al. 15.2 a 28 nm 64 Kb inference-training two-way transpose multibit 6T SRAM compute-in-memory macro for AI edge chips. In: Proceedings of 2020 IEEE International Solid-State Circuits Conference, 2020. 240--242.

[65] Dong Q, Sinangil M E, Erbagci B, et al. 15.3 a 351 TOPS/W and 372.4 GOPS compute-in-memory SRAM macro in 7 nm FinFET CMOS for machine-learning applications. In: Proceedings of 2020 IEEE International Solid-State Circuits Conference, 2020. 242--244.

[66] Si X, Tu Y, Huang W, et al. 15.5 a 28 nm 64 Kb 6T SRAM computing-in-memory macro with 8b MAC operation for AI edge chips. In: Proceedings of 2020 IEEE International Solid-State Circuits Conference, 2020. 246--248.

[67] Yue J, Yuan Z, Feng X, et al. 14.3 a 65 nm computing-in-memory-based CNN processor with 2.9-to-35.8 TOPS/W system energy efficiency using dynamic-sparsity performance-scaling architecture and energy-efficient inter/intra-macro data reuse. In: Proceedings of 2020 IEEE International Solid-State Circuits Conference, 2020. 234--236.

[68] Wang J, Wang X, Eckert C, et al. 14.2 a compute SRAM with bit-serial integer/floating-point operations for programmable in-memory vector acceleration. In: Proceedings of 2019 IEEE International Solid-State Circuits Conference, 2019. 224--226.

[69] Gonugondla S K, Kang M, Shanbhag N. A 42 pJ/decision 3.12 TOPS/W robust in-memory machine learning classifier with on-chip training. In: Proceedings of 2018 IEEE International Solid-State Circuits Conference, 2018. 490--492.

[70] Chiu Y C, Zhang Z, Chen J J. A 4-Kb 1-to-8-bit configurable 6T SRAM-based computation-in-memory unit-macro for CNN-based AI edge processors. IEEE J Solid-State Circuits, 2020, 55: 2790--2801.

[71] Wang J, Wang X, Eckert C. A 28-nm compute SRAM with bit-serial logic/arithmetic operations for programmable in-memory vector computing. IEEE J Solid-State Circuits, 2020, 55: 76--86.

[72] Jia H, Valavi H, Tang Y. A programmable heterogeneous microprocessor based on bit-scalable in-memory computing. IEEE J Solid-State Circuits, 2020, 55: 2609--2621.

[73] Jiang Z, Yin S, Seo J S. C3SRAM: an in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism. IEEE J Solid-State Circuits, 2020, 55: 1888--1897.

[74] Yin S, Jiang Z, Seo J, et al. XNOR-SRAM: in-memory computing SRAM macro for binary/ternary deep neural networks. IEEE J Solid-State Circuits, 2020, 55: 1733--1743.

[75] Biswas A, Chandrakasan A P. CONV-SRAM: an energy-efficient SRAM with in-memory dot-product computation for low-power convolutional neural networks. IEEE J Solid-State Circuits, 2019, 54: 217--230.

[76] Kang M, Gonugondla S K, Patil A. A multi-functional in-memory inference processor using a standard 6T SRAM array. IEEE J Solid-State Circuits, 2018, 53: 642--655.

[77] Yang J, Kong Y, Wang Z, et al. 24.4 sandwich-RAM: an energy-efficient in-memory BWN architecture with pulse-width modulation. In: Proceedings of 2019 IEEE International Solid-State Circuits Conference, 2019. 394--396.

[78] Chih Y D, Lee P H, Fujiwara H, et al. An 89 TOPS/W and 16.3 TOPS/mm2 all-digital SRAM-based full-precision compute-in-memory macro in 22 nm for machine-learning edge applications. In: Proceedings of 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021. 252--254.

[79] Chen W, Li K, Lin W, et al. A 65 nm 1 Mb nonvolatile computing-in-memory ReRAM macro with sub-16 ns multiply-and-accumulate for binary DNN AI edge processors. In: Proceedings of 2018 IEEE International Solid-State Circuits Conference, 2018. 494--496.

[80] Xue C, Chen W, Liu J, et al. 24.1 a 1 Mb multibit ReRAM computing-in-memory macro with 14.6 ns parallel MAC computing time for CNN based AI edge processors. In: Proceedings of 2019 IEEE International Solid-State Circuits Conference, 2019. 388--390.

[81] Yan B, Yang Q, Chen W, et al. RRAM-based spiking nonvolatile computing-in-memory processing engine with precision-configurable in situ nonlinear activation. In: Proceedings of 2019 Symposium on VLSI Technology, 2019. 86--87.

[82] Su F, Chen W, Xia L, et al. A 462 GOPS/J RRAM-based nonvolatile intelligent processor for energy harvesting IoE system featuring nonvolatile logics and processing-in-memory. In: Proceedings of 2017 Symposium on VLSI Technology, 2017. 260--261.

[83] Liu Q, Gao B, Yao P, et al. 33.2 a fully integrated analog ReRAM based 78.4 TOPS/W compute-in-memory chip with fully parallel MAC computing. In: Proceedings of 2020 IEEE International Solid-State Circuits Conference, 2020. 500--502.

[84] Xue C X, Chang T W, Chang T C. Embedded 1-Mb ReRAM-based computing-in-memory macro with multibit input and weight for CNN-based AI edge processors. IEEE J Solid-State Circuits, 2020, 55: 203--215.

[85] Zha Y, Nowak E, Li J. Liquid silicon: a nonvolatile fully programmable processing-in-memory processor with monolithically integrated ReRAM. IEEE J Solid-State Circuits, 2020, 55: 908--919.

[86] Wan W, Kubendran R, Gao B, et al. A voltage-mode sensing scheme with differential-row weight mapping for energy-efficient RRAM-based in-memory computing. In: Proceedings of 2020 IEEE Symposium on VLSI Technology, 2020. 1--2.

[87] Shibuta Y, Sakane S, Miyoshi E. Heterogeneity in homogeneous nucleation from billion-atom molecular dynamics simulation of solidification of pure metal. Nat Commun, 2017, 8: 10.

[88] Dai Q, Liu Z, Huang L. Thin-film composite membrane breaking the trade-off between conductivity and selectivity for a flow battery. Nat Commun, 2020, 11: 13.

[89] Lee K R, Kim J, Kim C. IEEE Solid-State Circuits Lett, 2020, 3: 390--393.

[90] A 28 nm configurable memory (TCAM/BCAM/SRAM) using push-rule 6T bit cell enabling logic-in-memory. IEEE J Solid-State Circuits, 2016, 51: 1009--1021.

[91] Ando K, Ueyoshi K, Orimo K, et al. Brein memory: a 13-layer 4.2 k neuron/0.8 m synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS. In: Proceedings of 2017 Symposium on VLSI Circuits, 2017. 24--25.

[92] Slesazeck S, Ravsher T, Havel V, et al. A 2TNC ferroelectric memory gain cell suitable for compute-in-memory and neuromorphic application. In: Proceedings of 2019 IEEE International Electron Devices Meeting (IEDM), 2019. 1--4.

[93] Yu C, Yoo T, Kim H. A logic-compatible eDRAM compute-in-memory with embedded ADCs for processing neural networks. IEEE Trans Circuits Syst I, 2021, 68: 667--679.

  • Figure 1

    (Color online) The computation of the convolution layer of CNNs. (a) The seven nested loops of a convolution layer; (b) a traditional AI accelerator architecture with buffers and a PE array.
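    The seven nested loops shown in (a) can be written out directly. A minimal NumPy sketch (stride 1, no padding; the dimension names N, M, C, H, W, R, S are the conventional ones and are assumed here, not taken from the figure):

    ```python
    import numpy as np

    def conv_layer(inputs, weights):
        """Direct convolution as seven nested loops (stride 1, no padding).

        inputs:  (N, C, H, W)  input feature maps
        weights: (M, C, R, S)  M output channels, R x S kernels
        returns: (N, M, H-R+1, W-S+1) output feature maps
        """
        N, C, H, W = inputs.shape
        M, _, R, S = weights.shape
        OH, OW = H - R + 1, W - S + 1
        out = np.zeros((N, M, OH, OW))
        for n in range(N):                      # 1. batch
            for m in range(M):                  # 2. output channel
                for oh in range(OH):            # 3. output row
                    for ow in range(OW):        # 4. output column
                        for c in range(C):      # 5. input channel
                            for r in range(R):  # 6. kernel row
                                for s in range(S):  # 7. kernel column
                                    out[n, m, oh, ow] += (
                                        inputs[n, c, oh + r, ow + s]
                                        * weights[m, c, r, s]
                                    )
        return out
    ```

    Every iteration of the two innermost loops is one MAC; the loop order determines which operands are reused in the buffers of the accelerator in (b), which is exactly the dataflow question CIM architectures attack.
    
    
    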

  • Figure 2

    CIM architectures in the 1990s and today. (a) The 1990s CIM idea, with a compute unit placed near the memory; (b) the recent CIM architecture, which integrates computing into the memory itself.

  • Figure 3

    (Color online) An example of an AI-driven architecture with a PE array and scratchpad memory for both AI training and inference [29].

  • Figure 4

    (Color online) The architecture of near-memory computing based on 3D stacking technology.

  • Figure 5

    (Color online) The SRAM CIM architecture based on modified memory cells [37].

  • Figure 6

    (Color online) The look-up-table-based CIM architecture [47].

  • Figure 7

    (Color online) A possible time-domain CIM scheme.
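    The time-domain idea can be illustrated numerically: instead of charge or current, the operand is a delay, and a multiply-accumulate result is read out as the arrival time of an edge after a chain of weight-controlled delay stages (as in the delay-line macros of [45,46]). A toy sketch, with illustrative names and an assumed unit delay rather than any circuit from the paper:

    ```python
    def time_domain_mac(input_bits, weight_delays, unit_delay=1e-9):
        """Toy model of a time-domain MAC for binary inputs.

        Each active input bit lets the edge pass through its stage's
        weight-proportional delay; the accumulated arrival time encodes
        the dot product sum(x_i * w_i) in units of `unit_delay`.
        """
        arrival = 0.0
        for x, w in zip(input_bits, weight_delays):
            if x:                      # binary input gates the delay stage
                arrival += w * unit_delay
        return arrival

    # dot([1,0,1,1], [3,5,2,4]) = 3 + 2 + 4 = 9 unit delays
    result = time_domain_mac([1, 0, 1, 1], [3, 5, 2, 4], unit_delay=1.0)
    ```

    A time-to-digital converter at the end of the line would then quantize this arrival time back into a digital output.
    
    
    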

  • Figure 8

    (Color online) Emerging NVM devices: (a) resistive RAM; (b) phase-change RAM; (c) spin-transfer torque MRAM; (d) spin-orbit torque MRAM [59].

  • Figure 9

    (Color online) The CIM architecture with NVM devices. (a) Transpose memory array; (b) CIM architecture supporting bit-wise operations.
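    The bit-wise mode in (b) is typically realized (e.g., in Pinatubo-style designs [16]) by activating two rows at once so that each column's current is the sum of the two cell currents; the choice of sense-amplifier reference then decides whether the column reads out OR or AND. A toy model of that thresholding, assuming idealized binary cell currents:

    ```python
    import numpy as np

    def inmemory_bitwise(row_a, row_b):
        """Toy model of multi-row-activation bit-wise CIM.

        Reading two rows simultaneously sums their per-column cell
        currents; a low sense reference (>= 1 unit) yields OR, while a
        high reference (>= 2 units) yields AND.
        """
        col_current = np.asarray(row_a) + np.asarray(row_b)  # combined currents
        or_result = (col_current >= 1).astype(int)   # low reference
        and_result = (col_current >= 2).astype(int)  # high reference
        return or_result, and_result
    ```

    XOR is usually derived from these two results with a small amount of peripheral logic, since a single threshold cannot separate the (0,1)/(1,0) cases from (1,1).
    
    
    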

  • Table 1  

    Table 1  Summary of modern CIM test demonstrations

    | Publication | Technology (cell) | Target (operation) | Capacity | Method | Precision | Accuracy (%) |
    |---|---|---|---|---|---|---|
    | 2017 [40] | SRAM 130 nm (6T) | SVM (-) a) | 16 kb | Mixed | I(5b), W(1b), O(1b) | MNIST: 90 |
    | 2018 [42] | SRAM 65 nm (10T) | CNN (F) | 4 kb | Mixed | I(7b), W(1b), O(4b) | MNIST: 96 |
    | 2018 [66] | SRAM 65 nm (6T) | DNN (F) | 4 kb | Mixed | I(1b), W(1b), O(1b) | MNIST: 97.5 |
    | 2019 [37] | SRAM 65 nm (T8T) | CNN (F) | 3.8 kb | Mixed | I(4b), W(5b), O(7b) b) | MNIST: 99.52 |
    | 2020 [67] | SRAM 28 nm (6T) | CNN (F/B) | 64 kb | Mixed | I(8b), W(8b), O(20b) b) | CIFAR: 91.94 |
    | 2020 [68] | SRAM 7 nm (8T) | CNN (F) | 4 kb | Mixed | I(4b), W(4b), O(4b) | MNIST: 98.50 |
    | 2020 [69] | SRAM 28 nm (6T) | CNN (F) | 64 kb | Mixed | I(8b), W(8b), O(20b) b) | CIFAR10: 92.02 |
    | 2020 [70] | SRAM 65 nm (8T) | CNN (F) | 4 kb | Mixed | I(4b), W(8b), O(20b) c) | ResNet: 92.88 |
    | 2019 [71] | SRAM 28 nm (8T) | DSP (-) | 128 kb | Mixed | I(-), W(-), O(-) | - |
    | 2018 [72] | SRAM 65 nm (8T) | SVM (F) | 16 kb | Mixed | I(8b), W(8b), O(-) c) | MIT-CBCL: 96 |
    | 2020 [73] | SRAM 55 nm (6T) | CNN (F) | 4 kb | Mixed | I(8b), W(8b), O(19b) b) | CIFAR10: 91.93 |
    | 2019 [74] | SRAM 28 nm (8T) | Arithmetic (-) | 16 kb | Mixed | I(-), W(-), O(-) b) | - |
    | 2020 [75] | SRAM 65 nm (8T) | CNN (F) | 72 kb | Mixed | I(8b), W(4b), O(-) | MNIST: 92.40 |
    | 2020 [76] | SRAM 65 nm (8T1C) | CNN (F) | 72 kb | Mixed | I(1b), W(1b), O(5b) | MNIST: 98.30 |
    | 2020 [77] | SRAM 65 nm (12T) | DNN (F) | 16 kb | Mixed | I(1b), W(3b), O(-) | MNIST: 98.84 |
    | 2019 [78] | SRAM 65 nm (10T) | CNN (F) | 16 kb | Mixed | I(6b), W(1b), O(6b) | MNIST: 98.30 |
    | 2019 [45] | SRAM 65 nm (6T) | SVM (-) a) | 16 kb | Mixed | I(8b), W(8b), O(-) | MIT: 95 |
    | 2019 [79] | SRAM 28 nm (8T) | CNN (F) | 2 kb | Time domain | I(8b), W(1b), O(8b) | - |
    | 2021 [80] | SRAM 28 nm (10T) | MAC (F) | 2.6 Mb | All digital | I(4 or 8b), W(4-16b), O(all) | - |
    | 2018 [81] | RRAM 65 nm (1T1R) | CNN/FCN (F) d) | 1 Mb | NVM | I(1b), W(3b), O(3b) | MNIST: 98 |
    | 2019 [82] | RRAM 55 nm (1T1R) | CNN (F) | 1 Mb | NVM | I(2b), W(3b), O(3b) | MNIST: 98.80 |
    | 2020 [83] | RRAM 150 nm (1T1R) | CNN (F) | 64 kb | NVM | I(8b), W(3b), O(-) | CIFAR10: 98.90 |
    | 2020 [84] | RRAM 150 nm (1T1R) | CNN (F) | - | NVM | I(-), W(-), O(-) | - |
    | 2020 [85] | RRAM 130 nm (2T2R) | MAC (F) | 158.8 kb | NVM | I(1b), W(3b), O(8b) | MNIST: 94.40 |
    | 2020 [56] | RRAM 130 nm (1T1R) | RNN/CNN/MLP (F) c) | 500 kb | NVM | I(1b), W(-), O(1b) | MNIST: 97.55 |
    | 2020 [86] | RRAM 55 nm (1T1R) | CNN (F) | 1 Mb | NVM | I(2b), W(3b), O(4b) | CIFAR10: 88.52 |
    | 2020 [87] | RRAM 130 nm (1T1R) | CNN/BNN (F) | 16 kb | NVM | I(-), W(-), O(-) | - |
    | 2020 [88] | RRAM 130 nm (1T1R) | MVM (F) | 64 kb | NVM | I(2b), W(3b), O(-) | MNIST: 91.38 |
    | 2017 [89] | PCRAM - (1T1R) | Stat. (-) | 3 Mb | NVM | - | Statistic: 93 |
    | 2020 [90] | PCRAM - (1T1R) | DNN (F) | 256 kb | NVM | I(8b), W(8b), O(8b) | CIFAR10: 93.7 |
    | 2020 [60] | STT-MRAM (1T1R) | SHA (-) | 1 Mb | NVM | - | - |
    | 2020 [91] | STT-MRAM (1T1R) | CNN (F) | 8 Mb | NVM | I(8b), W(8b), O(8b) | MIT-BIH: 85.1 |

