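Shanghai Pujiang Talent Program (16PJ1407600)
China Postdoctoral Science Foundation (2017M610230)
Key Program of the National Natural Science Foundation of China (61332009)
General Program of the National Natural Science Foundation of China (61775139)
Natural Science Foundation of Shanghai (15ZR1428600)
Open Project of the State Key Laboratory of Computer Architecture (CARCH201807)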
Figure 1
Structure of the heterogeneous memory system. (a) Flat memory architecture; (b) hierarchical memory architecture
Figure 2
(Color online) Structure of the two-way hash chain list
Figure 3
Structure of the HashList and entry table
Figure 4
Energy-efficiency analysis model based on the two-way hash chain list page migration mechanism
Figure 5
(Color online) Normalized IPC for different page sizes (DRAM+PCM memory model). (a) THMigrator; (b) MQMigrator; (c) CoinMigrator
Figure 6
(Color online) Normalized IPC for different page sizes (DRAM+STT-RAM memory model). (a) THMigrator; (b) MQMigrator; (c) CoinMigrator
Figure 7
(Color online) Normalized instruction rate for different page sizes (DRAM+PCM memory model). (a) THMigrator; (b) MQMigrator; (c) CoinMigrator
Figure 8
(Color online) Normalized instruction rate for different page sizes (DRAM+STT-RAM memory model). (a) THMigrator; (b) MQMigrator; (c) CoinMigrator
Figure 9
(Color online) Normalized IPC. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system
Figure 10
(Color online) Average instruction rate. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system
Figure 11
(Color online) DRAM cache utilization. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system
Figure 12
(Color online) Normalized energy efficiency. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system
Figure 13
(Color online) Average memory access latency. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system
Figure 14
(Color online) Write-access hit rates. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system
Figure 15
(Color online) Normalized execution time. (a) DRAM+PCM heterogeneous memory system; (b) DRAM+STT-RAM heterogeneous memory system
Figure 16
(Color online) Normalized migration time with the THMigrator mechanism
Figure 17
(Color online) Normalized migration time with the MQMigrator mechanism
Figure 18
(Color online) Normalized migration time with the CoinMigrator mechanism
Storage/memory | Cell size | Read speed | Write speed | Read power/mW | Write power/mW | Endurance (write cycles)
DRAM | $4$-$6\,F^{2}$ | Slow | Slow | Medium | Medium | $10^{16}$
PCM | $4$-$12\,F^{2}$ | Slow | Very slow | Medium | High | $10^{8}$-$10^{9}$
STT-RAM | $6$-$50\,F^{2}$ | Fast | Slow | Low | High | $4 \times 10^{12}$
/* Initialize variables */
Initialize HashList, MigratorMap, lifeTime, Threshold, etc.;
Request(${\rm page}_i$); /* Request access to the $i$-th memory page */
if ${\rm page}_i \in {\rm HashList}$ then
    ${\rm page}_i \rightarrow {\rm value} \Leftarrow {\rm page}_i \rightarrow {\rm value} + 1$; /* Increment the access count of ${\rm page}_i$ */
    moveToHead(${\rm page}_i$); /* Move ${\rm page}_i$ to the head of HashList */
else
    setHead(${\rm page}_i$); /* Insert ${\rm page}_i$ at the head of HashList */
end if
if ${\rm page}_i \rightarrow {\rm value} \geq {\rm Threshold}$ then
    MigratorMap.insert(${\rm page}_i$); /* Insert ${\rm page}_i$ into MigratorMap */
    HashList.remove(${\rm page}_i$); /* Remove ${\rm page}_i$ from HashList */
end if
if ${\rm page}_i$ has stayed in MigratorMap longer than lifeTime then
    MigratorMap.remove(${\rm page}_i$); /* Remove memory pages that exceed the life time */
end if
if InMigratorMap(${\rm page}_i$) && !Migratored(${\rm page}_i$) then
    startMigrator(${\rm page}_i$); /* Migrate ${\rm page}_i$ from NVM to DRAM */
    MigratorMap.remove(${\rm page}_i$); /* Remove the migrated ${\rm page}_i$ from MigratorMap */
end if
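The access-handling procedure above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the class name `PageMigrationSketch`, the logical `tick` counter, the initial access count of 1, and the default `threshold` and `life_time` values are all hypothetical. The branch conditions are inferred from the comments in the pseudo-code, a standard `OrderedDict` stands in for the two-way hash chain list (hash lookup plus head/tail ordering), and expiry is applied to every entry of the migration map rather than to the requested page alone.

```python
from collections import OrderedDict


class PageMigrationSketch:
    """Hypothetical sketch of the page-access procedure (assumptions, not the paper's code)."""

    def __init__(self, threshold=4, life_time=8):
        self.hash_list = OrderedDict()  # page -> access count; first key is the head
        self.migrator_map = {}          # page -> tick at which it became a candidate
        self.migrated = set()           # pages already moved from NVM to DRAM
        self.threshold = threshold      # assumed meaning: access count that triggers migration
        self.life_time = life_time      # assumed meaning: ticks a candidate may wait
        self.tick = 0                   # assumed logical clock, one tick per request

    def request(self, page):
        """Handle one access; return True if the page is migrated by this request."""
        self.tick += 1
        if page in self.hash_list:
            self.hash_list[page] += 1   # increment the access count
        else:
            self.hash_list[page] = 1    # setHead: first sighting of this page
        self.hash_list.move_to_end(page, last=False)  # moveToHead

        # Hot page: promote it to the migration candidates.
        if self.hash_list[page] >= self.threshold:
            self.migrator_map[page] = self.tick
            del self.hash_list[page]

        # Drop candidates that have waited longer than life_time.
        for candidate, since in list(self.migrator_map.items()):
            if self.tick - since > self.life_time:
                del self.migrator_map[candidate]

        # Migrate the requested page if it is a candidate and not yet migrated.
        if page in self.migrator_map and page not in self.migrated:
            self.migrated.add(page)     # stands in for startMigrator(page)
            del self.migrator_map[page]
            return True
        return False
```

With `threshold=3`, the first two requests for a page only update the list, and the third request promotes and migrates it in the same call.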
Memory | Read latency | Write latency | Read energy | Write energy | Read speed | Write speed
DRAM | 3 $\mu$s (4 KB) | 3 $\mu$s (4 KB) | 0.8 J/GB | 1 J/GB | 1.09 GB/s | 1 GB/s
PCM | 3 $\mu$s (4 KB) | 64 $\mu$s (4 KB) | 1.2 J/GB | 6 J/GB | 400 MB/s | 100 MB/s
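As a worked example of how these parameters combine, the following Python sketch estimates the cost of copying one 4 KB page between the two memories. The cost model is an assumption made here for illustration (one read from the source plus one write to the destination, with no overlap and no controller overhead), the function name `migration_cost` is hypothetical, and 1 GB is taken as $2^{30}$ bytes.

```python
PAGE_BYTES = 4 * 1024   # the latencies in the table are quoted per 4 KB page
GB = 1024 ** 3          # assumption: binary gigabyte

# Values copied from the table above.
READ_ENERGY_J_PER_GB = {"DRAM": 0.8, "PCM": 1.2}
WRITE_ENERGY_J_PER_GB = {"DRAM": 1.0, "PCM": 6.0}
READ_LATENCY_US = {"DRAM": 3, "PCM": 3}
WRITE_LATENCY_US = {"DRAM": 3, "PCM": 64}


def migration_cost(src, dst):
    """Return (energy in joules, latency in microseconds) for copying one 4 KB page.

    Assumed model: read the page from `src`, then write it to `dst`.
    """
    fraction_of_gb = PAGE_BYTES / GB
    energy_j = (READ_ENERGY_J_PER_GB[src] + WRITE_ENERGY_J_PER_GB[dst]) * fraction_of_gb
    latency_us = READ_LATENCY_US[src] + WRITE_LATENCY_US[dst]
    return energy_j, latency_us


if __name__ == "__main__":
    for src, dst in (("PCM", "DRAM"), ("DRAM", "PCM")):
        energy_j, latency_us = migration_cost(src, dst)
        print(f"{src} -> {dst}: {energy_j * 1e6:.1f} uJ, {latency_us} us")
```

Under this model, moving a page from PCM to DRAM costs about 8.4 µJ and 6 µs, while moving it back to PCM costs about 25.9 µJ and 67 µs, so the asymmetry comes almost entirely from the PCM write.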
Configuration parameter | Value
CPU | 2.0 GHz + TimingSimpleCPU
Memory | NVMainMemory
L1 cache | 32 KB instruction cache + 32 KB data cache
L2 cache | 256 KB instruction cache + 256 KB data cache
Memory structure | DRAM + PCM (1:3)
Bus | 64-bit
Operation mode | SE mode
Benchmark | Number of instructions
bzip2 | 10000000
gcc | 10000000
leslie3d | 10000
mcf | 10000
calculix | 10000000
cactusADM | 10000
sjeng | 10000000
hmmer | 10000000
milc | 10000
povray | 10000000
soplex | 10000000