This work was supported in part by National Key R&D Program of China (Grant No. 2018YFB1004704), National Natural Science Foundation of China (Grant Nos. 61832005, 61872171), Natural Science Foundation of Jiangsu Province (Grant No. BK20190058), Key R&D Program of Jiangsu Province (Grant No. BE2017152), Science and Technology Program of State Grid Corporation of China (Grant No. 52110418001M), and Collaborative Innovation Center of Novel Software Technology and Industrialization.
Figure 1
(Color online) Journaling on NVMe SSD causes severe I/O performance fluctuations, whereas the NVMe SSD is busy only for very short periods. (a) IOPS performance; (b) utilization of disks.
Figure 2
Conventional file system with journaling on NVMe SSD.
Figure 3
(Color online) Testbed for experiments.
Figure 4
(Color online) The IOPS performance of Ceph FileStore with journaling on NVMe SSD fluctuates, and journaling on the NVMe SSD is frequently frozen. Here, the number of threads is set to eight.
Figure 5
(Color online) The IOPS performance of Ceph FileStore with journaling on NVMe SSD fluctuates, and journaling on the NVMe SSD is frequently frozen. Here, the microwrite size is set to 8 kB. (a) IOPS performance; (b) utilization of the NVMe SSD.
Figure 6
(Color online) When the backend HDD is replaced with an NVMe SSD, the IOPS performance becomes stable.
Figure 7
Illustration of the MIM architecture.
Figure 8
Illustration of HTMLL.
Figure 9
(Color online) Performance comparison of MIM and the original Ceph FileStore under the FIO workload. (a) IOPS; (b) latency.
Figure 10
(Color online) Performance comparison of MIM and the original Ceph FileStore under the Varmail workload. (a) IOPS; (b) latency.
Figure 11
(Color online) Performance comparison of MIM and the original Ceph FileStore under the FileServer workload. (a) IOPS; (b) latency.
Figure 12
(Color online) Instantaneous performance of IOPS and write latency of the systems under the FIO workload. (a) IOPS performance; (b) write latency.
Figure 14
(Color online) Instantaneous performance of IOPS and write latency of the systems under the FileServer workload. (a) IOPS performance; (b) write latency.
Figure 15
(Color online) Throughput of MIM and FileStore for large writes.
Figure 16
(Color online) (a) Average busy rate of the NVMe SSD in MIM and Ceph FileStore; (b) memory consumption of MIM and Ceph FileStore.