Design and application of new storage systems

2023-07-30 03:05

Guangyan ZHANG, Dan FENG, Keqin LI, Zili SHAO, Nong XIAO, Jin XIONG, Weimin ZHENG

1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

2 School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

3 Department of Computer Science, State University of New York, NY 12561, USA

4 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China

5 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China

6 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

A storage system is the core of a computer, and plays an important role in the sustainable development of emerging strategic industries, such as artificial intelligence, big data, cloud computing, and the Internet of Things. As the performance of processors and network devices continues to increase, access through the storage stack has become a major factor restricting the performance of data-intensive systems. Recently, new storage devices have attracted wide attention due to their ability to break the "memory wall." These devices include block-addressable flash memory, byte-addressable nonvolatile random access memory (NVRAM), in-memory computing devices, and large-capacity optical storage. Continuous innovation in algorithms, software designs, and hardware is necessary to build large-scale storage systems with high throughput, low access latency, and high data reliability. Such innovation can address the challenges of building larger-scale, higher-performance systems with more complex structures. Further, it can enrich the experience of building and applying such systems and accelerate the development of big data processing.

Researchers have been trying to solve the "memory wall" problem and enhance the corresponding software and hardware ecosystem; consequently, much progress has been made in the design and application of new storage systems, including but not limited to the following:

1. New storage devices continue to be designed and optimized, e.g., open-channel solid-state drives (SSDs), byte-addressable NVRAM, and in-memory computing devices. In addition, simulators, emulators, and software-defined device development platforms have been presented to promote the design and optimization of new storage devices.

2. Existing storage software systems, initially designed for hard disk drives or traditional SSDs, cannot fully exploit the performance potential of emerging storage devices. New file systems, storage management software, not only structured query language (NoSQL) databases, and key components have been designed for emerging storage devices.

3. Studies have employed new storage devices to accelerate applications (e.g., solving combinatorial optimization problems) and to boost the performance of traditional storage systems (e.g., hard disk drive (HDD)-based erasure-coded storage).

In this context, the journal Frontiers of Information Technology & Electronic Engineering has organized a special feature on the design and application of new storage systems. This special feature covers design-assistance tools, various storage software, approaches, and related applications for new storage devices. In addition, this feature intends to provide a review of advances and future research directions in new storage systems. Seven papers have been selected for this feature after a rigorous review process, including one review article and six research articles.

Guangyan ZHANG et al. presented a comprehensive literature review of the design and application of open-channel SSDs in terms of five key metrics: throughput, latency, lifetime, performance isolation, and resource utilization. The study first introduced open-channel SSDs from the aspects of physical layout, properties of the flash translation layer, and design of interfaces. It explained the performance advantages of open-channel SSDs and the further optimization opportunities in designing and applying them. Then, it discussed in detail the methodologies that leverage the performance benefits of open-channel SSDs: designing interfaces, co-designing flash translation layers, exploiting internal parallelism, and optimizing input/output (I/O) scheduling and garbage collection. The paper also discussed the challenges in bridging the gap between theoretical study and practical implementation. Further, the study explored potential future developments to demonstrate the benefits of open-channel SSDs.

Despite the rapid development of SSD features in the market, research on flash firmware has remained mostly simulation-based due to the lack of a realistic and extensible SSD development platform. Zili SHAO et al. proposed SoftSSD, a software-defined SSD development platform for rapid flash firmware prototyping. The core of SoftSSD is a novel framework with an event-driven programming model. New flash translation layer (FTL) algorithms can be implemented using the programming model and integrated directly into full-featured flash firmware. SoftSSD has been implemented with real hardware and evaluated with real application workloads. Experiments revealed that SoftSSD can achieve good performance, observability, and extensibility. SoftSSD has been open-sourced for public access.

The emergence of new hardware, e.g., persistent memory (PM) and smart network interface cards (SmartNICs), has brought new opportunities to file system design. However, using the features of PM and SmartNICs is challenging. Yitian YANG and Youyou LU designed and implemented a local file system called NICFS, which exploits the high throughput and byte addressability of PM and the processing power of the SmartNIC to improve file system performance and reduce host CPU use. A series of experiments verified the system performance, scalability, and effectiveness of each part of the design.

PM file systems achieve high performance by exploiting the advanced features of PMs, including non-volatility, byte addressability, and dynamic random access memory (DRAM)-like performance. However, PMs suffer from limited write endurance. Existing space management strategies in PM file systems induce a severely unbalanced wear problem, quickly damaging the underlying PMs. Duo LIU et al. proposed an efficient wear-leveling-aware multi-grained allocator called WMAlloc. Moreover, a bitmap-based multi-heap tree (BMT) was proposed to enhance WMAlloc by avoiding recursive splits and inefficient heap searches. This significantly reduced the overhead of space management while achieving better wear-leveling of the underlying PMs. The results from extensive experiments further validated the effectiveness of WMAlloc.
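To make the idea of wear-leveling-aware allocation concrete, the following is a minimal Python sketch and not WMAlloc itself: it keeps free blocks in a single min-heap keyed on a per-block write counter, so that the least-worn free block is always handed out first. The class name `WearAwareAllocator`, the block granularity, and the choice to count one write per allocation are all assumptions made for illustration; the multi-grained design and the BMT structure of the paper are not modeled.

```python
import heapq


class WearAwareAllocator:
    """Toy allocator (hypothetical): always returns the least-worn free block."""

    def __init__(self, num_blocks):
        # wear[b] counts how many times block b has been handed out.
        self.wear = [0] * num_blocks
        # Min-heap of (wear, block_id) pairs for all currently free blocks.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        if not self.free:
            raise MemoryError("no free blocks")
        _, block = heapq.heappop(self.free)
        # Assumption: each allocation causes one write to the block.
        self.wear[block] += 1
        return block

    def release(self, block):
        # Re-insert the block with its updated wear so that it is
        # reused only after less-worn blocks have been consumed.
        # Note: double release is not detected in this sketch.
        heapq.heappush(self.free, (self.wear[block], block))
```

Under a repeated allocate-then-release workload, this policy spreads writes evenly across blocks, whereas an allocator that always reuses the most recently freed block would concentrate all writes on one block.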

Extendible hashing is an effective way to manage large-scale data and improve the efficiency of storage systems. Tao CAI et al. designed NEHASH, a high-concurrency extendible hashing scheme for non-volatile memory (NVM), which uses a multilevel hash directory with lazy expansion to improve the concurrency and efficiency of extendible hashing. The study optimized the management strategy of hash directories and buckets and distributed them between DRAM and NVM. NEHASH achieved higher read and write throughput in a multithreaded environment than existing extendible hashing schemes.
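For readers unfamiliar with the base technique, the following Python sketch shows textbook single-threaded extendible hashing, in which a directory of 2^d slots points to buckets and only the overflowing bucket is split. It is not NEHASH: the multilevel directory, lazy expansion, concurrency control, and DRAM/NVM placement are not modeled, and the names `Bucket`, `ExtendibleHash`, and `bucket_capacity` are assumptions chosen for illustration.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket distinguishes
        self.items = {}


class ExtendibleHash:
    """Textbook extendible hashing, indexed by the low bits of hash(key)."""

    def __init__(self, bucket_capacity=2):
        self.bucket_capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # Use the lowest global_depth bits of the hash as the directory slot.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.directory[self._index(key)].items.get(key, default)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < self.bucket_capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; slot i and slot i + 2^d share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Slots whose newly significant bit is 1 now point to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and (i & high_bit):
                self.directory[i] = sibling
        # Redistribute the old items between the bucket and its sibling.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v
```

Because a split touches only one bucket and, at most, doubles the directory of pointers, the table grows without rehashing all stored data. This sketch assumes that no more than `bucket_capacity` keys share an identical hash value; otherwise splitting would not terminate.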

Erasure coding (EC) has better storage efficiency but higher update overhead and repair costs than replication. In addition, concurrent updates produce consistency and reliability challenges in EC applications. Yaofeng TU et al. introduced an erasure-coded storage system called decoupled data updating and coding (DDUC), which uses PM to implement a lightweight logging mechanism and decouples data updating and EC encoding. Further, a data placement policy was proposed that combines replication and parity blocks. This policy addresses the reduction in data reliability caused by concurrent updates, while ensuring high concurrency, by saving temporary redundant blocks of data at the checksum node.
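The update and repair costs mentioned above can be illustrated with the simplest possible erasure code, a single XOR parity block. The Python sketch below is an assumption-laden illustration and not DDUC: it omits logging, PM, and data placement, and practical systems typically use codes such as Reed-Solomon that tolerate more than one failure. The function names are hypothetical.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_blocks):
    """Compute the parity block as the XOR of all data blocks."""
    parity = bytes(len(data_blocks[0]))
    for block in data_blocks:
        parity = xor_blocks(parity, block)
    return parity


def update_block(data_blocks, parity, index, new_block):
    """Overwrite one data block and return the updated parity.

    The old data must be read to compute the delta, and the parity
    must be rewritten: this read-modify-write is the update overhead.
    """
    delta = xor_blocks(data_blocks[index], new_block)
    data_blocks[index] = new_block
    return xor_blocks(parity, delta)


def repair_block(data_blocks, parity, lost_index):
    """Rebuild one lost block; every surviving block must be read."""
    recovered = parity
    for i, block in enumerate(data_blocks):
        if i != lost_index:
            recovered = xor_blocks(recovered, block)
    return recovered
```

With three data blocks and one parity block, the storage overhead is 1.33x, compared with 2x for a single replica copy, but each small update touches two blocks and each repair reads three. If the data block and the parity are not updated atomically, a concurrent failure can leave them inconsistent, which is the kind of reliability problem that motivates decoupling updating from encoding.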

Combinatorial optimization problems are critical and common, but they are NP-hard and difficult to solve. The chaotic simulated annealing algorithm is effective at solving combinatorial optimization problems. However, general computing platforms cannot execute it efficiently. Guangyu SUN et al. proposed a software–hardware co-optimization scheme. First, the algorithm implementation was modified to be more hardware-friendly while maintaining effectiveness. Then, a hardware architecture called COPPER was designed for in-memory computing using memristors. COPPER can run the chaotic simulated annealing algorithm efficiently and significantly improve speed and energy efficiency.
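As background for the algorithm family involved, the following Python sketch applies classical (stochastic) simulated annealing to a small traveling salesman instance. It is not the chaotic variant used in the paper, which replaces the random acceptance step with chaotic neural dynamics, and it says nothing about the COPPER hardware. The cooling schedule, the segment-reversal move, and all parameter names are assumptions chosen for illustration.

```python
import math
import random


def tour_length(tour, points):
    """Total length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]])
               for i in range(n))


def simulated_annealing(points, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    rng = random.Random(seed)
    n = len(points)
    current = list(range(n))
    current_cost = tour_length(current, points)
    best, best_cost = current[:], current_cost
    for step in range(steps):
        # Geometric cooling from t_start down toward t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        # Candidate move: reverse a randomly chosen segment of the tour.
        i, j = sorted(rng.sample(range(n), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        candidate_cost = tour_length(candidate, points)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse tours with a
        # probability that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost
```

Each iteration is dominated by evaluating the cost of a candidate solution, which on a general-purpose processor means repeatedly moving the problem data between memory and the compute units. This data movement is one reason in-memory computing is attractive for annealing-style algorithms.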

Overall, a broad spectrum of current research topics relevant to the design and application of new storage systems is covered in this special feature. These include new types of storage devices and software-defined device development platforms, file systems designed for new storage devices, storage allocators in file systems, extendible hashing for NVM, and applications for new devices. This collection of diverse but interconnected topics may benefit those interested in new storage systems or related areas.

Finally, we thank all the authors for their support and valuable contributions to this special feature. We are especially grateful to all the reviewers for their insightful comments and helpful suggestions on all the submissions.