TY - JOUR
T1 - Copy-on-abundant-write for NiMble file system clones
AU - Zhan, Yang
AU - Conway, Alex
AU - Jiao, Yizheng
AU - Mukherjee, Nirjhar
AU - Groombridge, Ian
AU - Bender, Michael A.
AU - Farach-Colton, Martin
AU - Jannen, William
AU - Johnson, Rob
AU - Porter, Donald E.
AU - Yuan, Jun
N1 - Funding Information:
This research was supported in part by NSF grants CCF-1715777, CCF-1724745, CCF-1725543, CSR-1763680, CCF-1716252, CCF-1617618, CCF-1712716, CNS-1938709, and CNS-1938180. The work was also supported by VMware, by EMC, and by NetApp Faculty Fellowships. Authors’ addresses: Y. Zhan, Y. Jiao, N. Mukherjee, and D. E. Porter, Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599; emails: {yzhan, yizheng}@cs.unc.edu, nirjhar@unc.edu, porter@cs.unc.edu; A. Conway, Department of Computer Science, VMware Research Group, Palo Alto, CA, 94304; email: conway@ajhconway.com; I. Groombridge and J. Yuan, Department of Computer Science, Pace University, New York, NY, 10038; emails: {ig73104n, jyuan2}@pace.edu; M. A. Bender, New Computer Science, Stony Brook University, Stony Brook, NY, 11794; email: bender@cs.stonybrook.edu; M. Farach-Colton, Department of Computer Science, Rutgers University, Piscataway, NJ, 08854; email: farach@cs.rutgers.edu; W. Jannen, Computer Science Department, Williams College, Williamstown, MA, 01267; email: jannen@cs.williams.edu; R. Johnson, VMware Research, 3425 Hillview Ave, Palo Alto, CA, 94304; email: robj@vmware.com. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2021 Association for Computing Machinery. 1553-3077/2021/01-ART5 $15.00 https://doi.org/10.1145/3423495
Publisher Copyright:
© 2021 Association for Computing Machinery. All rights reserved.
PY - 2021/1
Y1 - 2021/1
N2 - Making logical copies, or clones, of files and directories is critical to many real-world applications and workflows, including backups, virtual machines, and containers. An ideal clone implementation meets the following performance goals: (1) creating the clone has low latency; (2) reads are fast in all versions (i.e., spatial locality is always maintained, even after modifications); (3) writes are fast in all versions; (4) the overall system is space efficient. Implementing a clone operation that realizes all four properties, which we call a nimble clone, is a long-standing open problem. This article describes nimble clones in B-ϵ-tree File System (BetrFS), an open-source, full-path-indexed, and write-optimized file system. The key observation behind our work is that standard copy-on-write heuristics can be too coarse to be space efficient, or too fine-grained to preserve locality. On the other hand, a write-optimized key-value store, such as a Bε-tree or an log-structured merge-tree (LSM)-tree, can decouple the logical application of updates from the granularity at which data is physically copied. In our write-optimized clone implementation, data sharing among clones is only broken when a clone has changed enough to warrant making a copy, a policy we call copy-on-abundant-write. We demonstrate that the algorithmic work needed to batch and amortize the cost of BetrFS clone operations does not Erode the performance advantages of baseline BetrFS; BetrFS performance even improves in a few cases. BetrFS cloning is efficient; for example, when using the clone operation for container creation, BetrFS outperforms a simple recursive copy by up to two orders-of-magnitude and outperforms file systems that have specialized Linux Containers (LXC) backends by 3-4×.
AB - Making logical copies, or clones, of files and directories is critical to many real-world applications and workflows, including backups, virtual machines, and containers. An ideal clone implementation meets the following performance goals: (1) creating the clone has low latency; (2) reads are fast in all versions (i.e., spatial locality is always maintained, even after modifications); (3) writes are fast in all versions; (4) the overall system is space efficient. Implementing a clone operation that realizes all four properties, which we call a nimble clone, is a long-standing open problem. This article describes nimble clones in B-ϵ-tree File System (BetrFS), an open-source, full-path-indexed, and write-optimized file system. The key observation behind our work is that standard copy-on-write heuristics can be too coarse to be space efficient, or too fine-grained to preserve locality. On the other hand, a write-optimized key-value store, such as a Bε-tree or an log-structured merge-tree (LSM)-tree, can decouple the logical application of updates from the granularity at which data is physically copied. In our write-optimized clone implementation, data sharing among clones is only broken when a clone has changed enough to warrant making a copy, a policy we call copy-on-abundant-write. We demonstrate that the algorithmic work needed to batch and amortize the cost of BetrFS clone operations does not Erode the performance advantages of baseline BetrFS; BetrFS performance even improves in a few cases. BetrFS cloning is efficient; for example, when using the clone operation for container creation, BetrFS outperforms a simple recursive copy by up to two orders-of-magnitude and outperforms file systems that have specialized Linux Containers (LXC) backends by 3-4×.
KW - B-trees
KW - Clone
KW - File system
KW - Write optimization
UR - http://www.scopus.com/inward/record.url?scp=85100415454&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100415454&partnerID=8YFLogxK
U2 - 10.1145/3423495
DO - 10.1145/3423495
M3 - Article
AN - SCOPUS:85100415454
SN - 1553-3077
VL - 17
JO - ACM Transactions on Storage
JF - ACM Transactions on Storage
IS - 1
M1 - 5
ER -