ZFS 10-year anniversary
Halloween has always been a special holiday for ZFS. We ran our first code 10 years ago in October 2001. We integrated ZFS into OpenSolaris on October 31, 2005. It’s interesting to look back and remember what we had figured out early on, and which ideas weren’t developed until much later. Ten years ago, we had only been working on ZFS for four months. I had a freshly minted undergraduate degree, a new apartment on the opposite coast, and a lot of work to do. The key principles we laid out then still ring true: massive scale, easy administration, fault tolerance, snapshots, and a copy-on-write, always-consistent on-disk format. But the specifics were murky ten years ago.
Pooled storage was a key idea: one pool composed of many disks, and many filesystems consuming space from the pool. However, the relationship between the pools on a system and the filesystems in a pool hadn’t been nailed down. The filesystem namespace was flat, with no nested filesystems and no property inheritance. We weren’t sure if there would be one mountpoint for the entire pool, or one per filesystem. We hadn’t even considered clones, which are now integral to the Delphix product. We knew we wanted some sort of RAID, but had no idea it would end up looking like RAID-Z.
ZFS send and receive wasn’t considered until late in the development cycle. The idea came to me in 2005. ZFS was nearing integration, and I was spending a few months working in Sun’s new office in Beijing, China. The network link between Beijing and Menlo Park was low-bandwidth and high-latency, and our NFS-based source code manager was painful to use. I needed a way to quickly ship incremental changes to a workspace across the Pacific. A POSIX-based utility (like rsync) would at best have to traverse all the files and directories to find the few that were modified since a specific date, and at worst it would compare the files on each side, incurring many high-latency round trips. I realized that the block pointers in ZFS already have all the information we need: the birth time allows us to quickly and precisely find the blocks that have changed since a given snapshot. It was easiest to implement ZFS send at the DMU layer, just below the ZPL. This allows the semantically important changes to be transferred exactly, without any special code to handle features like NFSv4-style ACLs, case-insensitivity, and extended attributes. Storage-specific settings, like compression and RAID type, can be different on the sending and receiving sides. What began as a workaround for a crappy network link has become one of the pillars of ZFS, and the foundation of several remote replication products, including the one at Delphix.
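To make the birth-time idea concrete, here is a toy sketch in Python. It is not the ZFS implementation: the BlockPointer class, the changed_since function, and the example tree are all hypothetical, and the birth time is modeled as a plain integer transaction number. The one assumption it leans on is that copy-on-write rewrites every ancestor of a modified block, so a parent is never older than its newest child and a whole subtree can be skipped as soon as its root predates the snapshot.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class BlockPointer:
    """Toy stand-in for a block pointer (hypothetical, heavily simplified)."""
    name: str
    birth: int  # transaction number in which this block was last written
    children: List["BlockPointer"] = field(default_factory=list)


def changed_since(bp: BlockPointer, snapshot_birth: int) -> Iterator[BlockPointer]:
    """Yield the leaf blocks written after snapshot_birth.

    Assumes copy-on-write semantics: changing a block rewrites all of its
    ancestors, so a subtree whose root is not newer than the snapshot
    cannot contain anything newer and is skipped without being read.
    """
    if bp.birth <= snapshot_birth:
        return  # prune: nothing below this pointer has changed
    if not bp.children:
        yield bp  # a data block that must go into the incremental stream
        return
    for child in bp.children:
        yield from changed_since(child, snapshot_birth)


# Example tree: only leaf "b2" was written after the snapshot at birth 20.
root = BlockPointer("root", 30, [
    BlockPointer("A", 10, [BlockPointer("a1", 5), BlockPointer("a2", 10)]),
    BlockPointer("B", 30, [BlockPointer("b1", 8), BlockPointer("b2", 30)]),
])

print([bp.name for bp in changed_since(root, 20)])  # prints ['b2']
```

In this sketch the walk never descends into subtree A at all, which is the contrast with a file-level tool that has to examine every file and directory to find the few that changed.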
ZFS is an evolving product, and ZFS send/receive is a great example of that. I added “zfs send -R” on Halloween 2007, which made it possible to replicate a whole tree of filesystems, including properties and incremental rename and destroy of filesystems and snapshots. On Halloween 2009, Tom Erickson implemented “received properties”: the distinction between properties set locally on the receiving system and properties set by “zfs receive”. For this Halloween, I’ve been working on estimating the size of the stream generated by “zfs send”, so that we can have an accurate progress bar while doing replication. My coworker Chris Siden is working on resumable ZFS send: if the send is interrupted by a network outage or one of the machines failing, we will be able to pick up where we left off, without losing any work.
The demands placed on any piece of software change over time. Software with poorly-designed internal interfaces quickly becomes a minefield of special cases, where every bug fixed introduces one more. The framework of ZFS has served us well for the past decade, but we have to be ready to re-evaluate as we go. That’s why I’ve been working on ZFS Feature Flags with Basil Crow and Chris Siden. This will allow us to evolve the ZFS on-disk format in a flexible way, with multiple independent developers contributing changes. You can read more about Feature Flags, and the other work we’ve been doing at Delphix, in the slides for a talk that George Wilson and I gave at the Open Storage Summit last week — another Halloween milestone in a decade of ZFS development.