Resumable OpenZFS send/receive

In the recently released Delphix Engine 4.2, we have enhanced our replication feature to be resilient to failures such as network outages and reboots of the source or target machine. Now, when one of these faults occurs, replication will automatically pick up where it left off, without losing any data that was already transmitted. I worked on implementing support for this in the filesystem, in the form of resumable zfs send & receive.

When "zfs receive" fails (e.g. due to a reboot or network outage), it can now preserve the data that has already been received, along with state describing which data is still outstanding. The sending system uses this state to generate a "resuming" send stream, which picks up where the interrupted stream left off. I gave a presentation at AsiaBSDcon 2015 about how OpenZFS send/receive works, including design fundamentals and new features like resumable send/receive.
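
To make the resume idea concrete, here is a toy model in Python. It is only a sketch of the general pattern (the receiver keeps partial data plus a resume point, and the sender restarts from that point), not the actual ZFS stream format or implementation. The names `Receiver`, `resume_token`, `send`, and `replicate` are hypothetical, and the "token" here is just a record index, which is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Toy receiver that keeps partial data across a failed transfer."""
    records: list = field(default_factory=list)

    def resume_token(self) -> int:
        # Hypothetical token: index of the next record still needed.
        return len(self.records)

    def receive(self, stream, fail_after=None):
        # Append records until the stream ends or a simulated fault occurs.
        for count, record in enumerate(stream):
            if fail_after is not None and count == fail_after:
                raise ConnectionError("simulated network outage")
            self.records.append(record)


def send(snapshot, token=0):
    """Yield the snapshot's records, starting at the receiver's token."""
    yield from snapshot[token:]


def replicate(snapshot, receiver, fail_after=None):
    # The sender asks the receiver where to resume, then streams from there.
    receiver.receive(send(snapshot, receiver.resume_token()), fail_after)


snapshot = [f"block-{i}" for i in range(10)]
rx = Receiver()

try:
    replicate(snapshot, rx, fail_after=4)  # first attempt is interrupted
except ConnectionError:
    pass

partial = list(rx.records)  # data received before the fault is preserved
replicate(snapshot, rx)     # second attempt resumes from the saved point
```

After the interrupted first attempt, `partial` holds the first four records; the second call to `replicate` sends only the remaining six, so nothing already transmitted is sent again.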

As part of Delphix's commitment to open source, we will be contributing resumable send/receive to OpenZFS (via illumos - look for a commit in the coming months).