What is Shared Snapshot Space?

In the Delphix Appliance GUI, the Capacity Management screen shows how much space is used by each dSource, VDB, and snapshot.  It also shows how much space is used by "Snapshots" and "Shared Snapshot Space".  What are these quantities?  And why is the space "used" by each snapshot often so small?

Space used by all snapshots of a filesystem

The space used by "Snapshots" (and the ZFS "usedbysnapshots" property) is the amount of space that would be recovered if all snapshots were deleted.  This is the space that is referenced by (accessible via) any snapshot, but not by the current copy (a.k.a. the filesystem).  It answers the question, "How much storage is it costing me to have all these snapshots of this filesystem?"  As the filesystem removes or overwrites files, we have to keep the old versions around because of the snapshots, so the amount of space used by snapshots will increase.
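
You can read this value directly with the ZFS "usedbysnapshots" property.  A quick illustration, using a made-up dataset name (the property itself is real):

$ zfs get usedbysnapshots tank/db
NAME     PROPERTY         VALUE  SOURCE
tank/db  usedbysnapshots  120G   -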

Space "Used" by an individual snapshot

The space "Used" by an individual snapshot is the amount of space that would be recovered if that single snapshot were deleted.  This is the amount of space unique to the snapshot -- that is, the space that is referenced by only this snapshot, and not by any other snapshot or the filesystem (ignoring clones / VDBs created from this snapshot).  As the filesystem removes or overwrites files, the amount of space "used" by the most recent snapshot will increase.
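
For example, with the same made-up "tank/db" dataset, each snapshot's unique space shows up in its "used" column:

$ zfs list -r -t snapshot -o name,used,referenced tank/db
NAME             USED  REFER
tank/db@monday   100M  1.0T
tank/db@tuesday     0  1.0T

Destroying tank/db@monday alone would reclaim only the 100M that is unique to it, even though the snapshot references a full 1 TB.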

Shared Snapshot Space

The "Shared Snapshot Space" is simply the amount used by "Snapshots" minus the sum of the space "Used" by each individual snapshot.  This is the space that is referenced by two or more snapshots, but not by the filesystem.
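
There is no single ZFS property for this value; the GUI derives it.  You can compute it yourself with a sketch like the following, again assuming a made-up dataset "tank/db" (the properties and flags are standard ZFS):

# "usedbysnapshots" minus the sum of each snapshot's unique "used" space
$ total=$(zfs get -Hp -o value usedbysnapshots tank/db)
$ unique=$(zfs list -Hp -r -t snapshot -o used tank/db | awk '{s += $1} END {print s}')
$ echo $((total - unique))    # shared snapshot space, in bytes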

Space "Written" by a snapshot

The ZFS "Written" property tells us how much data was written between the previous snapshot and this one.  It gives us an idea of the change rate of the filesystem, and we're considering exposing it in the Delphix GUI.  The space "Written" by a snapshot may be shared with many snapshots after it, and by the live ("current copy") filesystem.  So if a snapshot has a large amount of space "Written", deleting that snapshot won't necessarily recover that space; you may have to delete many snapshots after it, and even remove data from the live filesystem.
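
A quick look at "Written" from the command line, once more with a made-up dataset name:

$ zfs list -r -t snapshot -o name,used,written tank/db
NAME             USED  WRITTEN
tank/db@monday      0     1.0T
tank/db@tuesday     0     300G

Here tank/db@tuesday "wrote" 300G of changes since the previous snapshot, yet destroying it alone would reclaim nothing, because all of that data is still referenced elsewhere.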

Destroying Snapshots

When we destroy a snapshot, that snapshot may have been sharing space with the adjacent (previous and next) snapshots.  If the shared space becomes unique to an adjacent snapshot, that snapshot's "used" space will increase.  So when we delete a snapshot, we recover the deleted snapshot's "used" space, and some of the "Shared Snapshot Space" may become unique to the adjacent snapshots, transferring it to their "used" space.
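
A hypothetical before-and-after (names and numbers are made up) makes the transfer visible:

$ zfs list -r -t snapshot -o name,used tank/db
NAME        USED
tank/db@a   1.0G
tank/db@b      0     (all of @b's space is shared with @a or @c)
tank/db@c   2.0G

$ zfs destroy tank/db@b
$ zfs list -r -t snapshot -o name,used tank/db
NAME        USED
tank/db@a   1.5G     (500M formerly shared by only @a and @b is now unique to @a)
tank/db@c   2.0G

Destroying @b reclaimed nothing (its "used" was zero), but 500M of "Shared Snapshot Space" moved into @a's "used".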

Examples

Let's imagine that we have a filesystem with 1 TB of data in it, but no snapshots.  Now we take some snapshots and manipulate the data in the filesystem.  What will these space accounting values look like?

Initial snapshot creation

When a snapshot is initially created, it has the same contents as the filesystem, sharing all of its space with the filesystem.  It didn't take any space to create, and no space will be reclaimed if it is destroyed.  It references the same 1 TB as the filesystem, but has no unique space, so its "Used" is zero.  The space used by all snapshots, and the shared snapshot space, are also zero.  In the Capacity Management screen of the Delphix GUI, we'll see:

Current Copy Size:           1 TB
Used By All Snapshots:       Zero
    Shared Snapshot Space:   Zero
    Snapshot A Used:         Zero

And from the command line:

$ zfs get referenced,usedbysnapshots domain0/.../datafile
NAME                          PROPERTY         VALUE   (GUI TERMINOLOGY)
domain0/.../datafile          referenced       1.0T    ("Current Copy Size")
domain0/.../datafile          usedbysnapshots  0       ("Used By All Snapshots")

$ zfs get -r used domain0/.../datafile
NAME                                     PROPERTY         VALUE
domain0/.../datafile                     used             1.0T    (part of DB's space used)
domain0/.../datafile@oracle_snapshot-1   used             0       (snapshot "used")


If we remove all files

If we were to remove (or overwrite) all the files from the filesystem, what would happen to these quantities?  We can't actually recover the space used by the deleted/overwritten files, because they are still referenced by the snapshot.  The snapshot would continue to have the same contents, but now it wouldn't be sharing any space with the filesystem.  The 1 TB of space referenced by the snapshot would only be accessible via the snapshot, and not from any other snapshot (because there are no other snapshots) and not by the filesystem (because we've deleted or overwritten all the files).  Now all of the snapshot's space is unique, so its space "Used" will be the same as its space "Referenced":  1 TB.  The space used by all snapshots will also be 1 TB.  The shared snapshot space will still be zero, because there's only one snapshot.

Current Copy Size:           Zero
Used By All Snapshots:       1 TB
    Shared Snapshot Space:   Zero
    Snapshot A Used:         1 TB


And from the command line:

NAME                                     PROPERTY         VALUE   SOURCE
domain0/.../datafile                     referenced       0       -
domain0/.../datafile                     usedbysnapshots  1.0T    -
domain0/.../datafile                     used             1.0T    -
domain0/.../datafile@oracle_snapshot-1   used             1.0T    -


What if we took two snapshots?

What if we took two snapshots before removing (or overwriting) all the files from the filesystem?  These two snapshots have the same contents -- they reference the exact same 1 TB of data.  After deleting the files, the snapshots don't share any space with the current copy.  We're keeping their blocks around just for the snapshots, so if we deleted all the snapshots, we'd recover that space.  Therefore the space "used by all snapshots" will be 1 TB. However, if we delete either one of the snapshots, we can't reclaim any space, because the other snapshot will still reference it.  Each snapshot does not have any unique space (it's all shared with the other snapshot), so each snapshot's space "Used" is zero!  If we were to delete both snapshots, we'd recover the 1 TB, so the "Shared Snapshot Space" is 1 TB.  In this situation, we know that the snapshots are taking up space, but there is no one snapshot that is responsible for the space, so we don't know which snapshots are to blame.

Current Copy Size:           Zero
Used By All Snapshots:       1 TB
    Shared Snapshot Space:   1 TB
    Snapshot A Used:         Zero
    Snapshot B Used:         Zero


And from the command line:

NAME                                     PROPERTY         VALUE   SOURCE
domain0/.../datafile                     referenced       0       -
domain0/.../datafile                     usedbysnapshots  1.0T    -
domain0/.../datafile                     used             1.0T    -
domain0/.../datafile@oracle_snapshot-1   used             0       -
domain0/.../datafile@oracle_snapshot-2   used             0       -


What about 3 snapshots?

In the previous example, there were only 2 snapshots, so it's obvious that the shared snapshot space is shared between those two snapshots.  But if there is a third snapshot, things get more complicated.  If two or three of the snapshots are sharing space, we won't be able to tell which of them are sharing how much:

Current Copy Size:           Zero
Used By All Snapshots:       1 TB
    Shared Snapshot Space:   1 TB
    Snapshot A Used:         Zero
    Snapshot B Used:         Zero
    Snapshot C Used:         Zero


And from the command line:

NAME                                     PROPERTY         VALUE   SOURCE
domain0/.../datafile                     referenced       0       -
domain0/.../datafile                     usedbysnapshots  1.0T    -
domain0/.../datafile                     used             1.0T    -
domain0/.../datafile@oracle_snapshot-1   used             0       -
domain0/.../datafile@oracle_snapshot-2   used             0       -
domain0/.../datafile@oracle_snapshot-3   used             0       -


It could be that I created the three snapshots, then deleted the files, so I will have to destroy all three snapshots to recover the shared space.  Or it could be that I deleted the files before taking the third snapshot, so the third snapshot has no consequence and it's only snapshots A and B that are holding onto all that space.  Lastly, it could be that I took snapshot A before writing the files, so it's snapshots B and C that are holding onto the space.  The situation becomes increasingly complex with more snapshots, and when considering more than just one chunk of space (e.g. overwriting some parts of some files before each snapshot).

To mitigate this complexity, I implemented a new feature in ZFS and the Delphix management stack that allows us to determine how much space would be reclaimed if several snapshots were destroyed, taking into account the space that is actually shared by those snapshots.  In the Delphix Capacity Management screen, you can select snapshots and see the "Total capacity of objects selected for deletion".  In our three-snapshot case, this would allow you to experimentally determine which two (or three) snapshots are actually holding onto the space.  This feature is based on the new "zfs destroy -nv <list of snapshots>" feature in ZFS:

$ zfs destroy -nv domain0/.../datafile@oracle_snapshot-1%oracle_snapshot-2
would destroy domain0/.../datafile@oracle_snapshot-1
would destroy domain0/.../datafile@oracle_snapshot-2
would reclaim 0 

$ zfs destroy -nv domain0/.../datafile@oracle_snapshot-2%oracle_snapshot-3
would destroy domain0/.../datafile@oracle_snapshot-2
would destroy domain0/.../datafile@oracle_snapshot-3
would reclaim 1.0T


Aha! We need to destroy snapshots 2 and 3 to reclaim the space. We're also exploring other mechanisms for graphically displaying snapshot space usage information, with the goal of letting you see at a glance which snapshots are using space.