Oracle VDB Snapshot Improvements in Delphix 3.2

Prior to Delphix version 3.2, taking a snapshot of an Oracle virtual database was a distributed operation that required coordination between the VDB and the Delphix Engine. Taking a virtual database snapshot demanded additional cycles on the machine hosting the VDB, increased network traffic between the Delphix Engine and the host, and placed some restrictions on the virtual database itself while the snapshot was in progress. All of these drawbacks stem from the use of Oracle's hot backup API to take VDB snapshots.

In 3.2, we've altered how we take VDB snapshots to eliminate the performance impact on the host during snapshot operations and to provide more flexibility in how Oracle VDBs can be configured and used. Before 3.2, to take a snapshot of a VDB, the Delphix Engine would switch the VDB into hot backup mode, take a snapshot of the filesystems containing the datafiles, switch hot backup mode off, and force a log switch to obtain all the archive logs associated with the backup interval.
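The pre-3.2 sequence described above can be sketched roughly as follows. This is an illustration only, not the Delphix Engine's actual code: `run_sql` and `snapshot_filesystem` are hypothetical callables supplied by the caller, and the SQL statements are the standard Oracle ones for database-wide hot backup mode and for archiving the current log, which may differ from what the engine really issued.

```python
from typing import Callable


def hot_backup_snapshot(run_sql: Callable[[str], None],
                        snapshot_filesystem: Callable[[], str]) -> str:
    """Bracket a filesystem snapshot with Oracle hot backup mode.

    run_sql and snapshot_filesystem are hypothetical hooks; the caller
    decides how SQL reaches the VDB and how the snapshot is taken.
    """
    # 1. Put the whole database into hot backup mode.
    run_sql("ALTER DATABASE BEGIN BACKUP")
    try:
        # 2. Snapshot the filesystems that hold the datafiles.
        snapshot_id = snapshot_filesystem()
    finally:
        # 3. Always leave backup mode, even if the snapshot failed.
        run_sql("ALTER DATABASE END BACKUP")
    # 4. Force a log switch so the redo for the backup interval is archived.
    run_sql("ALTER SYSTEM ARCHIVE LOG CURRENT")
    return snapshot_id
```

Every numbered step except the second is a round trip to the VDB host, which is where the coordination cost of the old approach comes from.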

When hot backup mode is turned on, Oracle writes complete before-images of changed blocks to the redo files to ensure that any fractured blocks in the backup can be resolved (a fractured block is created when the operating system utility used to take the backup copies a block while it is being written).

This additional redo impacts the performance of the VDB. Also, when hot backup mode is turned on, Oracle checkpoints the database. Although a checkpoint is not as costly as the additional redo, Oracle must touch the headers of all the datafiles to perform it, which costs cycles and bandwidth. The biggest drawback to hot backup mode is that VDBs must run in archivelog mode to ensure that the redo generated while the hot backup is taken is captured so that it can later be used to provision from the snapshot.

In a virtual database, the online logs and the archive log destination are mounted over NFS, so archiving these logs amounts to reading the data in the online log over the network and writing the same data back over the wire into a new file. Thus, in archivelog mode, the redo-related data we send over the network is triple the amount we would send in noarchivelog mode. A drawback of this approach that is unrelated to performance is that Oracle cannot handle multiple requests to switch into backup mode.
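The factor of three in the redo traffic comparison above comes from counting each trip the same redo makes across the network. A small sketch of that arithmetic, under the simplifying assumption that every byte of redo is written once to the online log and, in archivelog mode, read back once and written once more to the archive destination (`redo_network_bytes` is a name made up for this example):

```python
def redo_network_bytes(redo_bytes: int, archivelog: bool) -> int:
    """Estimate redo-related NFS traffic for a VDB (simplified model)."""
    # The redo is always written once to the online log over NFS.
    online_write = redo_bytes
    if not archivelog:
        return online_write
    # Archiving reads the online log back over the network...
    archive_read = redo_bytes
    # ...and writes the same data out again into a new archive log file.
    archive_write = redo_bytes
    return online_write + archive_read + archive_write
```

For example, 1 GB of generated redo would account for roughly 1 GB of network traffic in noarchivelog mode and 3 GB in archivelog mode under this model.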

So if a user placed a tablespace into hot backup mode to perform their own backup, the Delphix Engine could not take a snapshot of this virtual database until the user completed their backup and exited backup mode. In 3.2, all of these drawbacks and restrictions have been lifted by altering the procedure by which the Delphix Engine takes snapshots of virtual databases.

Using an approach adapted from this NetApp/Oracle whitepaper, every VDB snapshot is now backed by a crash-consistent backup of the database. A crash-consistent snapshot of an Oracle database is a complete image of the database that looks as though the plug was pulled on the machine hosting the VDB. To be slightly more technical, a crash-consistent snapshot is a complete image of the database at some point in time that preserves the write ordering of each file within the snapshot. DxFS snapshots satisfy these criteria.
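One way to make the write-ordering requirement concrete is a toy model in which the storage sees a single ordered stream of block writes, and an image counts as crash consistent only if it equals the state after some prefix of that stream. The sketch below is a deliberate simplification for illustration, not a description of how DxFS works; the `Write` and `Image` types and the `consistent_prefix` function are invented for this example.

```python
from typing import Dict, List, Optional, Tuple

Write = Tuple[str, int, str]        # (file name, block number, contents)
Image = Dict[Tuple[str, int], str]  # (file name, block number) -> contents


def consistent_prefix(writes: List[Write], image: Image) -> Optional[int]:
    """Return the smallest k such that image equals the state after the
    first k writes, or None if no prefix of the stream produces it."""
    state: Image = {}
    if state == image:
        return 0
    for k, (name, block, data) in enumerate(writes, start=1):
        # Apply writes strictly in the order they were issued.
        state[(name, block)] = data
        if state == image:
            return k
    return None


# Hypothetical write stream: redo is written before the datafile change.
writes = [("redo", 0, "r1"), ("data", 0, "d1"), ("redo", 1, "r2")]

# Matches the state after the first two writes: crash consistent.
ok = {("redo", 0): "r1", ("data", 0): "d1"}

# Contains the datafile write but not the redo write that preceded it:
# no crash at any instant could have left the storage in this state.
torn = {("data", 0): "d1"}
```

In this model `consistent_prefix(writes, ok)` finds a matching prefix, while `consistent_prefix(writes, torn)` returns `None`, which is the kind of image that a file-by-file copy taken without backup mode could produce and that an atomic filesystem snapshot avoids.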

Taking crash-consistent snapshots is simple and fast. There is no need to issue any commands to the VDB at all. We simply initiate a snapshot on the filesystem that contains the relevant datafiles. Since we no longer need to interact with Oracle, this approach has no impact on the running VDB whatsoever.

To be more explicit, the user can run VDBs in noarchivelog mode, take their own backups using the hot backup API, and take Delphix snapshots without impacting the performance of the VDB. The snapshot operation is now entirely local to the Delphix Engine, so there is no load placed on the network either.

In 3.2, you no longer need to worry about how VDB snapshots will impact users or whether they have configured their VDBs correctly. Fast, low-impact snapshots provide the agility to take snapshots when you need them, not when it's feasible to do so - letting developers focus on their project needs without worrying about the implications of their actions.