Understanding SnapSync and LogSync for Oracle

Several customers have asked me recently to explain the details of the SnapSync and LogSync operations for Oracle, how those affect TimeFlow and VDB Provision operations, and to provide some insight on the impact of things like Physical Standby, Block Change Tracking and NOLOGGING operations.

Since Delphix uses standard Oracle APIs, the semantics of how Delphix captures the initial database backup and subsequent database changes are similar (but not identical) to how they might work in a traditional RMAN backup methodology.  The Delphix Software Virtual Appliance appears to RMAN as a tape device by using the RMAN API.  Since Delphix offers continuous data protection on databases, the way that Delphix synthesizes this data to present TimeFlow is very different.

The connection between the source database and Delphix

Delphix manages the transmission of data and communications between the source database’s host and the Delphix Software Virtual Appliance (SVA) using a set of shell scripts known as the Delphix toolkit, which in turn uses RMAN and the RMAN API.  This toolkit is pushed to the source database’s host when you add an environment or when you push the environment refresh button.  RMAN is the communication mechanism between the source database and Delphix; the Delphix SVA itself, however, does not run any part of the Oracle stack.  Delphix relies on the SnapSync and LogSync operations to gather data from databases, and the toolkit performs the work for these operations on the source database’s host.  For both data collection and data transmission, SnapSync and LogSync use standard Oracle interfaces: RMAN and OJDBC.
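If you want to see this communication in action, one simple check on the source database is to look for the RMAN channel sessions that a backup operation creates; RMAN records its channel name in V$SESSION.CLIENT_INFO. The query below is purely illustrative (your program strings and channel names will differ), not something the toolkit requires you to run:

```sql
-- Illustrative only: spot RMAN channel sessions while a SnapSync-driven
-- backup is running. RMAN tags its channels in CLIENT_INFO.
SELECT sid,
       serial#,
       program,
       client_info
FROM   v$session
WHERE  client_info LIKE 'rman%';
```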

For all of these RMAN operations, the key piece of metadata is the pointer Oracle uses to order transactions at the database level: the Oracle System Change Number (SCN).  All SnapSync and LogSync operations attach SCN metadata to the database files or log files shipped to the Delphix SVA.  We’ll see why this matters later on.
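If you want to look at the SCN on your own source database, the queries below are a quick illustration using standard Oracle dictionary views and built-in functions (nothing Delphix-specific): the current SCN, and the functions that map between SCNs and wall-clock time.

```sql
-- Current SCN of the database
SELECT current_scn FROM v$database;

-- Map between SCN and wall-clock time (within the retention of the mapping)
SELECT timestamp_to_scn(SYSTIMESTAMP)                   AS scn_now   FROM dual;
SELECT scn_to_timestamp(timestamp_to_scn(SYSTIMESTAMP)) AS ts_of_scn FROM dual;
```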

Data Collection when you link to a source database

When you first link to a source database, or add a dSource, the Delphix toolkit invokes the RMAN API for a full backup operation, and through this API the blocks are collected and transmitted to the Delphix SVA.  Since this operation takes place while your source database is live and active, you are actually performing what Oracle calls an “inconsistent” backup.  Essentially, this means that the database data is changing while the backup progresses.  To make it “consistent”, you must also collect a record of all of the changes that occurred during the backup.  For Oracle, these are the log files (archive or online redo logs).  Therefore, at the end of that full backup, the toolkit will also locate and ship all of the logs that recorded change against the source database between the time you started and the time you finished the inconsistent backup.  One important consequence is that, although you started the backup at SCN#X, it only became consistent at SCN#(X+N), where N represents the SCNs generated while the backup operation ran.  So the Snapshot SCN, which is the first SCN that will appear on the first card in your Delphix TimeFlow, will be SCN#(X+N).
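You can see the same “X to X+N” bracket on any hot backup by capturing the SCN before and after the copy runs. The sketch below is purely illustrative and assumes you run the two queries around whatever backup operation is in flight:

```sql
-- Before the backup starts: this is SCN#X
SELECT current_scn AS backup_start_scn FROM v$database;

-- ... full (inconsistent) backup runs here ...

-- After the backup finishes: this is SCN#(X+N); the redo generated
-- between the two points is what makes the copy consistent.
SELECT current_scn AS backup_end_scn FROM v$database;
```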

Data Collection for subsequent SnapSyncs against that source database

Most Oracle shops are familiar with the concept of an Incremental Backup.  Essentially, that’s a partial backup that contains all of the changes between some full backup baseline and the present.  Unlike a full backup, it cannot be used alone; it must always be paired with a full backup to achieve a point-in-time restore.  And although an incremental contains fewer blocks than a full backup, that does not mean fewer blocks had to be examined to create it.  (We’ll examine that when we get to BCT.)  When you execute a SnapSync, the Delphix toolkit invokes the RMAN API for an incremental backup operation, and through this API the blocks are collected and transmitted to the Delphix SVA.  Note that it is always an incremental backup operation.  With Delphix Continuous Data Protection, and with the synthesis that Delphix performs, you get the benefit of a full backup at the price of an incremental.  So there’s never a need to go back for a full backup again.
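A convenient way to see the difference between “blocks shipped” and “blocks examined” for your own incremental backups is V$BACKUP_DATAFILE, which records both. The query below is a generic illustration, not something Delphix requires:

```sql
-- Per-datafile history of recent backups: how many blocks were read
-- versus how many actually went into the backup.
SELECT file#,
       incremental_level,
       datafile_blocks,          -- size of the datafile in blocks
       blocks_read,              -- blocks RMAN had to examine
       blocks,                   -- blocks actually written to the backup
       used_change_tracking      -- YES when BCT trimmed the read
FROM   v$backup_datafile
ORDER  BY completion_time DESC;
```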

For database consistency, some or all of the logs generated during an incremental backup will be identified or collected by the toolkit and transmitted to Delphix.  If LogSync is on, Delphix is already collecting the logs, so the logs necessary to maintain consistency are already being shipped and the toolkit can merely identify them.  If LogSync is off, then as part of the SnapSync operation, the logs needed to make the incremental backup consistent will be transmitted at the end.
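The “identify the logs” step is essentially a lookup of the archived logs whose SCN ranges fall after the point where the incremental began. A rough illustration follows; the :snapshot_start_scn bind is hypothetical, standing in for the SCN the toolkit records at the start of the SnapSync:

```sql
-- Archived logs generated since the incremental started; these are the
-- logs needed to bring that incremental to a consistent point.
SELECT sequence#,
       first_change#,
       next_change#,
       name
FROM   v$archived_log
WHERE  next_change# > :snapshot_start_scn
ORDER  BY sequence#;
```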

Data Collection using LogSync against a source database

Delphix uses log shipment for LogSync on Oracle databases.  Oracle redo lives either in the online redo logs or the archived logs; essentially, these logs collect replayable changes to the database.  Delphix LogSync performs near-real-time collection of the change data being applied to the source and updates the Delphix TimeFlow as a result.  LogSync generally runs a few seconds to a few minutes behind the source database.
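On the source side, the raw material LogSync consumes is visible in the standard views for online and archived redo. For example (illustrative only):

```sql
-- Online redo: which group is CURRENT and the first SCN in each group
SELECT group#, sequence#, status, first_change#
FROM   v$log
ORDER  BY sequence#;

-- Archived redo: completed logs and their SCN ranges, most recent first
SELECT sequence#, first_change#, next_change#, completion_time
FROM   v$archived_log
ORDER  BY completion_time DESC;
```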

Delphix TimeFlow for Oracle databases pushes out ahead of the SnapSync: once a SnapSync operation completes, Delphix continues to collect logs, and each time a log is received and processed it advances the end point marker of the SCN range to the last SCN collected.  The end point marker of the SCN range is the most recent point at which a consistent database copy can be provisioned.  This update is the visible sign of the continuous data protection that Delphix offers.
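There is no source-side view of Delphix’s own TimeFlow metadata, but a rough source-side analogue of that end point marker is the highest SCN covered by archived redo so far. A hedged sketch:

```sql
-- Approximate upper bound of what archived redo alone can recover to.
-- (NEXT_CHANGE# is the first SCN of the following log, so the last SCN
-- actually covered is one below it.)
SELECT MAX(next_change#) - 1 AS last_scn_covered_by_archives
FROM   v$archived_log
WHERE  archived = 'YES';
```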

Operating SnapSync and LogSync together for Source Databases

Once LogSync is operating, it continues to operate even if a SnapSync operation is requested.  For example, suppose that I did an initial link to a source database that completed at 3:45 am and that LogSync is enabled.

Delphix TimeFlow will begin at 3:45 am and will be continuous until the next SnapSync operation completes.  If a second SnapSync were requested at 7:00 am and completed at 7:30 am, then the TimeFlow associated with the 3:45 am card will continue to the SCN just before the Snapshot SCN on the 7:30 am card, and the following SCN will appear on the 7:30 am card as its Snapshot SCN.

Impact of Provisioning a Virtual Database on TimeFlow

Since Delphix is transparent to the end user, a Virtual Database follows the same semantics as any other database.  So, when you provision a database with Delphix, you can expect the same results as if you were doing it with Oracle.  A Virtual Database provision typically includes a RESETLOGS operation.  Once a RESETLOGS operation takes place, even though the content of the VDB is identical to the source database as of some point in time, it is in a different database incarnation.
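You can see this on the VDB itself: after the RESETLOGS open, the new incarnation shows up in V$DATABASE_INCARNATION (and in RMAN’s LIST INCARNATION). For example:

```sql
-- Database incarnation history; the CURRENT row is the one created by
-- the RESETLOGS performed when the VDB was provisioned.
SELECT incarnation#,
       resetlogs_change#,
       resetlogs_time,
       status
FROM   v$database_incarnation
ORDER  BY incarnation#;
```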

BCT and Physical Standby

The great advantage of performing incremental backups is that less data needs to be stored and moved over the wire.  For Oracle, the question of how many blocks need to be examined to produce that backup is a function of whether Oracle’s Block Change Tracking (BCT) feature is in use.  Simply put, BCT keeps a very small log of the blocks that have changed, so that when RMAN is invoked, only those changed blocks are read.  Thus, the cost of examining the database for incremental change becomes a function of database change instead of database size, with obvious benefits.  With BCT off, however, even a standard Oracle RMAN incremental backup will force a read of 100% of the database.  Another slight twist has to do with Physical Standby databases in Oracle 10g.  Unfortunately, even when the primary 10g database is using BCT, an Oracle 10g physical standby database cannot make effective use of BCT, and thus both full and incremental backups against the standby will force a read of 100% of the database.
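Enabling and verifying BCT on the source is a one-time operation. A minimal sketch follows; the tracking-file path is just a placeholder for your own location:

```sql
-- Enable Block Change Tracking (path is a placeholder; you can omit the
-- USING FILE clause if DB_CREATE_FILE_DEST is set).
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
  USING FILE '/u01/app/oracle/bct/change_tracking.f';

-- Confirm it is enabled
SELECT status, filename FROM v$block_change_tracking;

-- After the next incremental, confirm RMAN actually used it
SELECT file#, incremental_level, blocks_read, used_change_tracking
FROM   v$backup_datafile
ORDER  BY completion_time DESC;
```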

NOLOGGING Operations

It’s very common for customers to want to do large load operations in NOLOGGING mode so as not to generate a large amount of redo that they don’t need.  Because a NOLOGGING operation purposefully avoids the creation of redo, it creates a potential inconsistency in any RMAN backup.  Since Delphix follows the same semantics, this potential exists within Delphix as well.  Customers that have a strict requirement to do large NOLOGGING operations will typically (1) initiate a SnapSync, (2) turn off archive log mode, (3) perform their NOLOGGING operation, (4) turn archive log mode back on, and (5) initiate another SnapSync.  This approach minimizes the discontinuity in TimeFlow, just as bracketing the operation with two full backups (one before and one after) would minimize any recoverability issues.
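Two illustrative pieces around that workflow: the archive log mode toggle itself (steps 2 and 4; the database must be mounted, not open, to change it), and a quick check for datafiles touched by NOLOGGING work. Treat these as sketches to adapt, not a prescribed procedure:

```sql
-- Steps (2) and (4): toggling archive log mode requires a clean mount (SQL*Plus)
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE NOARCHIVELOG;   -- or ARCHIVELOG, to turn it back on
ALTER DATABASE OPEN;

-- Find datafiles with unrecoverable (NOLOGGING) changes; any file whose
-- UNRECOVERABLE_CHANGE# is newer than your last backup needs a fresh
-- SnapSync before it is fully protected again.
SELECT file#, unrecoverable_change#, unrecoverable_time
FROM   v$datafile
WHERE  unrecoverable_change# > 0;
```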

Synthesis of Fulls from Incrementals on the Delphix Server

Finally, we turn to the concept of synthesizing Full backups from Incremental backups.  When RMAN transmits to the Delphix SVA a shipment of blocks and logs related to a SnapSync, Delphix synthesizes that SnapSync into the equivalent of a full backup.  Thus, even though we pay the low cost of taking and shipping the incremental backup, we get the recoverability benefit of having a full backup ready for provision.  And, since Delphix provisions in seconds, that benefit can be extreme.