Delphix 4.0 - Faster, Better

Uday Vallamsetty

Mar 11, 2014

As our customer base grows, Delphix is deployed into a variety of environments. In the last few months, I worked with two customers who wanted very different behavior from Delphix.

  • One customer has a set of application environments that generate an aggregate load of over 10 GB/sec. The customer virtualizes all of their app/dev/test environments using Delphix, and we demonstrated that Delphix can exceed the SLAs for each environment.

  • Another customer used Delphix to migrate their application environments to a remote data center. The two data centers are linked by a 1 GigE pipe, of which only 50 MB/sec is provisioned for the migration project. Using Delphix replication with built-in compression, they completed the migration at a higher end-to-end throughput than the provisioned bandwidth, with minimal load on their existing infrastructure.

These two customers made completely different demands of Delphix in terms of throughput and performance. One of them had a race track and wanted to see how fast they could drive Delphix. The other was more interested in how much gas Delphix could save on their daily commute. Imagine having a vehicle that can deliver both!

This is what Delphix 4.0 addresses. We want our customers to be able to configure Delphix to go as fast as possible, or to place as little load on their infrastructure as needed. This is the first in a series of posts in which I will share the key features in Delphix 4.0 that helped us reach this goal.

SnapSync

SnapSync is the process we use to non-intrusively pull the dataset being loaded into Delphix and create a snapshot of that data within Delphix. In 4.0, we changed the underlying transport used for SnapSync to the Delphix Session Protocol (DSP).

DSP adds a rich set of features, including improved throughput, resiliency to short link failures, and added configurability. It also improves the efficiency with which data is transported into Delphix.

A majority of our customers have their production and non-production environments linked over a 1 GigE WAN. Using built-in compression in DSP, SnapSync can now run at 170 MB/sec over a 1 GigE link. Under similar conditions, SnapSync in 3.2 peaks at ~80 MB/sec. Customers can also choose to limit the load that SnapSync places on the infrastructure.
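To put those numbers in context, here is a quick back-of-the-envelope calculation. It assumes a 1 GigE link carries 1000 Mbit/sec, or 125 MB/sec, and ignores protocol overhead. The function name is mine, made up for illustration.

```python
# Assumption: 1 GigE carries 1000 Mbit/s = 125 MB/s, ignoring protocol overhead.
GIGE_WIRE_MB_PER_SEC = 1000 / 8


def ratio_to_wire(end_to_end_mb_per_sec, wire_mb_per_sec=GIGE_WIRE_MB_PER_SEC):
    """Return end-to-end throughput as a multiple of the raw wire rate."""
    return end_to_end_mb_per_sec / wire_mb_per_sec


snapsync_40 = ratio_to_wire(170)  # figure quoted above for 4.0
snapsync_32 = ratio_to_wire(80)   # approximate peak quoted above for 3.2

print(f"4.0: {snapsync_40:.2f}x wire rate")  # 1.36x
print(f"3.2: {snapsync_32:.2f}x wire rate")  # 0.64x
```

A ratio above 1 is only possible when fewer bytes cross the link than are delivered at the other end. Under the assumption above, 1.36x means that on average roughly 74% or less of the logical data actually travels over the wire.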

With compression, SnapSync achieves a higher end-to-end throughput than the raw rate the wire can carry. The following graphic compares SnapSync in 4.0 with 3.2. The red line is the maximum throughput at wire speed. For 3.2, SnapSync throughput is higher than line rate because the end-to-end figure counts zero blocks, which SnapSync does not actually transfer.

snapsync graph
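The gap between end-to-end throughput and wire throughput can be illustrated with a small sketch. This is not how SnapSync or DSP is implemented. The 8 KB block size, the use of zlib, and the function names are all assumptions chosen to show the accounting: every block counts toward the logical total, zero blocks send nothing, and other blocks are sent compressed.

```python
import zlib

BLOCK_SIZE = 8192  # hypothetical block size, for illustration only


def transfer_stats(blocks):
    """Return (logical_bytes, wire_bytes) for a sequence of data blocks."""
    zero_block = bytes(BLOCK_SIZE)
    logical_bytes = 0
    wire_bytes = 0
    for block in blocks:
        logical_bytes += len(block)  # every block counts as delivered
        if block == zero_block:
            continue  # zero block: nothing is sent over the wire
        # zlib stands in for whatever compression the transport really uses.
        wire_bytes += len(zlib.compress(block))
    return logical_bytes, wire_bytes


def end_to_end_mb_per_sec(logical_bytes, wire_bytes, wire_mb_per_sec):
    """End-to-end rate when the link is the only bottleneck."""
    if wire_bytes == 0:
        return float("inf")  # nothing had to be sent at all
    return wire_mb_per_sec * logical_bytes / wire_bytes


# Two zero blocks plus one highly repetitive (compressible) block.
sample = [bytes(BLOCK_SIZE), bytes(BLOCK_SIZE), b"delphix " * (BLOCK_SIZE // 8)]
logical, wire = transfer_stats(sample)
print(logical, wire, end_to_end_mb_per_sec(logical, wire, 125.0))
```

Real database blocks compress far less than this toy sample, so the ratio printed here is much larger than anything you would see in practice. The point is only that the end-to-end number is measured in logical bytes, while the red line is measured in wire bytes.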

In addition to using a slow link efficiently, SnapSync in 4.0 also allows customers with high-bandwidth links to fully utilize the underlying connection and save valuable time. With DSP as the transport, SnapSync can now push much higher throughput than in 3.2, given sufficient I/O bandwidth to read and write the data.

One of our largest retail customers runs their end-of-day and end-of-month financial close reporting jobs on VDBs. The batch window starts at 9:00 pm, when all the stores close. The reporting teams need fresh data in the ODS within 90 minutes in order to finish the batch jobs.

Before Delphix, they were doing a storage-level "full copy" of the production database every night, followed by DBA activity to bring up and validate the database. This process was complicated and involved several handoffs between the Storage, DBA and Virtualization teams. Despite careful orchestration, the process was error-prone and left no room for batch failures.

Once they moved to Delphix, getting up-to-date data to their ODS involved two clicks by the Delphix administrator! They shortened their batch window by 15 minutes, since refreshes only require an incremental copy. The entire process was simplified, reducing errors in handoffs. With 4.0 and faster SnapSync, they can further reduce the refresh cycle.

The batch window is now resilient to unforeseen batch errors. I will talk about the improvements we made to LogSync in part 2 of this post.