Hybrid Cloud and Migration Architectures

Dan Kimmel

Apr 23, 2015

One of the interesting things that happens when you're on the engineering team of an enterprise software product is that many people you speak to (potential customers, candidates you're interviewing, your mom) have read the company's marketing material but don't understand what the product actually does. They get lost in the messaging: "Okay, you say it helps me cut application development time by 10% to 50%... but how does it work?"

As a member of our cloud engineering team, I'd like to examine a few concrete use cases where Delphix can uniquely help you migrate workloads into public clouds, avoid cloud lock-in, and keep sensitive data secure in the process. Along with each use case, I'll include architecture diagrams that summarize how such a solution would work.

Overcoming data inertia

The difficulty of moving data is a classic barrier for migrating workloads into the cloud, and is the primary reason why most organizations base their new applications around architectures in public clouds but leave a huge proportion of their old applications on-premise.

The problem is usually not that there are hardware- or software-level incompatibilities with the infrastructure in the cloud -- it's simply that data movement is challenging for organizations with hundreds of applications. You can't shut down your datacenter to move it, so you have to upload backups and then send incremental updates until the copy in the cloud is a close match to the original.

Then you have to keep applying incremental updates until you've run a plethora of tests, many of which might require their own copies of the data to avoid modifying the one that you'll eventually use to fail over into the public cloud. Going through this exercise manually is enormously labor-intensive (read: expensive) and error-prone.
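
To make the shape of that catch-up process concrete, here is a minimal sketch of the loop in Python. The three callables are stand-ins for whatever backup tooling actually captures changes, applies them in the cloud, and measures lag; nothing here is a real Delphix or database API.

```python
import time

def catch_up(take_incremental, apply_incremental, current_lag_seconds,
             max_lag_seconds=300, wait_seconds=3600, simulate=True):
    """Ship incrementals until the cloud copy is close enough to cut over.

    The three callables are placeholders for whatever tooling actually
    captures changes, applies them to the cloud copy, and measures lag.
    """
    while True:
        apply_incremental(take_incremental())
        lag = current_lag_seconds()
        if lag <= max_lag_seconds:
            return lag                     # close enough to rehearse a cutover
        if not simulate:
            time.sleep(wait_seconds)       # wait before the next incremental

# Toy run: pretend each incremental halves the lag, starting a day behind.
if __name__ == "__main__":
    state = {"lag": 86_400}
    final_lag = catch_up(
        take_incremental=lambda: "incremental-backup",
        apply_incremental=lambda _: state.update(lag=state["lag"] // 2),
        current_lag_seconds=lambda: state["lag"],
    )
    print(f"cloud copy is {final_lag} seconds behind; ready for test copies")
```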

If your data is stored in relational databases, then you'll need a DBA (or a team of them) to run these tasks -- their expertise is necessary to ensure that the transfer is performed without data loss or corruption. Even if the data is just flat files from an enterprise application framework like Oracle E-Business Suite, the playbooks we've seen for simply making a usable copy are over 50 (very dense) pages long.

As data sources have become more complex, automating the processes that surround them has become more important than ever, but the market for these products is surprisingly limited, even in today's DevOps-focused world. To create the Delphix product, our engineering team has developed exhaustive integrations with every major RDBMS platform (Oracle, Microsoft SQL Server, PostgreSQL, SAP/Sybase ASE, and soon MySQL and IBM DB2) and one of the most popular enterprise application frameworks (Oracle EBS) on top of our filesystem snapshotting technology.

Snapshotting is a nice technology on its own because creating data copies becomes a cheap, fast operation, but our detail-oriented data automation is what takes snapshots from "a nice thing that anybody can get for free in Linux" to "a product that cuts application delivery times in half."
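
The reason snapshot copies are so cheap is copy-on-write: a new copy shares every unchanged block with its parent and only diverges where it's written. Here's a toy sketch of the idea in Python (just the concept, not Delphix's actual filesystem):

```python
class ToyBlockStore:
    """Toy copy-on-write store: snapshots share every unchanged block."""

    def __init__(self):
        self.blocks = {}      # block_id -> immutable data, shared by all images
        self.images = {}      # image_name -> {offset: block_id}
        self._next_id = 0

    def _store(self, data):
        self._next_id += 1
        self.blocks[self._next_id] = data
        return self._next_id

    def write(self, image, offset, data):
        # A write allocates a fresh block; other images keep the old one.
        self.images.setdefault(image, {})[offset] = self._store(data)

    def snapshot(self, parent, child):
        # A snapshot copies only the block map -- O(metadata), not O(data).
        self.images[child] = dict(self.images[parent])

    def read(self, image, offset):
        return self.blocks[self.images[image][offset]]


store = ToyBlockStore()
store.write("prod", 0, b"customer rows, version 1")
store.snapshot("prod", "test-copy")              # effectively instant
store.write("test-copy", 0, b"scratch data for a test run")
print(store.read("prod", 0))                     # production is untouched
print(store.read("test-copy", 0))                # the copy diverged cheaply
```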

We've literally spent years learning about various storage and logging mechanisms, how to create consistent snapshots of each one, and the differences between myriad combinations of versions, features, and patches of the platforms we support so that doing full or incremental backups of your data using Delphix will work without manual intervention.

With our in-depth knowledge of database behavior, the simplest data movement use cases are already taken care of -- put a Delphix VM in the public cloud and simply use it to back up your data over a VPN connection to your on-premise infrastructure.

[Diagram: hybrid cloud 1]

Once you have a golden backup of your data with streaming updates, you can easily create virtual databases to run test workloads, allowing you to ensure that the cloud migration was successful and quickly iterate on issues if it wasn't:

[Diagram: hybrid cloud 2]

It's important to note that once you've made one golden backup of production to your cloud, it's much better (easier, but also more space- and time-efficient) to make virtual databases for non-production copies through Delphix than it is to copy existing test/dev instances into Delphix to migrate them.

Using Delphix to create the non-production copies allows you to refresh the virtual data from production at will. It also means that for every production environment you migrate, there are 3-10 copies of it in non-production environments that you won't have to migrate at all!

You could stop here and just run test/dev on virtual databases in the cloud, perpetually copying production data into the cloud using Delphix so you can refresh the test/dev instances. But let's say you want to eventually migrate production into the cloud as well. Once you've run a few rehearsal migrations on throwaway virtual copies, you can use our virtual-to-physical (V2P) function to export a full physical copy of the database to a VM in the cloud, removing Delphix from the equation and creating a new "production-ready" database.

[Diagram: hybrid cloud 3]
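
Conceptually, V2P is the inverse of the copy-on-write sharing in the toy ToyBlockStore sketch earlier: every block of the virtual copy is written out in full so the result stands alone, with no dependency on the Delphix engine. Continuing that toy example (an illustration of the idea only, not the real V2P implementation):

```python
def materialize(store, image, path):
    """Export a virtual image as a full, standalone file -- a toy 'V2P'."""
    with open(path, "wb") as out:
        for offset in sorted(store.images[image]):
            out.write(store.read(image, offset))

# Reusing the `store` object from the earlier copy-on-write sketch:
materialize(store, "test-copy", "standalone_copy.img")
```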

At this point you're on the brink of being able to fail over from your on-site production database to your new cloud, but there is one final step -- you have to manually sync the cloud database to the live on-premise version.

[Diagram: hybrid cloud 4]

(Making this step automatic is on my wishlist of future Delphix features, specifically for this use case.) Once that's complete, point your application tier at the new cloud instance and shut down the on-premise copy. You've just migrated the database (and all of its copies) into the cloud with only one manual data step!
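
Here is a minimal sketch of that one manual step and the cutover sequence around it. Each callable is a placeholder for the real operation (quiescing the application, applying the last transaction logs, verifying, and repointing connection strings); none of these names correspond to actual Delphix or database APIs.

```python
def final_cutover(stop_application, apply_remaining_logs,
                  verify_consistency, repoint_application):
    """Order of operations for the last manual step before going live in the cloud."""
    stop_application()             # quiesce writes against on-premise production
    apply_remaining_logs()         # bring the cloud database exactly up to date
    if not verify_consistency():   # e.g. compare checksums or row counts
        raise RuntimeError("cloud copy does not match the source; aborting")
    repoint_application()          # point the application tier at the cloud DB

# Toy invocation with no-op placeholders, just to show the ordering:
final_cutover(
    stop_application=lambda: print("application stopped"),
    apply_remaining_logs=lambda: print("final logs applied"),
    verify_consistency=lambda: True,
    repoint_application=lambda: print("application now points at the cloud DB"),
)
```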

Sidestepping cloud lock-in

Another major issue facing enterprises moving to a public cloud is choosing which cloud will be most cost-effective for them. Amazon Web Services is by far the most entrenched vendor, but many of its competitors offer lower rates for particular use cases, and everyone charges for their services slightly differently.

Furthermore, companies like Microsoft and Oracle offer more favorable licensing if you run their database or application software in their clouds. To realize the savings cloud vendors offer, it's critical to choose the right cloud the first time -- if you don't, you'll have to migrate your data all over again, incurring another huge migration expense.

Because cloud vendors have no vested interest in helping you migrate away, they provide essentially no tools for these cloud-to-cloud workflows. The third-party market for such tools centers on either rebuilding VMs in the new cloud from descriptions of what's in them (products like Chef and Puppet) or copying VMs to the new cloud, if your source and destination are supported (products like VMware vRealize Automation and Dell Cloud Manager, or cloud-specific migration tools).

In either case, if you're not already using these tools, the ramp-up time to adopt them will be considerable, and at the end of the day the data backing your applications will still be left sitting in the old cloud, so there's no path forward after the easy stuff has been migrated. Delphix currently supports the most popular enterprise clouds (AWS, VMware on-premise, on-premise OpenStack with KVM, and -- coming soon -- VMware vCloud Air), so we're ideally positioned to support migrations into or between clouds. (The last big one we don't support yet is Azure, and we're currently planning how to tackle it.) Using our ability to run almost anywhere, the second cloud architecture I would recommend uses Delphix Replication.

Once you've created a golden copy of your data in Delphix at any location, you can use Delphix Replication to send an encrypted, compressed, incremental update stream of your data sources to a Delphix VM somewhere totally different. Once you've sent over the first copy of your data, sending incremental updates to keep the replica up to date happens automatically on a schedule you define.

[Diagram: hybrid cloud 5]

Because it heavily compresses data (~4x usually, though it depends on the payload) and uses filesystem diff calculation to send only changed blocks (rather than complex database-specific backup protocols that ship all the transaction logs), replication uses far less bandwidth, which cuts down on the cost and time of moving data between clouds.
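
To see roughly why this saves bandwidth, here is a simplified sketch of hash-based change detection plus compression. A real snapshotting filesystem already knows which blocks changed from its metadata rather than re-hashing everything; this just illustrates the concept.

```python
import hashlib
import zlib

BLOCK_SIZE = 8192

def block_hashes(data):
    """Hash fixed-size blocks so two snapshots can be compared cheaply."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]

def incremental_payload(previous, current):
    """Compress only the blocks that changed since the previous snapshot."""
    old = block_hashes(previous)
    changed = [current[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
               for i, digest in enumerate(block_hashes(current))
               if i >= len(old) or digest != old[i]]
    return zlib.compress(b"".join(changed))

# Toy example: a 1 MiB snapshot where a single block changed since last time.
previous = bytes(1024 * 1024)
current = bytearray(previous)
current[4096:4100] = b"edit"
payload = incremental_payload(previous, bytes(current))
print(f"full snapshot: {len(current)} bytes, incremental payload: {len(payload)} bytes")
```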

Furthermore, the way we built replication guarantees that the replication source will never get so far ahead of the replication target that replication can't finish (which is not a claim that most database-specific backup protocols can make), so using replication to copy high change rate data over a low bandwidth connection usually works fine.

Finally, if your connection between clouds is shaky or you need to pause replication to open up some bandwidth for another consumer temporarily, replication can be paused and resumed with a button click. These properties actually make replication a great way to migrate from your on-premise datacenter into a public cloud as well:

[Diagram: hybrid cloud 6]

Delphix Replication comes with a few more bonuses:

  • Replication can encrypt all the data it sends over the wire, and we're currently adding functionality to replication to traverse more complex network boundaries such as SOCKS proxies. This means you don't need to create a VPN to connect the cloud to your production data if you use replication. (It's still possible to use a VPN, of course -- but if you do, remember not to double-encrypt your data).

  • You can select specific databases to replicate, or you can replicate everything.

  • You can replicate one-to-many, many-to-one, or even do cyclic replications, giving unparalleled flexibility for data movement and coordination. This could (for example) allow you to test multiple clouds at once to see where your costs would be minimized before committing to a full migration.

[Diagram: hybrid cloud 7]

  • More speculatively, we're considering adding replication support for the "offline import / export" feature that many public clouds provide. This gives an extremely high latency, extremely high throughput "network connection". Basically, you write the data you want to transfer to a spare disk, you send the disk via FedEx to the public cloud provider, and the provider copies your data into the cloud and sends the disk back to you. Although this approach (known as a "sneakernet") sounds hilarious at first glance, we've seen multiple customer cases where extremely large data transfers would have taken significantly less time if they had used this approach.
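
A quick back-of-the-envelope comparison shows why; the dataset size, link speed, and shipping time below are purely illustrative assumptions, not customer measurements:

```python
# Network transfer vs. shipping a disk ("sneakernet") -- illustrative numbers only.
dataset_tb = 50                  # assumed amount of data to move
link_mbps = 500                  # assumed sustained network throughput (megabits/s)
shipping_days = 3                # assumed courier round trip plus provider load time

dataset_bits = dataset_tb * 1e12 * 8
network_days = dataset_bits / (link_mbps * 1e6) / 86_400
print(f"over the wire: {network_days:.1f} days")   # roughly 9 days at these rates
print(f"sneakernet:    {shipping_days} days")
```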

In many ways, both the existing functionality of replication and the roadmap we have planned for it make it a very compelling gateway into public clouds.

Hiding sensitive data

Using replication to move data into the public cloud opens up an important hybrid cloud use case which has to do with security. Sometimes you can't move confidential data into the cloud, but for cost reasons you would like to run your test and development workloads there.

To avoid leaving confidential data exposed, your only option is to mask the confidential pieces of data, for instance by replacing real credit card numbers in your database with fake ones, before moving anything into the cloud. Although a couple of products exist for masking data, they are complex to configure, don't provide insight into what databases have been masked after the fact, and provide no cloud migration capability.

Ultimately, usability constraints plus the lack of an obvious audit trail mean that few enterprises have deployed them at scale and even fewer have integrated them into a hybrid cloud data architecture. However, with million-user security breaches appearing in the news on a regular 3-6 month cadence, most companies admit that they should be using masking more broadly and that the ability to audit what data is masked and find all the places where data exists in unmasked form is critical to hardening their security practices.

My third recommended architecture involves our new data masking product. With a Delphix VM sitting inside your secure on-premise network, you can create a golden backup of an unmasked database into Delphix, then create a virtual database. Once you have a virtual database, you can use our masking solution to hide any sensitive data. The first time you do this, it requires application-specific knowledge (i.e. which columns of which tables are considered sensitive, and which algorithm makes the most sense for masking the data) and therefore manual setup.

However, after you've configured it once, as the data in the golden backup is updated from production you can refresh the virtual database at will and kick off new masking runs on the up-to-date data without any additional configuration.
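
As a small illustration of what a masking algorithm might do to a sensitive column, here is a sketch of deterministic replacement: the same real value always maps to the same fake value (so joins across tables still line up), but the mapping can't be inverted without the secret key. This shows the general technique only, not the algorithms Delphix's masking product actually ships.

```python
import hashlib
import hmac

SECRET_KEY = b"keep-this-key-on-premise"   # assumption: the key never leaves your network

def mask_card_number(card_number: str) -> str:
    """Deterministically replace a card number with an irreversible fake.

    Identical inputs always produce identical outputs, preserving joins and
    referential integrity, but the real number can't be recovered without the key.
    """
    digest = hmac.new(SECRET_KEY, card_number.encode(), hashlib.sha256).hexdigest()
    digits = "".join(str(int(char, 16) % 10) for char in digest[:16])
    return "-".join(digits[i:i + 4] for i in range(0, 16, 4))

print(mask_card_number("4111-1111-1111-1111"))
print(mask_card_number("4111-1111-1111-1111"))   # same fake value both times
```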

[Diagram: hybrid cloud 8]

(We can already automatically refresh the virtual copy from production data on a schedule you define, but today you still need to manually re-run the masking job after each refresh. We're currently working on a project to automatically mask the data every time it's refreshed from production, which should make this use case super easy. Today, our professional services group can work with you to provide similar functionality.)

Once you have the data in a masked state, you can link the masked virtual database into Delphix in the cloud, where you can create as many virtual copies as you like from it with no additional security risk:

[Diagram: hybrid cloud 9]

Very soon, we plan to streamline this by allowing you to replicate only the virtual database into the cloud. What I've described above is the approximate limit of what our masking product does today. However, our vision for how masking fits into data management is much more ambitious, and there are lots of new use cases that we're actively working on to make it even better.

For instance, one we're designing today would present a worldwide view of what data has been masked and where it's been copied, allowing you to easily audit all the data in your applications from one place. This will obviously help you surface problems, but it will also help you fix them: you'll be able to revoke access to the unmasked data, fix any incorrect masking policies, and push newly masked copies to all the places where the sensitive data was previously visible.

We're really excited to blow the doors off of the traditional masking market, so stay tuned for more updates!

Overcoming hardware incompatibility

The final use case I want to cover is a less common reason why enterprises have a hard time migrating into the cloud, but (speaking as an engineer) it's the coolest use case from a technical perspective and is entirely unique to Delphix. In the rarer case where public cloud migration is blocked on hardware-level incompatibilities, the issue is often that an Oracle database is running on a platform like HP-UX, AIX, or Solaris/SPARC that can't be moved to a public cloud because virtual x86 hardware is the only platform most cloud vendors allow. SAP ASE is in a similar situation since it's frequently run on AIX, but we don't yet support data platforms other than Oracle for this feature.

Most organizations would probably love to cut their vendor dependency and migrate to commodity x86 boxes running Linux, but if you've ever done it before, you know that actually performing these database translations is akin to doing open-heart surgery while blindfolded. Oracle provides a playbook for it, but from our experience, the chances that you'll find yourself in an unanticipated bind halfway through seem to be about 95%.

As far as I know, there are zero solutions on the market (short of hiring a team of IT consultants) that come close to addressing this problem. However, much of this complexity comes back to what we're best at: deeply understanding the features and limitations of a complex data source and building robust automation that makes the workflow painless.

We've automated so much of the process that many customer Unix-to-Linux (U2L) translations using Delphix have worked without a hitch. However, it's a very complicated workflow which frequently can't be done without some amount of human intervention. Luckily, even for translations which required application-level knowledge (and therefore human intervention) to fix issues on the Unix database, we structured the U2L feature in such a way that:

  1. We tell you what went wrong and suggest how to fix it.

  2. It's easy to iteratively rerun U2L with newly fixed data.

  3. Once you get it working a single time, it can be scheduled to automatically repeat as often as you like on the most recent production data.
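
Here is a rough sketch of how those three properties fit together as a loop; every callable is a placeholder for the corresponding real step, not an actual Delphix API.

```python
def run_u2l_until_clean(run_conversion, suggest_fix, wait_for_fix, max_attempts=10):
    """Iterate a Unix-to-Linux conversion until it succeeds, then hand off to a schedule.

    run_conversion() returns (ok, error); suggest_fix(error) describes what to change;
    wait_for_fix() blocks until a human has applied that fix on the Unix source.
    """
    for attempt in range(1, max_attempts + 1):
        ok, error = run_conversion()
        if ok:
            print(f"conversion succeeded on attempt {attempt}; scheduling daily reruns")
            return
        print(f"attempt {attempt} failed: {error}")
        print(f"suggested fix: {suggest_fix(error)}")
        wait_for_fix()
    raise RuntimeError("conversion still failing after repeated fixes")
```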

[Diagram: hybrid cloud 10]

Beyond holding your hand through the blindfolded open-heart surgery, we also do some pretty slick stuff behind the scenes to keep the storage footprint of this operation low. Because the Unix platforms mentioned above run on big-endian processors, the bytes that are stored on disk will be in the wrong order in many, many places compared to what you would have on Linux/x86, which is little-endian.

To deal with this, when we receive transformed data from the Linux target system, we only record where the endianness swaps are located and any differences which cannot be accounted for by endianness. By doing this (plus some compression) we can frequently store the endianness data in the filesystem metadata rather than taking up new data blocks, allowing us to store the transformed data in about 1% of the storage space used by the original data. Using U2L, we can translate data that seemed impossible to migrate into a form that's easy to move into the cloud:
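
To make the endianness difference concrete, the same 32-bit value is stored with its bytes reversed on the two platform families; Python's struct module shows the two layouts side by side:

```python
import struct

value = 0x01020304
big_endian = struct.pack(">I", value)      # layout on the big-endian Unix platforms
little_endian = struct.pack("<I", value)   # layout on Linux/x86

print(big_endian.hex())      # 01020304
print(little_endian.hex())   # 04030201

# Recording "swap the bytes at these offsets" is tiny compared to rewriting the
# blocks themselves, which is why the transformed copy can live almost entirely
# in metadata instead of newly allocated data blocks.
```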

[Diagram: hybrid cloud 11]

One remaining area to improve here is that, because of the way Oracle works, U2L takes time proportional to the size of your database to run, so the Linux database will lag behind your Unix database by that amount of time.

This makes U2L perfect for running production Unix instances on-site and Linux test/dev instances in the cloud with refreshes from production once a day, but if your ultimate goal is to move the production instance into the cloud, you would have to take downtime to do the final U2L (followed by a V2P). Incremental U2L is an extremely difficult final yard to get right (one that we've investigated but not pursued yet), but because cutting vendor dependence is worth a one-time downtime window, this has still been a very valuable option for many customers.
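
To put rough numbers on that downtime window: the final pass scales linearly with database size. The size and throughput below are assumed figures for illustration, not benchmarks of the actual feature:

```python
# Illustrative estimate of the downtime for the final U2L pass before cutover.
db_size_tb = 5                        # assumed database size
conversion_rate_mb_per_s = 200        # assumed sustained conversion throughput

seconds = (db_size_tb * 1e6) / conversion_rate_mb_per_s
print(f"final U2L pass: roughly {seconds / 3600:.1f} hours of downtime")
```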

Conclusion

The excitement around moving into the public cloud has taken a while to reach established enterprises, but now that they've got the bug, everyone wants to migrate to cut costs and organizational inefficiency. Delphix already offers a variety of completely unique ways to lower that barrier to entry, whether it's as simple as automating data movement and reducing the amount of data to migrate, or as complex as masking data and translating Oracle databases from legacy platforms to Linux/x86. We hope our customers will take advantage of these awesome features to make their cloud migrations cheaper, faster, and easier.