Minding the [Data] Gap

In this post, Adam describes the data perils we have learned to live with in our SDLC, and where Delphix can specifically address them.

Adam Bowen

Aug 25, 2016

I am fortunate enough to find myself in London, England once again this year. If you have been to London and have ridden "the tube," then you are familiar with the phrase "Please mind the gap." For those unfamiliar with it, the announcement is repeated at every stop to remind departing passengers not to step into the space between the train and the platform. And, like most constantly repeated sound advice, we tend to hear it the first time and then tune it out. True to form, ignoring that advice usually comes back to bite us in the end. That is what almost happened to me today, as "the gap" was twice as big as it normally is. I have never been so thankful to have such large feet.

The event played over and over in my mind for the remainder of my journey back to the hotel. And then the thought hit me: this is exactly what happens in our SDLC (though often with a more unfortunate outcome). We have learned to live with the peril of old, stale, subsetted, or purely synthetic data (the data gap) in our day-to-day lives and completely forget about its presence...until it is much bigger than we assume and almost kills us (or at the least causes us some embarrassment and bruises).

We have acknowledged the data gap in our SDLC and have managed to just work around it ... that is, until we don't. All of us have experienced injury from the data gap in our projects. Here are some typical injuries:

  • We plan for a two-week database provision, but it takes four weeks. Project delay and cost overrun.

  • We plan three days for a database refresh, but it takes five. Teams wait, features drop, testing cycles shrink.

  • We don't plan refreshes, so our projects don't suffer downtime; but six-week- or six-month-old data causes us to miss a P1 defect.

  • We write back-out scripts/steps to reset our dev/test environments and avoid five-day refreshes; but they fail without our knowing, introducing bugs and costing productivity.

  • We don't mask non-prod copies, because masking is hard and takes too long. Dev gets compromised.

  • We run purely synthetic data in non-prod, but we miss corner cases, introducing bugs into late-cycle dev or into Prod.

There are even more data gap pains we have all faced around processes like subsetting and break/fix activities. Just like in my tube experience, we knew the gaps were there. In fact, we counted on the gaps being there, but in those moments they were far larger than we had planned for. We planned to march forward with our data in place, but instead we plunged into the abyss.

While Delphix can't heal every peril in your SDLC, let's examine just a few of the places where it can remediate them:

Provisioning New Data


Today, if you are like most traditional shops, you wait days or weeks to get new environments, and additional days or weeks to get those environments provisioned with data. If you are a more modern DevOps/automation shop, you can get environments in minutes, but you still wait hours or days for data. After all, even if you automate the request, copying or transferring 60TB of data only happens so fast (thanks, physics). With Delphix, you can eliminate the words "days," "weeks," and "hours" as descriptors for waiting for data; yes, even for a 60TB database. This can be done ad hoc by the developer, tester, or DBA via the Delphix self-service tools, or integrated right into your automation/DevOps processes with very little effort.

In the diagram below, I depict a situation where you are already using configuration automation, such as Ansible, Puppet, Chef, or SaltStack, to build your infrastructure and supporting applications. In this case, you can easily tell those tools to call Delphix automatically to provision the data once the infrastructure is in a ready state.

Flow diagram of provisioning data with and without Delphix
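
To make that hand-off concrete, here is a minimal sketch of a pipeline step that calls a Delphix engine once configuration automation reports the host ready. The engine URL, endpoint path, payload fields, and credentials are illustrative assumptions, not the documented Delphix API:

```python
# Called by Ansible/Puppet/Chef/SaltStack once the new host reaches a
# ready state; asks the engine to provision a virtual copy of the source.
# ENGINE, /api/provision, and the payload fields are hypothetical.
import requests

ENGINE = "https://delphix.example.com"      # hypothetical engine URL
AUTH = ("automation_user", "secret")        # assumed service credentials

def provision_vdb(source_db: str, target_env: str, vdb_name: str) -> dict:
    """Submit a provision request for a virtual database (VDB)."""
    resp = requests.post(
        f"{ENGINE}/api/provision",
        json={
            "source": source_db,            # e.g. the production dSource
            "environment": target_env,      # the freshly built dev/test host
            "name": vdb_name,
        },
        auth=AUTH,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    job = provision_vdb("prod_erp", "dev-host-42", "erp_dev_01")
    print("Provision job submitted:", job)
```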

Refreshing Data


The constraints that afflict data provisioning in your environment likely afflict data refreshes as well, though in some cases to a lesser degree (days instead of weeks). The same technology that Delphix uses to provision environments also applies to refreshes. That means a refresh takes the same seconds or minutes it took to provision the first copy, and the same self-service and automation capabilities that were available to provision are also available to refresh. Delphix also stays in near real-time sync with production, so you can refresh your non-prod copy from 3-seconds-old production in just a few minutes' time, at will. In the time it would normally have taken you to shoot your friendly DBA an email requesting a refresh, you could already have the data. How does that impact your project timelines? If every git pull, or every commit gate triggered on TFS, automatically refreshed your database (including applying any DDL/DML that needs to occur), how would that affect your quality?
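
As one illustration of that last idea, here is a minimal sketch of a git post-merge hook (the script git runs after every pull) that asks a Delphix engine to refresh the developer's virtual database. The endpoint path, payload fields, and credentials are assumptions for illustration, not the documented Delphix API:

```python
#!/usr/bin/env python3
# Saved as .git/hooks/post-merge (and marked executable), this runs after
# every `git pull`, so the developer always lands on freshly refreshed data.
# ENGINE, the /api/refresh path, and the payload fields are hypothetical.
import requests

ENGINE = "https://delphix.example.com"      # hypothetical engine URL
AUTH = ("automation_user", "secret")        # assumed service credentials

def refresh_vdb(vdb_name: str) -> dict:
    """Ask the engine to refresh a virtual database to the latest
    point in time captured from its production source."""
    resp = requests.post(
        f"{ENGINE}/api/refresh",
        json={"name": vdb_name, "timestamp": "LATEST"},
        auth=AUTH,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    job = refresh_vdb("erp_dev_01")         # hypothetical VDB name
    print("Refresh job submitted:", job)
```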

The diagram below depicts a real account from one of our Wall Street financial customers. Because production data was cumbersome to deliver to non-prod, development would occur on months-old environments. Changes to production occurred outside of development, courtesy of hotfixes and the like. Over time, this added more and more inconsistencies between production and development data, which resulted in more and more bugs making it to production. Routinely refreshed data in development results in more defects being caught early in the SDLC, where they are easier to fix. Here I show refreshes happening on a weekly schedule, but they could be set to any interval, or triggered by another tool such as a git hook.

Flow diagram of refreshing data with and without Delphix

Resetting Data


Some tests are destructive by design, and some tests are unintentionally destructive. In either case, you need a way to get back to a "test-ready" state. That really leaves only a couple of choices: either refresh the data, or back out the changes. But backing out the changes rests on a couple of very important assumptions. First, you have to be aware that changes were made to your data. If your development or tests were not designed to be destructive, are you even scrutinizing whether Field A2354 on Form 234 now points to a different column in table XYZ? You simply don't know what you don't know.

But if you are running intentionally destructive tests, are you sure you are backing out all of the changes? How much time and energy are you spending on your back-out/reset procedures? Do you subject those scripts to the same level of QA as the application you are developing? If you do, I commend you. But there is still a better way. Once your non-prod environments are virtualized in Delphix, you can have crash-consistent copies of your applications that are as easy to access as rewinding a movie on Netflix or flipping pages on your Kindle. You have already provisioned your data with Delphix in minutes. You do some development that did not yield the results you wanted. Just click "Rewind" to go back to the point in time you want, either a literal timestamp or something more meaningful, like a bookmark titled "Step 5 complete." This process takes about as long as it takes to restart your application/database. If you no longer have to develop, test, and maintain reset scripts, and the reset happens in minutes, what productivity and quality gains are delivered to your projects?
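
To make the rewind step concrete, here is a minimal sketch, again assuming a hypothetical REST endpoint rather than the documented Delphix API, of rolling a virtual database back to a named bookmark instead of running hand-maintained back-out scripts:

```python
# Roll a virtual database back to a bookmarked point in time.
# ENGINE, the /api/rollback path, and the payload fields are
# illustrative assumptions for this sketch.
import requests

ENGINE = "https://delphix.example.com"      # hypothetical engine URL
AUTH = ("tester", "secret")                 # assumed credentials

def rewind_vdb(vdb_name: str, bookmark: str) -> dict:
    """Restore vdb_name to the state captured by a named bookmark,
    e.g. one created after a known-good step like 'Step 5 complete'."""
    resp = requests.post(
        f"{ENGINE}/api/rollback",
        json={"name": vdb_name, "bookmark": bookmark},
        auth=AUTH,
    )
    resp.raise_for_status()
    return resp.json()

# After a destructive test run (or a failed package step), reset in minutes:
rewind_vdb("sap_qa_01", "Step 5 complete")  # hypothetical names
```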

In the diagram below, I depict a typical process for testing the application of package updates to a composite application with multiple data sources, or to an ERP system like SAP. In a traditional test, if you are applying a series of SAP packages and one fails catastrophically, you likely have to wipe the environment and start from scratch, a process that takes weeks. Our customers that use Delphix for SAP are able to revert to the last successful step in minutes and resume their testing with the click of a button.

Flow diagram of resetting test environments with and without Delphix

Data Masking and Anonymization


Security is paramount to protecting our businesses, missions, patients, and consumers. Non-production copies, with few exceptions, should never contain sensitive data. I know that we all know this; yet we have all worked (or are working) somewhere where banking/patient/customer information was strewn about in many places. If masking were easy, everyone would do it, everywhere, all the time. With Delphix, masking is easy. Furthermore, with Delphix, Agile Masking for non-prod copies can be automated, eliminating the potential for a process breakdown whereby a developer gets an unmasked copy of production. Leveraging role-based access control, every time a developer clicks "provision," "refresh," or "rewind," the request is supplied from a pre-masked copy of production. Yes, pre-masked. The tax for that eight-hour masking job has already been paid by the time your developers get into the office at 8 AM, and they have fresh masked data available from the previous day's close. Delphix Agile Masking is easy to set up and use, requires no programming expertise, and can even analyze your data for possible sensitive information. With the complexity and time constraints removed from masking, how can you afford not to mask?
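
As a sketch of that "pre-masked by default" pattern, imagine the developer's role can only provision from the nightly masked copy. All names, the endpoint path, and the server-side access control behavior below are hypothetical assumptions, not the documented Delphix API:

```python
# Developers provision only from a source that has already passed the
# nightly masking job; role-based access control on the engine side is
# assumed to reject requests against the unmasked production source.
# ENGINE, /api/provision, and all names are illustrative placeholders.
import requests

ENGINE = "https://delphix.example.com"      # hypothetical engine URL
AUTH = ("developer", "secret")              # role limited to masked sources

def provision_masked(vdb_name: str) -> dict:
    """Request a virtual copy served from the pre-masked source."""
    resp = requests.post(
        f"{ENGINE}/api/provision",
        json={"source": "prod_masked",      # the pre-masked nightly copy
              "name": vdb_name},
        auth=AUTH,
    )
    resp.raise_for_status()
    return resp.json()

provision_masked("crm_dev_masked_01")       # hypothetical VDB name
```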

In the diagram below, I show a typical process where a new copy of masked data is requested, and the time and manual touchpoints it takes before the data is delivered. In the Delphix scenario, security can establish and review a masking policy that Delphix applies automatically. Delphix automatically updates a masked copy of production at a specified interval. At any time, and without impacting the data delivery chain, security can review any of the automatically masked copies to ensure compliance and satisfy audits. The requestor only has access to request data from the certified masked copy and can get it delivered via self-service in minutes. This masked data delivery model can be applied to any of the scenarios described above, as well.

Flow diagram of masking data with and without Delphix

These are just a few of the scenarios where Delphix can be inserted into your SDLC. I have previously blogged about our customers that leverage Jenkins or SNOW Orchestration as orchestration tools to call Delphix provisioning and complete their CI pipeline. The key point is to look at your SDLC and identify the points where you are waiting. If you are waiting, it is likely for data. And if it is indeed data for which you are waiting, then Delphix can help. Delphix is Data Delivered.