DevOps for Data

Applications are the lifeblood of the modern enterprise. The business demands placed on the teams responsible for them seem daunting: deliver more applications with better features, in less time and at lower cost. Founded on the convergence of development, IT, and operations, the DevOps movement is facing these demands head-on. The essentials are simple: by engaging IT directly in the development process and involving development in the management of production operations, projects move more quickly and with higher quality. The goal is a pace and confidence that allow projects to be rolled out continuously. Flickr famously burst onto the DevOps scene at Velocity 2009 with a presentation showing that the business required (and delivered) 10 production deployments a day. This tight application lifecycle accelerates and streamlines the business, delivering greater value at lower cost. Or, in the words of the Flickr team:

Ops' job is not to keep the site stable and fast, ops' job is to enable the business. The business requires change.

In the traditional application development lifecycle, a chasm lies between QA and production that every project must cross. Up to this point, developers and IT have made assumptions about the production environment based on mutually agreed-upon specifications. But these assumptions are never fully validated until they converge in production. Many well-planned and well-executed projects fall victim to schedule delays as specification and reality gradually drift apart. DevOps seeks to eliminate this chasm by managing environments like application code: repeatable, available, and reliable across the application lifecycle. As Gene Kim, co-author of the seminal Visible Ops book, said in his keynote at PuppetConf 2012:

At the end of each sprint, we must have working code and the environment it runs in!
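Managing environments like application code generally means describing the desired state of a machine as data and converging any machine toward it, so that running the same description twice changes nothing the second time. The sketch below is a toy model of that desired-state, idempotent idea in Python; it is not Puppet or Chef code, and `Recipe`, `Environment`, and `apply` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical model of a machine's current state."""
    packages: set[str] = field(default_factory=set)
    config: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Recipe:
    """Hypothetical desired-state description, versioned alongside the code."""
    packages: frozenset[str]
    config: tuple[tuple[str, str], ...]


def apply(recipe: Recipe, env: Environment) -> list[str]:
    """Converge env toward recipe and return the actions actually taken.

    Only differences are acted on, so a second run on an already
    converged environment returns an empty list (idempotence).
    """
    actions = []
    for pkg in sorted(recipe.packages - env.packages):
        env.packages.add(pkg)
        actions.append(f"install {pkg}")
    for key, value in recipe.config:
        if env.config.get(key) != value:
            env.config[key] = value
            actions.append(f"set {key}={value}")
    return actions


# The same recipe drives a fresh developer sandbox...
recipe = Recipe(
    packages=frozenset({"postgresql", "nginx"}),
    config=(("app.port", "8080"),),
)
sandbox = Environment()
print(apply(recipe, sandbox))  # ['install nginx', 'install postgresql', 'set app.port=8080']
print(apply(recipe, sandbox))  # [] because the sandbox is already converged
```

Because the recipe, not a hand-built machine, is the source of truth, applying it to a pre-existing production host with drifted settings brings that host to exactly the same state as the sandbox.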

In response, tools like Puppet, Chef, and Vagrant have emerged as dominant platforms for creating and managing complete environments in a repeatable fashion. Instead of relying on hand-crafted systems prepared by the IT department, developers work with IT to create recipes for instantiating environments and integrate those recipes with the application code. This lets developers create their own sandboxes during development, and because the same recipes are used in production, many of the discrepancies that create the application lifecycle chasm are eliminated. Underpinning these tools is the philosophy that developer environments must be as "real" as possible, not just in configuration but in scale and content. Real environments expose real bugs that might otherwise only be seen in production; the earlier a bug is caught in the application lifecycle, the less it costs and the less impact it has on the project schedule. So what happens when your environment includes data? As Robert Treat of OmniTI said in his talk on DevOps for databases:

Databases are fundamentally different: they have data.

As obvious as it seems, the distinction is important. Data drives application behavior: an environment without data is an empty shell ripe for false assumptions. Tools like Flyway and Liquibase provide a framework for continuous database development, but rely on synthetic data. Data subsetting can create more lightweight copies, but the result can hide bugs that only appear at scale. Sharing full production copies minimizes overhead, but makes it difficult to coordinate refreshes and destructive testing. These data anchors force DevOps into an unappealing conundrum: abandon the desire to work with real data during development, or turn to expensive manual management of database copies that increases cost and slows the application cycle. Both options narrow, but don't eliminate, the application development chasm, jeopardizing application quality and schedules.

With Delphix, DevOps no longer has to choose between quality and agility. The Delphix agile data platform provides fresh, complete production data sets that developers can create, refresh, and reset directly. Through stable public APIs, tools like Chef and Puppet can integrate with Delphix to automate the deployment of virtual databases within any environment, virtual or physical. With real data at their fingertips, developers can find and fix bugs that were previously caught only through expensive rollouts of production or UAT environments. So what might Delphix look like in a DevOps setting?

  1. IT configures Delphix Engines to link non-destructively to a production database and provide continuous change capture for data provisioning and refresh.
  2. IT works with development to create Delphix workflows for the project and integrate them with provisioning frameworks like Chef and Puppet, allowing virtual databases (VDBs) to be easily provisioned in shared or dedicated environments, virtual or physical.
  3. These recipes are integrated with the code and managed through tools like Vagrant, allowing developers to create and refresh complete application environments.
  4. For UAT and performance environments that must mimic precise hardware configurations, V2P (virtual-to-physical) can be used to instantiate a physical database from a Delphix source.
  5. In production, environments safely refer to production data, knowing that every stage of development and QA has been validated with recent full copies of identical data.
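The data lifecycle behind the list above (capture changes from the source, provision a full copy, test destructively, reset, refresh) can be illustrated with a small in-memory model. This is a hypothetical sketch only: `FakeEngine`, `VirtualDB`, `provision()`, `refresh()`, and `reset()` are invented names, not the Delphix API, and real virtual databases are not full in-memory copies.

```python
from dataclasses import dataclass, field

# Toy model of the provisioning workflow. All names here are hypothetical
# and exist only to illustrate the create / reset / refresh cycle.

Snapshot = dict[str, list[str]]  # table name -> rows


@dataclass
class FakeEngine:
    """Stand-in for an engine linked to a source database."""
    snapshots: list[Snapshot] = field(default_factory=list)

    def capture(self, source: Snapshot) -> None:
        """Record a point-in-time copy of the source (stand-in for change capture)."""
        self.snapshots.append({t: list(rows) for t, rows in source.items()})


@dataclass
class VirtualDB:
    engine: FakeEngine
    base: int = -1  # index of the snapshot this copy was loaded from
    data: Snapshot = field(default_factory=dict)

    def _load(self, index: int) -> None:
        self.base = index
        snap = self.engine.snapshots[index]
        # Copy rows so local edits never touch the captured snapshot.
        self.data = {t: list(rows) for t, rows in snap.items()}

    def refresh(self) -> None:
        """Move to the newest captured snapshot, discarding local changes."""
        self._load(len(self.engine.snapshots) - 1)

    def reset(self) -> None:
        """Roll back to the snapshot this copy was last provisioned or refreshed from."""
        self._load(self.base)


def provision(engine: FakeEngine) -> VirtualDB:
    """Create a developer copy from the latest captured snapshot."""
    vdb = VirtualDB(engine)
    vdb.refresh()
    return vdb


prod = {"orders": ["o1", "o2"]}
engine = FakeEngine()
engine.capture(prod)        # link the source and capture its state
dev = provision(engine)     # developer gets a complete copy
dev.data["orders"].clear()  # destructive test
dev.reset()
print(dev.data)             # {'orders': ['o1', 'o2']}
prod["orders"].append("o3")
engine.capture(prod)        # new change captured from the source
dev.refresh()
print(dev.data)             # {'orders': ['o1', 'o2', 'o3']}
```

The point of the model is the division of labor: IT owns the link and capture, while developers call reset and refresh on their own copies whenever they need to, without filing a request.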

By working together to define and integrate project workflows, IT and development intrinsically validate their shared understanding of the project's data requirements at the outset. The Delphix agile data platform efficiently provisions full, fresh copies of data in a fraction of the space. Integrated with automation tools, it gives developers direct control of their data to suit their needs without burdening IT. The result is faster, higher-quality projects built by a cross-functional team aligned from start to finish: DevOps for data.
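Providing full copies in a fraction of the space is commonly achieved with thin, copy-on-write cloning: each copy presents the complete dataset, but only the blocks it overwrites consume new storage. The toy model below illustrates that general technique with made-up numbers; `BlockStore` and its methods are hypothetical, and this is not a description of Delphix's storage internals.

```python
# Toy copy-on-write block store. Every clone sees all source blocks, but
# only blocks a clone overwrites are stored separately for that clone.
# Hypothetical illustration of the general technique only.


class BlockStore:
    def __init__(self, base_blocks: list[bytes]) -> None:
        self.base = base_blocks                        # shared, read-only source blocks
        self.copies: dict[str, dict[int, bytes]] = {}  # clone name -> private overrides

    def clone(self, name: str) -> None:
        """A new clone starts with no private blocks, so it costs no data space."""
        self.copies[name] = {}

    def read(self, name: str, index: int) -> bytes:
        """Return the clone's own block if it wrote one, else the shared block."""
        return self.copies[name].get(index, self.base[index])

    def write(self, name: str, index: int, block: bytes) -> None:
        """Copy-on-write: store the new block privately, leaving others untouched."""
        self.copies[name][index] = block

    def physical_blocks(self) -> int:
        """Blocks actually stored: the source plus every clone's private writes."""
        return len(self.base) + sum(len(o) for o in self.copies.values())

    def logical_blocks(self) -> int:
        """Blocks presented: the source plus a full-size view per clone."""
        return len(self.base) * (1 + len(self.copies))


store = BlockStore([b"x"] * 1000)
for name in ["dev1", "dev2", "qa", "uat", "perf"]:
    store.clone(name)
    for i in range(10):  # each environment changes 10 blocks
        store.write(name, i, b"y")
print(store.physical_blocks(), store.logical_blocks())  # 1050 6000
```

In this made-up scenario, the source and five full-size copies present 6,000 logical blocks while consuming 1,050 physical ones; the ratio depends entirely on how much each copy changes.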