DataOps 2020: 5 Best Practices for Test Data Management in Multi-Cloud AppDev
Application development is moving at lightning speed in today’s model of continuous innovation. At the same time, more businesses are adopting the cloud for flexible capacity and greater operational efficiency, speeding the pace at which they bring new products and services to market.
Application teams leverage that flexibility to increase agility. Experience with cloud-based lower-level environments is also an opportunity to re-architect IT processes and establish security practices, building knowledge and confidence for when it’s time to migrate production workloads.
The models for appdev in the cloud vary. A hybrid-cloud model keeps production workloads on-premises while non-production development and test environments run in the cloud. Cloud-first application teams can operate within a single cloud provider or expand into a multi-cloud framework, enabling a best-of-breed strategy based on a specific workload, geographic region, or cost structure.
Whether application teams are working in single, hybrid, or multi-cloud models, DevOps practices are often evaluated to avoid processes that add complexity and overhead, and the same should be true for the data pipeline that feeds the release train.
The emerging practice of DataOps focuses on fast and secure movement of data—think DevOps for data. DataOps has modernized test data management, eliminating long-standing wait states that limit release velocity. Here are five best practices for DataOps that increase the efficiency of CI/CD workflows within and across clouds and help teams build better software faster.
1. Automate Data Delivery
DevOps teams can quickly spin up and down cloud-based test environments as they iterate on new code. Validating every change increases the speed of integration into master, and feature branches can be quickly retired. But agility is lost when test data delivery doesn’t match this optimized model.
Fast-moving release trains get stuck waiting on serial ticketing and manual operations to deliver data into non-production environments. Research shows that 80 percent of enterprises in North America take four days or more to provision test data.
Automating data delivery into the CI/CD toolchain breaks the data bottleneck, so continuous integration can truly scale.
In a multi-cloud model, this codification of data must remain cloud-agnostic to enhance portability. Creating shareable code eliminates the need to tweak logic for cross-cloud integration testing and deployments.
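One way to keep the delivery step cloud-agnostic is to hide provider-specific logic behind a single interface the pipeline calls. The sketch below is purely illustrative: the class and function names (`DataRequest`, `register_backend`, `deliver`) are hypothetical, not a real product API, and the backends are stubs standing in for actual copy operations.

```python
"""Hypothetical sketch: a cloud-agnostic test data delivery step for CI.

All names here are illustrative assumptions, not a real tool's API.
"""
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DataRequest:
    source: str      # logical name of the production-like data source
    target_env: str  # CI environment identifier (e.g. a branch or PR env)
    cloud: str       # "aws", "gcp", "onprem", ...


# Provider-specific delivery backends registered behind one interface,
# so pipeline code never branches on the cloud itself.
_BACKENDS: Dict[str, Callable[[DataRequest], str]] = {}


def register_backend(cloud: str):
    """Decorator that registers a delivery function for one provider."""
    def wrap(fn: Callable[[DataRequest], str]):
        _BACKENDS[cloud] = fn
        return fn
    return wrap


@register_backend("aws")
def _deliver_aws(req: DataRequest) -> str:
    # Stub: a real backend would copy or virtualize data in this cloud.
    return f"copied {req.source} into {req.target_env} (aws)"


@register_backend("gcp")
def _deliver_gcp(req: DataRequest) -> str:
    return f"copied {req.source} into {req.target_env} (gcp)"


def deliver(req: DataRequest) -> str:
    """Single entry point the CI job calls, regardless of target cloud."""
    try:
        return _BACKENDS[req.cloud](req)
    except KeyError:
        raise ValueError(f"no delivery backend registered for {req.cloud!r}")
```

Because the pipeline only ever calls `deliver()`, the same shareable code works for cross-cloud integration testing; adding a provider means registering one backend, not rewriting pipeline logic.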
2. Use Production-Quality Data
DevOps teams leverage small batch sizes to increase agility and maintain a tighter feedback loop. Shifting left keeps defects from moving down the pipeline, where they get harder and more expensive to triage. To find issues sooner in the SDLC, the spectrum of test environments used should closely simulate production, including data.
Out of convenience, developers often use synthetic data or subsets for testing—but that significantly weakens results. Data should reflect the production instance to ensure comprehensive test coverage and improve software quality.
3. Version Control Data
Destructive testing requires datasets to be returned to their original state so tests can resume. Given how frequently this occurs in test-driven development, delays in restoration create yet another bottleneck for the CI/CD pipeline. Treating your data like code solves this problem.
Version controlling data creates a reference point in time, so data can be automatically rolled back to the original state during testing or when reproducing errors at a later date. Linking the state of the test database to specific application changes increases the flow of planned work because data becomes as agile as your code.
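The bookmark-and-rollback idea above can be sketched in a few lines. This is a minimal illustration of the concept, not how any particular data-versioning product works; the `VersionedDataset` class and its method names are assumptions, and a real tool would snapshot storage blocks rather than copy rows in memory.

```python
"""Illustrative sketch of treating data like code: bookmark a dataset's
state, then roll back after destructive tests. Names are hypothetical."""
import copy


class VersionedDataset:
    def __init__(self, rows):
        self._rows = list(rows)
        self._bookmarks = {}

    def bookmark(self, name):
        # Capture a reference point in time, like tagging a commit.
        self._bookmarks[name] = copy.deepcopy(self._rows)

    def rollback(self, name):
        # Restore the dataset to the bookmarked state in one step,
        # instead of waiting on a manual restore from backup.
        self._rows = copy.deepcopy(self._bookmarks[name])

    @property
    def rows(self):
        return self._rows


# Usage: bookmark before a destructive test, roll back afterward.
ds = VersionedDataset([{"id": 1, "status": "active"}])
ds.bookmark("pre-test")
ds.rows[0]["status"] = "deleted"   # destructive test mutates the data
ds.rollback("pre-test")            # data returns to its original state
```

Naming bookmarks after application versions or commits is what links the state of the test database to specific code changes.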
4. Centralize Access Control
Enterprises leverage diverse data sources tailored to various applications, located in an equally diverse set of environments, on-premises and across clouds. Siloed procedures for provisioning test data have caused organizations to become “data blind”—meaning they don’t have a clear picture of what data they have and who has access to it.
Centralizing governance brings visibility and standardized control of who has access to what data, when, and for how long—no matter where environments are located. As with automated data delivery, permissioning of test data should not include cloud-specific logic. Cloud-agnostic controls result in policy-based processes that span cloud providers, and infrastructure-wide administration creates traceability for audits and reporting.
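A centralized, cloud-agnostic policy layer can be pictured as a single service that answers "who can access what, until when" and logs every decision. The sketch below is an assumption-laden toy model: the `AccessPolicy` class, its grant/check methods, and the audit log format are all invented for illustration.

```python
"""Hedged sketch of policy-based, time-bound test data access control.
The policy model and field names are illustrative assumptions."""
from datetime import datetime, timedelta


class AccessPolicy:
    def __init__(self):
        self._grants = []    # (user, dataset, expires_at) tuples
        self.audit_log = []  # every decision recorded for traceability

    def grant(self, user, dataset, ttl_hours):
        """Grant time-limited access; no cloud-specific logic involved."""
        expires_at = datetime.utcnow() + timedelta(hours=ttl_hours)
        self._grants.append((user, dataset, expires_at))

    def check(self, user, dataset):
        """Answer an access request and record it for audits."""
        now = datetime.utcnow()
        allowed = any(
            u == user and d == dataset and now < exp
            for u, d, exp in self._grants
        )
        self.audit_log.append((user, dataset, allowed))
        return allowed
```

Because grants carry an expiry rather than referencing any provider, the same policy records govern environments in every cloud, and the audit log gives the infrastructure-wide traceability needed for reporting.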
5. Mask Sensitive Data
Non-production environments are often less secure, simply for reasons of cost and convenience, which heightens the risk of sensitive data exposure. A centralized strategy to identify and safeguard sensitive data is essential to create a consistent line of defense across clouds.
Protecting PII requires data obfuscation before distribution into lower-level environments. The method for anonymizing data should both remove sensitive information and ensure the data still behaves like production data for testing purposes.
Encryption is a common approach, but it breaks the logical relationships between database tables, which in turn limits test coverage.
By contrast, data masking replaces real data with fictitious but realistic data, maintaining referential integrity during testing. Masked data is useless to an attacker, and it keeps non-production environments compliant with privacy laws such as the GDPR and CCPA.
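The referential-integrity point can be shown with deterministic masking: the same real value always maps to the same fictitious value, so foreign-key joins still line up after masking. This is a minimal sketch, not a production masking algorithm; the salt handling and `mask_email` scheme are assumptions for illustration only.

```python
"""Minimal sketch of deterministic data masking. The same real value
always yields the same fictitious value, so joins across tables survive.
Salt and hashing scheme are illustrative, not a production algorithm."""
import hashlib

# Assumption: a per-environment secret kept out of production and VCS.
SALT = b"test-env-only"


def mask_email(email: str) -> str:
    """Replace a real email with a realistic, consistent fictitious one."""
    digest = hashlib.sha256(SALT + email.encode()).hexdigest()[:10]
    return f"user_{digest}@example.com"


customers = [{"id": 1, "email": "alice@corp.com"}]
orders = [{"order": 100, "customer_email": "alice@corp.com"}]

masked_customers = [{**c, "email": mask_email(c["email"])} for c in customers]
masked_orders = [
    {**o, "customer_email": mask_email(o["customer_email"])} for o in orders
]

# Referential integrity holds: the masked values still join across tables.
assert masked_customers[0]["email"] == masked_orders[0]["customer_email"]
```

Because masking is applied consistently across every table before delivery, tests that join customers to orders behave exactly as they would against production, with no real PII in the environment.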
DataOps practices bring mobility and security to the test data pipeline. Creating a cloud-agnostic test data delivery and management strategy—no matter where your production or non-production environments are located—will build efficiency into your CI/CD pipeline, mitigate risk of sensitive data exposure in lower-level environments, and future-proof your software testing as your cloud model evolves.