Application Development

The Compliant Data Layer: A Sustainable Approach to Mitigating Sensitive Data Sprawl

A compliant data layer continuously synchronizes production data with downstream data stores while replacing sensitive details with fictitious but realistic ones, preserving the data's utility and referential integrity

Todd Tucker

Apr 03, 2024

Protecting sensitive data is an uphill battle. The threat landscape is constantly evolving, and new vulnerabilities are discovered at an ever-increasing rate. Meanwhile, the accelerating sprawl of sensitive data in enterprise environments only compounds the challenge.

Mitigating the risks of sensitive data sprawl typically requires a multifaceted approach that combines technological solutions with robust policies and employee awareness. Common strategies include (but aren't limited to) access controls and authorization, regular audits and compliance checks, secure data storage solutions, endpoint security, and vendor risk management.

Many of these approaches fail for one simple reason: they must scale with the amount of sensitive data. Indeed, most measures mitigate the risk of sprawl but do little to prevent it in the first place.

Given the cost of scaling traditional measures to match the growing risks of sensitive data in non-production environments, a fundamentally different approach is needed. That approach is the Compliant Data Layer (CDL).

What is a Compliant Data Layer?

A CDL continuously synchronizes production data with data stores, such as SQL and NoSQL databases, semi-structured data repositories (e.g., XML, JSON), data warehouses, and data lakes, while replacing sensitive details, such as personally identifiable information (PII), protected health information (PHI), and account numbers (e.g., payment card data), with fictitious but realistic equivalents.

When designed and implemented properly, this approach preserves the utility of the data (i.e., the data’s value for use cases such as development, testing, and model training) and the referential integrity of the data across the multitude of data stores, even after the data is masked. A CDL continuously synchronizes and masks to maintain data freshness and compliance with privacy regulations and policies. A CDL also functions transparently to the applications that depend on the resulting compliant data stores. 
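
To make the referential-integrity point concrete, here is a minimal Python sketch of deterministic masking. It is illustrative only, not Delphix's implementation; the masking key, name lists, and mask_name helper are all hypothetical. Because the mapping is keyed and deterministic, the same real value masks to the same fictitious value in every table and data store, so joins continue to work after masking.

    import hashlib
    import hmac
    import random

    # Hypothetical secret; a real masking service would hold this in a
    # key management system, never in source code.
    MASKING_KEY = b"example-only-secret"

    FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Casey", "Drew"]
    LAST_NAMES = ["Smith", "Lee", "Patel", "Garcia", "Chen", "Novak"]

    def mask_name(real_name: str) -> str:
        # Key the hash so masked values can't be reversed by lookup tables,
        # then seed a PRNG with it so the mapping is deterministic: the
        # same real name yields the same fictitious name everywhere.
        digest = hmac.new(MASKING_KEY, real_name.encode(), hashlib.sha256).digest()
        rng = random.Random(digest)
        return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"

    # A customer appearing in two "tables" masks to the same value in
    # both, so the relationship between the rows survives masking.
    customers = [{"id": 1, "name": "Ada Lovelace"}]
    orders = [{"order_id": 99, "customer_name": "Ada Lovelace"}]

    masked_customers = [{**c, "name": mask_name(c["name"])} for c in customers]
    masked_orders = [{**o, "customer_name": mask_name(o["customer_name"])} for o in orders]

    assert masked_customers[0]["name"] == masked_orders[0]["customer_name"]

The same principle extends across databases and even clouds: as long as every masking job shares the key, referential integrity is preserved without any store ever holding the real values.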

A CDL replaces the inefficient data replication strategies that are typically used to support software development, testing, incident resolution, analytics, and AI model training. It removes the traditional bottlenecks in provisioning data to users in non-production or lower-tier environments and eliminates the unnecessary copying (sprawl) of sensitive data. As a result, a CDL improves productivity, reduces cost and labor, and mitigates sensitive data risks. A CDL is an increasingly essential component of data architectures supporting DevOps, CI/CD, AI, and other modern approaches to value delivery.

This design follows user-centered security principles. Rather than putting obstacles in the path of developers, testers, analysts, and other data consumers, a CDL makes it easier to obtain high-quality data that is secure by default. Users are no longer tempted to circumvent security measures; instead, they get a seamless workflow that reduces friction in their everyday processes.

For a CDL to be effective, the following key features must be present:

  • Connections to all relevant enterprise application data stores, whether hosted in SaaS, PaaS, or IaaS environments or on premises;

  • Automatic discovery of all regulated, sensitive data in those data stores (illustrated in the sketch after this list);

  • Automatic sensitive data masking (anonymization) that preserves the utility of data for the necessary use cases;

  • Maintenance of data relationships to preserve referential integrity after data masking, even across multicloud data sources;

  • Automated data provisioning and masking throughout the application lifecycle to ensure a secure chain of custody; and

  • Integration with user toolchains and pipelines such as those of DevOps and ITSM systems.
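
As an illustration of the discovery step, the following Python sketch scans rows for common PII patterns. The patterns, the scan_rows helper, and the sample data are hypothetical; production discovery engines combine pattern matching with column-name heuristics, dictionaries, and validation checks (e.g., Luhn checks for card numbers).

    import re

    # Illustrative patterns only; real engines use many more, plus
    # validation to weed out false positives.
    PATTERNS = {
        "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    }

    def scan_rows(rows):
        # Report which sensitive-data types were detected in each column.
        findings = {}
        for row in rows:
            for column, value in row.items():
                for label, pattern in PATTERNS.items():
                    if isinstance(value, str) and pattern.search(value):
                        findings.setdefault(column, set()).add(label)
        return findings

    rows = [
        {"name": "Ada Lovelace", "contact": "ada@example.com"},
        {"name": "Grace Hopper", "contact": "555-12-3456"},
    ]
    print(scan_rows(rows))  # e.g. {'contact': {'email', 'us_ssn'}}

In a CDL, the output of a scan like this feeds the masking step automatically, so newly discovered sensitive columns are protected without manual intervention.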

When implemented properly, a CDL shifts risk mitigation efforts from a model that must scale to match the sprawl of sensitive data to one that eliminates the sprawl, all while supporting high-velocity development, democratized analytics, and distributed AI model training.

Real-World Experiences with Delphix

Delphix customers rarely describe the architectures they've implemented as a Compliant Data Layer, yet many of them have established the features and capabilities described above because they are grappling with sensitive data sprawl.

Consider Choice Hotels International. According to CIO Brian Kirkland, Choice Hotels' innovation efforts had inadvertently resulted in many copies of sensitive databases proliferating across multiple non-production environments.

“One of the biggest ways we're using Delphix is really in the masking technology. Our ability to really control and protect the data and make sure that it's secure and make sure that people aren't making mistakes is paramount, number one. We've got to make sure that we're doing the right thing when it comes to protecting PII data and PCI data and make sure that our environments are clean. We use the masking in order to protect those lower environments across all of our assets.”

Or consider StoneX, a Fortune 500 financial services company. According to Anup Anand, Global Head of Infrastructure & Operations, the company uses Delphix to automate the data processes that are part of its development pipelines. As a result, StoneX can demonstrate, in a single view, how it has reduced sensitive data sprawl:

“We spent a lot of work building a CI/CD pipeline with a focus always on security. We have automatic processes to detect sensitive data within our databases and make sure that we're applying the correct data scrubbing before provisioning to non-live environments. With one view, we can see where all of our non-live environment[s are] and prove to auditors that we’re scrubbing data appropriately.”

Finally, consider the world leader in Human Capital Management solutions, ADP. According to Vipul Nagrath, SVP of Product Development and Head of Technology, the company must deliver test data to many client teams. 

“Our past process of moving data around from one environment to another could be quite onerous at times. Maybe it would take a day, or many, many hours for small clients, but it might take a day or multiple days for very large clients. By utilizing Delphix, we copy in place, which actually saves us a lot of storage. But the other part is we are able to mask the data. So the sensitive data that I can't have every developer looking at, they're not seeing. Our time to market is now faster than it used to be in the past and it's higher quality.”

Time to Challenge the Status Quo

The accelerating growth in the number of non-production data environments and the need to protect sensitive data are forcing data security officers, enterprise architects, heads of application development, and other leaders to rethink their approaches. The status quo of replicating sensitive data and then protecting it with traditional approaches is no longer tenable. A more modern approach is needed to significantly reduce or eliminate sensitive data sprawl, centralize control and protection, and satisfy users with production-quality data in their non-production environments.

A CDL meets these challenges. The themes that Delphix customers share time and again revolve around masking data to reduce risk, accelerating application release cycles, getting to market faster with digital capabilities, and improving the productivity of development teams. All of this means less sensitive data sprawl and a well-managed data protection footprint.

Reach out to us to find out how your organization can leverage a CDL to mitigate sensitive data sprawl.