Why Policy-Driven Data Obfuscation Should be the Cornerstone of Your Enterprise Data Security Strategy

IT teams are oftentimes faced with complex challenges regarding enterprise-wide data security, but a clearly articulated, policy-driven data masking technique can ease implementation and reduce cost.

Robert Patten

Mar 28, 2019

One of the most common security issues that enterprise software teams overlook is securing their non-production environments, which is where the majority of sensitive data lies within an enterprise.

Non-production environments are where software teams develop and test application changes using copies of real customer and company data, and these lower environments far outnumber the highly visible production environments, both in count and in data volume. For every production instance of an application, there are at least 10 non-production copies. Put simply, it’s the larger part of IT that is not visible to the rest of the world.

As a result, the risk of data loss increases with every piece of sensitive data that makes it outside the production zone.

[Figure: production iceberg]

Why Data Masking is the De Facto Standard for Removing Sensitive Data

Data masking is typically done while provisioning non-production environments, so the copies of data created for test and development do not expose sensitive information.

Unlike encryption, homegrown scripts or even synthetic data, an advanced, powerful data masking technology can do the following:

  • Automatically identify sensitive data;

  • Irreversibly transform the data so it cannot be restored to its original, sensitive state;

  • Make testing feasible with realistic but fictitious data while providing zero value to thieves and hackers;

  • Offer extensibility and flexibility so businesses can customize their solution for the wide variety of data sources they depend on;

  • Preserve referential integrity for important data relationships.

Here are three aspects of an optimal policy-driven approach to data obfuscation to safeguard your most sensitive data.

1. Prioritized List of Sensitive Data Types

The first step in securing an organization’s most sensitive data is to understand what that data is and where it lies across the enterprise. From a program perspective, automating data discovery, which we refer to as profiling, provides a consistent method of identifying sensitive data across the organization and enables consistency through various algorithms.

Unlike many other data platforms on the market, Delphix uses what we call profile sets to define which types of data you consider sensitive and want to identify across your data sources and environments. Profile sets are designed to locate sensitive data within complex tables and fields, which saves time and effort and ultimately speeds up implementation.

2. Standard Method That Can Be Used Across the Enterprise

The next question to be answered is how. You can achieve a reliable and consistent set of masked values by standardizing a method that transforms the identified sensitive data types. If data integration between applications is important, choose a method that is deterministic, meaning the same input always produces the same masked value, so that both content and data format stay consistent across applications.
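To make the idea of a deterministic method concrete, here is a minimal Python sketch. It is not the Delphix algorithm; it assumes a keyed hash (HMAC-SHA256) as the source of determinism, and the key, the name list, and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a secrets manager.
SECRET_KEY = b"example-masking-key"

# Small illustrative list of fictitious replacement values.
FIRST_NAMES = ["Alex", "Blake", "Casey", "Devon", "Emery", "Finley", "Gray", "Harper"]


def _digest(value: str) -> bytes:
    """Keyed hash of the normalized input; the same input gives the same bytes."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).digest()


def mask_first_name(value: str) -> str:
    """Deterministically map a real name to a fictitious one from the list."""
    index = int.from_bytes(_digest(value)[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


def mask_digits(value: str) -> str:
    """Replace each ASCII digit deterministically, leaving separators in place."""
    stream = _digest(value)
    out = []
    used = 0
    for ch in value:
        if "0" <= ch <= "9":
            # Derive a replacement digit from the next byte of the keyed hash.
            out.append(str(stream[used % len(stream)] % 10))
            used += 1
        else:
            out.append(ch)
    return "".join(out)
```

Because the output depends only on the input and the key, a customer identifier masked in one application matches the same identifier masked in another, so joins between the two masked copies should still line up. The format is preserved as well: a value shaped like `123-45-6789` comes back with the same length and separators.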

3. Sufficient Resources to Build, Execute, and Support the Plan

The infrastructure team will play a key role in implementing the tools and processes that will be used. Establishing this team is critical to the overall success of the project, as it will support every application on-boarded to the masking platform. The team will also need deep knowledge of obfuscation tools and techniques and of data storage technologies.

The other option is a distributed data obfuscation program. This model is the simplest to implement: each business line is given the directive to obfuscate data in all lower environments, but receives little guidance and few tools to standardize on. It results in faster implementation but creates siloed processes that rarely support integration. Organizations can take this approach as a short-term solution and retool later to meet long-term goals.

How it Works with Delphix

What you are masking and how you are masking it are the most important questions to answer when kicking off the program. The next step is to build your profile set, which is a grouping of search expressions used to search your databases to identify columns and fields containing sensitive data. The search can be performed using metadata or by sampling the table or file data.

Column level metadata search should be performed as the primary search method since the result will be returned much faster. If your schemas employ uncommon column names, it will be necessary to perform a data level scan. It’s not uncommon to employ both techniques.
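The two search modes can be sketched in a few lines of Python. This is an illustration only, not the Delphix profiler: the patterns, the `profile_column` function, and the 80% match threshold are assumptions made for the example.

```python
import re

# Hypothetical search expressions; a real profile set would be far richer.
COLUMN_NAME_PATTERNS = {
    "SSN": re.compile(r"ssn|social.?sec", re.IGNORECASE),
    "EMAIL": re.compile(r"e.?mail", re.IGNORECASE),
}
DATA_PATTERNS = {
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
}


def profile_column(name, sample_values, min_match_ratio=0.8):
    """Return (domain, how_found) for a column, or (None, None)."""
    # Pass 1: cheap metadata check on the column name only.
    for domain, pattern in COLUMN_NAME_PATTERNS.items():
        if pattern.search(name):
            return domain, "metadata"
    # Pass 2: data-level check against a sample of non-empty values.
    values = [v for v in sample_values if v]
    if not values:
        return None, None
    for domain, pattern in DATA_PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v.strip()))
        if hits / len(values) >= min_match_ratio:
            return domain, "data"
    return None, None
```

A column named `cust_email` is caught by the metadata pass without reading any rows, while an opaquely named column such as `fld_17` is only identified if its sampled values look like the pattern, which is why combining both techniques is common.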

When using the Delphix profiler, a successful scan results in the assignment of a domain (PII type) and an algorithm (method). The profiler employs Java regular expressions to find sensitive data by scanning column names and, if necessary, the column data contained in the database. Delphix masking provides over 50 out-of-the-box search expressions ready to use for profiling. For each expression, you will see a domain (data type), expression name, expression level, and expression text. The expression level determines whether the profiler will search through column names in the schema or data within the tables.

From there, the domain ties the search expression to an algorithm. The masking security policy is implemented by grouping the search expressions into the profiler set. These constructs are managed in the settings tab of the Delphix Masking UI.
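The relationship between expressions, domains, algorithms, and the profiler set can be modeled roughly as follows. This is a sketch of the concepts described above, not the Delphix data model: the class, the algorithm names, and the "first match wins" rule are assumptions, and Python's `re` module stands in for Java regular expressions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchExpression:
    domain: str  # data type, e.g. "SSN"
    name: str    # expression name
    level: str   # "column" = match column names, "data" = match values
    text: str    # the regular expression itself


# Hypothetical domain -> algorithm assignments.
DOMAIN_ALGORITHMS = {"SSN": "ssn_deterministic", "LAST_NAME": "last_name_lookup"}

# A profiler set is modeled here as a plain list of expressions.
PROFILER_SET = [
    SearchExpression("SSN", "SSN column", "column", r"(?i)ssn|social"),
    SearchExpression("SSN", "SSN data", "data", r"\d{3}-\d{2}-\d{4}"),
    SearchExpression("LAST_NAME", "Last name column", "column", r"(?i)l(ast)?_?name"),
]


def build_inventory(table, columns):
    """columns maps column name -> sample values; returns inventory rows."""
    inventory = []
    for column, samples in columns.items():
        for expr in PROFILER_SET:
            pattern = re.compile(expr.text)
            if expr.level == "column":
                matched = pattern.search(column) is not None
            else:
                matched = any(pattern.fullmatch(s) for s in samples)
            if matched:
                inventory.append({
                    "table": table,
                    "column": column,
                    "domain": expr.domain,
                    # The domain, not the expression, decides the algorithm.
                    "algorithm": DOMAIN_ALGORITHMS[expr.domain],
                })
                break  # first matching expression wins (an assumption)
    return inventory
```

Keeping the algorithm on the domain rather than on each expression means a policy change, such as switching how SSNs are masked, is made in one place and applies to every expression that detects that domain.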

[Figure: masking spreadsheet]

The Delphix Masking Engine is pre-configured with profiler sets for two common use cases: financial and healthcare (HIPAA). Each contains a superset of common domains for its use case. I recommend reviewing the expressions and domains included in these profiler sets as examples of common data types to include in your masking security policy.

It’s best to start with a smaller set (10 or fewer) of your most sensitive domains and evaluate the inventories produced. The profiler can be run again and again with additional expressions to rebuild the masking inventory.

Once the profiler has run, the inventory can be evaluated to make sure sensitive columns have been identified and the correct algorithm assigned. This evaluation is best performed by exporting the inventory to a CSV file (export button in inventory tab) and seeking feedback from the application team.
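Once the inventory is in CSV form, a short script can prepare it for the application team's review. The sketch below assumes a simplified export with `table`, `column`, `domain`, and `algorithm` headers; the actual export layout may differ, so treat the header names as placeholders.

```python
import csv
import io
from collections import defaultdict


def summarize_inventory(csv_text):
    """Group exported inventory rows by domain and flag unmasked columns."""
    by_domain = defaultdict(list)
    unassigned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        location = f"{row['table']}.{row['column']}"
        if row["algorithm"].strip():
            by_domain[row["domain"]].append(location)
        else:
            # Identified as sensitive but no algorithm assigned: review first.
            unassigned.append(location)
    return dict(by_domain), unassigned
```

Columns that were identified but have no algorithm assigned are a natural first item to raise with the application team.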

[Figure: policy spreadsheet]

This spreadsheet (below) along with the masking policy spreadsheet above will provide the application SME with the “what” and “how.” An initial check by the application team and resulting feedback can save time during the masking process.

[Figure: Sample Inventory Export]

Closing Thoughts

The ability to automate the discovery of sensitive data, mask that data, and distribute it quickly and securely to both internal and external stakeholders will be key to mitigating risk within your enterprise, and it is a better approach than simply locking the data down to protect it from unauthorized access. A policy-driven masking program can also speed up internal adoption. A DataOps platform like Delphix can help dramatically reduce the complexities of data management and deliver business impact by empowering teams to build high-quality applications and stay compliant with privacy regulations.
