Data Compliance

Automation is Your Secret Weapon to Fast, Accurate Discovery of Sensitive Data

As data grows at an accelerating pace and becomes more heterogeneous, learn why automation is key to discovering sensitive information quickly and accurately.

Alan Bitterman

Feb 06, 2019

The first step in securing an organization's most sensitive data is understanding what that data is and where it resides across the enterprise. As data grows at an accelerating pace and becomes more heterogeneous, the key to discovering that information is to do it in an automated fashion. However, today's wide variety of databases, data sources, formats and applications only adds to the complexity and difficulty of doing so.

How It All Works: Manual vs. Automatic

Manual identification of sensitive data across multiple environments is slow and inaccurate. Teams must be involved at every step for every environment. Without a repeatable, automated process, profiling cannot be rerun periodically as the data changes over time.

Delphix, on the other hand, provides a powerful, flexible, out-of-the-box framework for identifying your most sensitive data. The platform uses what we call profile sets to define which types of data you consider sensitive and want to identify across all of your data sources and environments.

Profile sets are made up of profile expressions (regular expressions). These expressions scan both the column names (metadata) and a sample of the actual data, looking for patterns such as credit card numbers, Social Security numbers and telephone numbers, among many others. They have been tested and validated across many engagements with Fortune 500 companies that have large, complex database portfolios.
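
To make the idea concrete, here is a minimal Python sketch of regex-based profiling. It checks a column's name first and then a sample of its values. The patterns, the ProfileExpression class and the 80% match threshold are illustrative assumptions, not the expressions that ship with Delphix.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class ProfileExpression:
    name: str
    column_pattern: re.Pattern  # matched against the column name (metadata)
    data_pattern: re.Pattern    # matched against sampled values (data)

# Hypothetical expressions for illustration only.
EXPRESSIONS = [
    ProfileExpression("SSN", re.compile(r"ssn|social.?sec", re.I),
                      re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    ProfileExpression("CREDIT_CARD", re.compile(r"credit.?card|card.?num", re.I),
                      re.compile(r"^\d{4}([ -]?\d{4}){3}$")),
    ProfileExpression("PHONE", re.compile(r"phone", re.I),
                      re.compile(r"^\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}$")),
]

def profile_column(column_name, sample, min_ratio=0.8):
    """Return the names of the expressions that flag this column.

    A column is flagged if its name matches the metadata pattern, or if at
    least `min_ratio` of the non-empty sampled values match the data pattern.
    """
    hits = []
    values = [v.strip() for v in sample if v and v.strip()]
    for expr in EXPRESSIONS:
        if expr.column_pattern.search(column_name):
            hits.append(expr.name)  # metadata match: no need to inspect data
            continue
        if values:
            matched = sum(1 for v in values if expr.data_pattern.match(v))
            if matched / len(values) >= min_ratio:
                hits.append(expr.name)
    return hits

print(profile_column("CUST_SSN", []))                              # ['SSN']
print(profile_column("NOTES", ["555-123-4567", "(555) 123-4567"]))  # ['PHONE']
```

A metadata hit skips the more expensive data check. The threshold keeps a few coincidental matches in a free-text column from flagging the whole column.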

A great feature of Delphix's data discovery is that the profile expressions in a profile set are mapped directly to masking algorithms. As a result, the profiling results can be used immediately to mask the discovered data.
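
The mapping from expression to masking assignment can be pictured as a simple lookup. In this hypothetical sketch, each expression name points to a domain and algorithm pair. Note the following assumptions:
- The domain and algorithm names are placeholders, not actual Delphix values.
- The "first matching expression wins" rule is assumed.
- Marking each assigned column as masked is also assumed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InventoryEntry:
    table: str
    column: str
    domain_name: str
    algorithm_name: str
    is_masked: bool

# Hypothetical expression -> (domain, algorithm) mapping; real names depend on the engine.
DOMAIN_FOR_EXPRESSION = {
    "SSN": ("SSN", "ssn-algorithm"),
    "CREDIT_CARD": ("CREDIT CARD", "credit-card-algorithm"),
    "PHONE": ("TELEPHONE NO", "phone-algorithm"),
}

def to_inventory(table, findings):
    """Turn {column: [matched expression names]} into inventory entries."""
    entries = []
    for column, expressions in findings.items():
        if not expressions:
            continue  # nothing sensitive found in this column
        # Assumption: the first matching expression decides the assignment.
        domain, algorithm = DOMAIN_FOR_EXPRESSION[expressions[0]]
        entries.append(InventoryEntry(table, column, domain, algorithm, is_masked=True))
    return entries

for entry in to_inventory("CUSTOMERS", {"CUST_SSN": ["SSN"], "NAME": []}):
    print(entry)
```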

Our out-of-the-box profile sets are aligned to specific applications, including SAP and PeopleSoft, as well as to specific regulations, such as HIPAA, PCI and more. Our team designed them to locate sensitive data within complex tables and to flag the specific fields that contain it. In short, this saves time and effort, so you can speed up implementation and feel confident that you are complying with regulations. Customers can modify profile sets or create their own as required, and then use them to discover sensitive data.

Profile sets are also critical for identifying the correct data patterns, both to assess security exposure and to meet masking algorithm requirements. This is an area that needs an investment of time and resources. The data structures that represent the business's sensitive data must be properly defined in the chosen profile set.

Profiling One Environment

Let's jump right into profiling a single database through the GUI.

After the required environment objects are created, a profile job is created. The job ties the rule set and the selected profile set together to perform the profiling. A rule set is a group of flat files (or, for databases, tables) within a particular data source. You connect to that data source by creating a connector, and a user can run profile, masking or tokenization jobs on the rule set.
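
The relationships between these objects can be sketched as plain data classes. This is an illustrative model only. The class and field names are hypothetical and do not mirror the Masking engine's actual object schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Connector:
    name: str
    connection_url: str  # hypothetical; real connectors hold host, port, schema, credentials

@dataclass
class RuleSet:
    name: str
    connector: Connector                             # the data source the rule set belongs to
    tables: List[str] = field(default_factory=list)  # tables (or flat files) to process

@dataclass
class ProfileSet:
    name: str
    expressions: List[str]  # names of the profile expressions to apply

@dataclass
class ProfileJob:
    name: str
    rule_set: RuleSet       # what to scan
    profile_set: ProfileSet # what to look for

    def describe(self) -> str:
        return (f"{self.name}: profile {len(self.rule_set.tables)} table(s) in "
                f"{self.rule_set.connector.name} with {self.profile_set.name}")

conn = Connector("hr_db", "jdbc:oracle:thin:@db1.example.com:1521/HR")
job = ProfileJob("job1", RuleSet("hr_rules", conn, ["EMPLOYEES", "PAYROLL"]),
                 ProfileSet("HIPAA", ["SSN", "PHONE"]))
print(job.describe())  # job1: profile 2 table(s) in hr_db with HIPAA
```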

Once the profile job completes, the results for the rule set are shown on the inventory page.

The discovered sensitive data, along with its assigned domain and algorithm, can be exported to a spreadsheet for documentation or further analysis. Here's an additional step-by-step guide to help you through your first profiling setup.
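
An export of this kind is essentially a flat table. Below is a minimal sketch using Python's standard csv module. The export_inventory_csv helper is hypothetical and assumes inventory rows of (table, column, domain, algorithm). The csv module quotes any value that contains a comma.

```python
import csv
import io

def export_inventory_csv(rows):
    """Write (table, column, domain, algorithm) rows to CSV text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["table", "column", "domainName", "algorithmName"])
    writer.writerows(rows)
    return buf.getvalue()

print(export_inventory_csv([("CUSTOMERS", "CUST_SSN", "SSN", "ssn-algorithm")]))
```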

The Bigger Challenge: "That's great for one environment, but we have thousands!"

Profiling one data source or environment is fairly straightforward, but what if you have hundreds, if not thousands, of environments? That's where the Delphix Masking REST APIs come in: they can be used to fully automate profiling across thousands of environments. You can learn more about our APIs here.

Delphix provides an API Utility UI, an interactive way to learn the individual APIs. It shows each endpoint's URL along with its request and response JSON bodies. With the APIs comes the coding: the same basic logic executed through the web user interface can now be scripted.
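
As a rough illustration of scripting against the APIs, the sketch below builds the two requests at the heart of most automation: logging in and starting a job execution. Treat the details as assumptions:
- The host name is a placeholder.
- The endpoint paths (/login, /executions), the JSON bodies and the Authorization header are assumed, not confirmed.

Confirm all of these in your engine's API Utility UI before relying on them.

```python
import json
import urllib.request

# Placeholder host; endpoint paths below are assumptions to verify in the API Utility UI.
BASE_URL = "http://masking-engine.example.com/masking/api"

def build_login_request(username, password):
    """POST credentials; the response is assumed to carry an Authorization token."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(f"{BASE_URL}/login", data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")

def build_run_job_request(token, job_id):
    """POST a job ID to start an execution, authenticating with the login token."""
    body = json.dumps({"jobId": job_id}).encode()
    return urllib.request.Request(f"{BASE_URL}/executions", data=body,
                                  headers={"Content-Type": "application/json",
                                           "Authorization": token},
                                  method="POST")

def call(req):
    """Send a request and decode the JSON response (requires a reachable engine)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage against a real engine (not executed here):
# token = call(build_login_request("admin", "secret"))["Authorization"]
# execution = call(build_run_job_request(token, 7))
```

Keeping request construction separate from sending makes each call easy to inspect and test before it ever reaches an engine.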

To help automate profiling, Delphix provides a number of open source repositories for working with the Masking APIs, including dmx-toolkit and dxapikit. These include basic shell script examples to help customers learn, understand and get up to speed quickly with the Delphix APIs. A set of example profiling scripts is available in the "dxapikit" repository, which you can also download here.

Let’s review the logical flow of the profile.sh script in dxapikit and the calls to the other scripts.

The script takes the connection information for the databases to profile and performs all of the manual steps shown earlier through the Masking APIs. It then exports the results to static HTML files. The script offers different options for supplying the connection string information and the profile set to use for each source. One option is a CSV file containing the connection string information.
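
The exact layout of the CSV file is defined by the dxapikit scripts. The parser below assumes a hypothetical six-column layout (name, database type, host, port, database, profile set) purely to illustrate reading one source per line.

```python
import csv
import io
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceConnection:
    name: str
    db_type: str
    host: str
    port: int
    database: str
    profile_set: str

def parse_connections(csv_text):
    """Parse one connection per line; blank lines and '#' comment lines are skipped."""
    conns = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0].strip().startswith("#"):
            continue
        name, db_type, host, port, database, profile_set = (c.strip() for c in row)
        conns.append(SourceConnection(name, db_type, host, int(port), database, profile_set))
    return conns

sample = """# name,type,host,port,database,profileset
hr,ORACLE,db1.example.com,1521,HRPROD,HIPAA

fin,MSSQL,db2.example.com,1433,FIN,PCI
"""
for conn in parse_connections(sample):
    print(conn)
```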

The script also provides a parallel option. To improve performance, this option splits the connections into equal-sized groups, one per parallel job. The script then launches the corresponding batch.sh scripts. It waits for all of them to complete before writing the final HTML report page (report.html).
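
The batching step can be sketched in Python as follows:
- split_into_batches divides the sources into groups whose sizes differ by at most one.
- run_batch is a hypothetical stand-in for one batch.sh run.
- The thread pool plays the role of the background shell jobs.

The summary is only assembled after every batch has finished.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_batches(items, parallel):
    """Split items into up to `parallel` batches whose sizes differ by at most one."""
    if not items:
        return []
    parallel = max(1, min(parallel, len(items)))
    size, extra = divmod(len(items), parallel)
    batches, start = [], 0
    for i in range(parallel):
        end = start + size + (1 if i < extra else 0)  # first `extra` batches get one more
        batches.append(items[start:end])
        start = end
    return batches

def run_batch(batch):
    # Hypothetical stand-in for one batch.sh run: profile each source in turn
    # and return the name of its per-source result page.
    return [f"{name}.html" for name in batch]

def run_all(sources, parallel):
    batches = split_into_batches(sources, parallel)
    if not batches:
        return []
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        # pool.map yields results in batch order; list() waits for every batch.
        results = list(pool.map(run_batch, batches))
    # All batches are done here, so the summary report can now be written.
    return [page for pages in results for page in pages]

print(split_into_batches(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(run_all(["hr", "fin", "crm"], 2))        # ['hr.html', 'fin.html', 'crm.html']
```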

Sample Results

The summary page (report.html) provides links to each source's profiling results.
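
Generating such a summary is mostly string assembly. Below is a minimal sketch using a hypothetical write_summary helper. It assumes the helper receives each source's result file name and number of flagged columns. Values are escaped with html.escape so unusual source names cannot break the page.

```python
import html

def write_summary(pages):
    """Build report.html content with one linked row per source.

    `pages` maps source name -> (result page file name, sensitive column count).
    """
    rows = "\n".join(
        f'<tr><td><a href="{html.escape(page, quote=True)}">{html.escape(name)}</a></td>'
        f"<td>{count}</td></tr>"
        for name, (page, count) in sorted(pages.items())
    )
    return ("<html><body><h1>Profiling summary</h1>\n"
            "<table><tr><th>Source</th><th>Sensitive columns</th></tr>\n"
            f"{rows}\n</table></body></html>")

print(write_summary({"hr": ("hr.html", 3), "fin": ("fin.html", 1)}))
```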

Each source's individual page provides a detailed view of the results, along with the ability to download them as a CSV file.

Each page shows:
- which database columns were identified as containing sensitive data;
- each column's isMasked value;
- the profile set's mapping to the Delphix Masking domainName and algorithmName values.

To demonstrate potential future functionality, a different report is included for the first two results only.

Learn more about how we can help your organization stay compliant with regulations, meet cloud mandates with less risk and protect sensitive data from unauthorized access with Delphix masking technology.