File Masking Made Simple With Delphix Masking APIs

Learn how the Delphix masking APIs allow you to create the necessary objects and easily perform masking jobs for a wide variety of file formats.

In today’s world of multiple data sources and the heavy burden of compliance, customers are looking for solutions that enable the business to operate while addressing security requirements. Sensitive data that needs to be anonymized through masking is stored not only in databases, like Oracle or RDS, but also in file systems, in a variety of file types.

Delphix supports a number of standard file formats out of the box, including delimited files (e.g., CSV or tab-delimited), XML files, Copybook, and fixed-width files. Other file types, like JSON, can be masked through pre- and post-masking steps.

The Process

Delphix masking technology has a logical flow to masking files that can be simplified into two phases: setup and execution. 

Delphix File Masking Flow


The first phase of masking files with Delphix is the setup process. Here are six high-level concepts that are important to understand before you get started:

File formats: To mask any file, our process requires a file format definition (for a delimited file, the column names header) for each uniquely structured file. The file format is then assigned to the rule set that identifies the file(s) for masking. Each unique structure only has to be defined once, and a defined file format can be reused for all files that have the same structure.
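
As a rough illustration, the column names header for a delimited file can be pulled from a sample file with a few lines of Python. This is a minimal sketch and not part of any Delphix tooling: the function name `header_to_file_format` is hypothetical, and it assumes that a delimited file format definition is a plain text list with one column name per line, which should be confirmed against the documentation for your engine.

```python
import csv
import io


def header_to_file_format(sample_text: str, delimiter: str = ",") -> str:
    """Return one column name per line, taken from the first row of a delimited sample.

    ASSUMPTION: the one-name-per-line layout is illustrative; check the format
    your masking engine expects before uploading anything.
    """
    reader = csv.reader(io.StringIO(sample_text), delimiter=delimiter)
    header = next(reader, None)
    if not header:
        raise ValueError("sample has no header row")
    names = [name.strip() for name in header]
    if any(not name for name in names):
        raise ValueError("header contains an empty column name")
    if len(set(names)) != len(names):
        raise ValueError("header contains duplicate column names")
    return "\n".join(names) + "\n"


sample = "id,first_name,last_name,email\n1,Ann,Lee,ann@example.com\n"
file_format_text = header_to_file_format(sample)
print(file_format_text, end="")
```

Because the definition depends only on the header, the same output can be reused for every file that shares this structure.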

Connectors: To access the data you wish to mask, you need to create a connector. A connector points to a set of data (database or files) that has been connected to the Delphix Data Platform. These data sources can be physical or virtualized. For file masking, for example, the connector may point to the SFTP/FTP server where the files are stored.

Rule sets: A rule set is a group of flat files (or tables for databases) within a particular data source (which you have connected to by creating a Connector) that a user may choose to run profile, masking or tokenization jobs on.

Inventories: An inventory describes all of the data present in a particular data source and defines the methods that will be used to secure it. Inventories typically include the file name, the field name, the data classification and the chosen algorithm.

Masking jobs: A masking job is what you set up to actually execute the masking of your files. When configuring your masking job, you must select the rule sets and inventories that you configured. These tell the masking job which files to mask and which algorithms to apply.

Pre and post-processing scripts: If the file content is not 100 percent in one of our supported predefined formats, the file can be pre-processed into a working format, masked and then post-processed back into its original format by creating a wrapper script/program that calls the pre-processing code, masking job and post-processing code. You can upload or define the pre/post-processing scripts when creating your masking job. 
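
The wrapper idea can be sketched in Python for a simple JSON case. This is an illustration under stated assumptions rather than a Delphix-provided script: every function name here is hypothetical, the input is assumed to be a flat JSON array of objects with the same keys, and `run_masking_job` is a stand-in callable for whatever actually triggers the masking job.

```python
import csv
import io
import json
from typing import Callable


def json_to_delimited(json_text: str, fields: list[str]) -> str:
    """Pre-processing: turn a flat JSON array of objects into delimited text."""
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({name: record[name] for name in fields})
    return out.getvalue()


def delimited_to_json(delimited_text: str) -> str:
    """Post-processing: turn the masked delimited text back into a JSON array."""
    rows = list(csv.DictReader(io.StringIO(delimited_text)))
    return json.dumps(rows)


def mask_json(json_text: str, fields: list[str],
              run_masking_job: Callable[[str], str]) -> str:
    """Wrapper: pre-process, run the masking step, then post-process."""
    working = json_to_delimited(json_text, fields)
    masked = run_masking_job(working)  # hypothetical hook for the real masking job
    return delimited_to_json(masked)


def stand_in_job(text: str) -> str:
    """Stand-in used only for this demo; it is NOT a masking algorithm."""
    return text.replace("ann@example.com", "masked@example.com")


original = '[{"name": "Ann", "email": "ann@example.com"}]'
result = mask_json(original, ["name", "email"], stand_in_job)
print(result)
```

Note that this round trip returns every value as a string, so a real post-processing step would also need to restore numbers, booleans and any nested structure.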


The next phase of this process is the actual execution. All you have to do is start the masking job(s) that you set up. These masking jobs will run any pre-scripts you defined, complete the masking transformations (based on the configuration you completed in the setup phase) and finally run any post-scripts you defined. At the end, you will have files that have been anonymized through masking.

Automating with the API

While the Delphix UI is a great way to get familiar with masking and do the initial setup, you will most likely want to automate the process, especially when you have a lot of files that need to be masked. So how can we automate and simplify file masking? You can check out this Delphix Masking APIs document for a brief overview.

One of the tools we make available is our Masking API client portal, as shown below. It provides an interactive way to learn the individual APIs, their URLs, and the inbound and outbound JSON body content.

Masking API

While we offer our APIs for users to develop their own automation/scripts around masking, we also provide a number of open source repositories for the masking APIs, including dmx-toolkit and dxapikit. These repositories provide basic shell script examples to help users learn quickly and get up to speed. Here’s one that involves automating file masking.

The code/scripts in this repository take advantage of all the masking object creation APIs to define the file format, create a rule set, assign the domain/algorithm, set up a masking job and then run the job. The sample script requires an existing masking environment and connector, and the connector must contain the valid path for the file. 
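
To make the order of operations concrete, here is a small Python sketch that only lists the calls in the sequence described above and sends nothing. It is not taken from the repository: the class and function names are hypothetical, and the endpoint paths and JSON field names are assumptions that should be verified in the API client on your own engine before use.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedCall:
    step: str
    method: str
    path: str  # ASSUMED path relative to the API base URL; verify in the API client
    body: dict = field(default_factory=dict)  # ASSUMED field names


def plan_file_masking(format_name: str, ruleset_name: str,
                      connector_id: int, job_name: str) -> list[PlannedCall]:
    """List the calls in the order described above (nothing is sent)."""
    return [
        PlannedCall("define file format", "POST", "/file-formats",
                    {"fileFormatName": format_name}),
        PlannedCall("create rule set", "POST", "/file-rulesets",
                    {"rulesetName": ruleset_name, "fileConnectorId": connector_id}),
        PlannedCall("assign domain/algorithm", "PUT",
                    "/file-field-metadata/<fileFieldMetadataId>",
                    {"domainName": "<domain>", "algorithmName": "<algorithm>"}),
        PlannedCall("create masking job", "POST", "/masking-jobs",
                    {"jobName": job_name, "rulesetId": "<rulesetId from step 2>"}),
        PlannedCall("run masking job", "POST", "/executions",
                    {"jobId": "<jobId from step 4>"}),
    ]


plan = plan_file_masking("customers.csv", "customer_files", 1, "mask_customers")
for number, call in enumerate(plan, start=1):
    print(f"{number}. {call.step}: {call.method} {call.path}")
```

The angle-bracket placeholders mark values that only exist once an earlier call has returned, which is why the steps have to run in this order. The connector ID is an input because the existing environment and connector are prerequisites.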

The rest of the parameters define the source delimited file, the delimited column names (header information), the mapping of the column names to the masking domains/algorithms and if applicable, the file type and respective delimited parameters.
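
A front-end check of those parameters might look like the following Python sketch. The `column=DOMAIN/ALGORITHM` notation and the function name are hypothetical conventions invented for this example, and the domain and algorithm names are illustrative placeholders; the sample script may take its parameters in a different form.

```python
def parse_mapping(columns: list[str],
                  mapping_specs: list[str]) -> dict[str, tuple[str, str]]:
    """Map column names to (domain, algorithm) pairs, rejecting unknown columns.

    ASSUMPTION: each spec uses a made-up 'column=DOMAIN/ALGORITHM' notation.
    """
    known = set(columns)
    mapping: dict[str, tuple[str, str]] = {}
    for spec in mapping_specs:
        column, _, target = spec.partition("=")
        domain, _, algorithm = target.partition("/")
        column, domain, algorithm = column.strip(), domain.strip(), algorithm.strip()
        if not (column and domain and algorithm):
            raise ValueError(f"expected column=DOMAIN/ALGORITHM, got {spec!r}")
        if column not in known:
            raise ValueError(f"{column!r} is not in the file format header")
        if column in mapping:
            raise ValueError(f"{column!r} is mapped more than once")
        mapping[column] = (domain, algorithm)
    return mapping


columns = ["id", "first_name", "last_name", "email"]
specs = ["first_name=FIRST_NAME/FirstNameLookup", "email=EMAIL/EmailLookup"]
mapping = parse_mapping(columns, specs)
print(mapping)
```

Columns that are left out of the mapping, such as `id` here, are simply not assigned an algorithm. Catching a misspelled column name at this stage is cheaper than finding it after the masking job has run.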

Some additional functionality that a user may want to add to these scripts to further automate the file masking process includes: 

  • Additional code for the front end pre-processing of the files to build the desired column header file formats and the mapping of the masking domains/algorithms
  • Automation for the connector creation and/or updating an existing connector to change the full file path

Delphix protects data wherever it resides, on premises or in the cloud. Learn more about how you can stay compliant with regulations, meet cloud mandates with less risk and protect sensitive data from unauthorized access with Delphix masking technology.
