Platform

Run a Masking Job Anywhere with Delphix

Learn how Delphix enables masking at scale by allowing you to pick up a masking job, sync it, and run it anywhere you have Delphix Engines deployed - whether on premises or in the cloud on AWS, Azure, and more.

Alexandros Mathopoulos

Dec 18, 2018

As the amount of data that organizations handle grows exponentially, so does the amount of sensitive data that needs to be protected and handled responsibly. For example, some of the largest banks in the world have petabytes of data, and much of that is sensitive data spread across many data centers and the cloud.

Data masking technology provides a way to anonymize and desensitize that data while ensuring it remains meaningful for developers, testers, business analysts, and others who need data to drive innovation. Furthermore, as data sprawls across a variety of platforms and its rate of growth increases, your data masking solution must be able to scale.

With our latest product update, our data platform now enables masking at scale by allowing you to pick up a masking job, sync it, and run it anywhere you have Delphix Engines deployed - whether on premises or in the cloud on AWS, Azure, and more. Our API facilitates the synchronization of the information that defines masking jobs across multiple masking engines.

Engine synchronization provides a flexible way to move these masking objects (the algorithms and related information associated with a masking job) that are necessary to run an identical job on another engine.

Scale Masking Across the Enterprise

There are two specific scenarios in which organizations can benefit from orchestration across multiple masking engines. The first is a multi-engine implementation that addresses horizontal scale: deploying multiple masking engines to achieve consistent masking across a large data estate.

For many enterprise companies, the size of profiling and masking workloads requires more than one production masking engine. These masking engines can be identical in configuration or partially equivalent, depending on the organization's needs. Syncable objects are authored on one engine, labeled Control Masking Engine in the diagram below. Those objects are then distributed to the Compute Masking Engines using the engine synchronization APIs. The synchronized algorithms and masking jobs produce the same masked output on all of the engines, enabling large data estates to be masked consistently.

[Diagram: scale masking]

Develop Your Masking Rules like Software

The second architecture addresses the desire to author algorithms and masking jobs on one engine, test and certify them on another, and deploy them to a production engine as part of your SDLC process.

Using an SDLC process often requires setting up multiple masking engines, each for a different part of the cycle (development, QA, and production). Here, algorithms are authored on the first engine, labeled Dev Engine in the diagram below. When the developer is satisfied, the algorithms are exported from the Dev Engine and imported to the QA Engine, where they can be tested and certified. Lastly, they are exported from the QA Engine and imported to the production engine.

[Diagram: develop masking rules]

Getting Started

To synchronize masking jobs, you only need three API endpoints:

GET /syncable-objects[?object_type=<type>]

This endpoint lists all objects in an engine that are syncable and can be exported. Any object that can be exported can also be imported into another engine. The endpoint takes an optional parameter to filter by a specific object type. Each object is listed with its revisionHash.

Note: If a syncable object depends on a non-syncable object (e.g., a DOMAIN using a mapping algorithm), this will be indicated in its “revisionHash” attribute, and the object will not be exportable.

Example CURL command:

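The request below is a minimal sketch; the engine hostname, base path, and Authorization token are placeholders that will vary by deployment, and the object_type filter value is illustrative:

curl -X GET \
  --header 'Accept: application/json' \
  --header 'Authorization: <authorization_token>' \
  'http://masking-engine.example.com/masking/api/syncable-objects?object_type=LOOKUP'

Each entry in the response carries the object's identifier, its type, and its revisionHash, which you can copy directly into an export request.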

POST /export

This endpoint allows you to export one or more objects in batch fashion. You specify which objects to export by copying their object identifiers from the /syncable-objects response. The result of the export is a document and a set of metadata that describes what was exported. The endpoint takes a single optional header, a passphrase; if one is provided, the export document will be encrypted using it.

Example CURL command:

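A minimal sketch, assuming the objectIdentifier and objectType values were copied from a /syncable-objects response; the hostname, token, passphrase, and algorithm name are placeholders:

curl -X POST \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header 'Authorization: <authorization_token>' \
  --header 'passphrase: my-secret-passphrase' \
  --data '[{"objectIdentifier": {"algorithmName": "lookup_alg"}, "objectType": "LOOKUP"}]' \
  'http://masking-engine.example.com/masking/api/export'

Saving the response body (for example, to export-document.json) gives you the document to hand to the /import endpoint on the target engine.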

POST /import

Finally, this endpoint allows you to import a document exported from another engine. The response contains a list of the objects that were imported and indicates whether each import was successful.

The endpoint has one required parameter, force_overwrite; two optional parameters, environment_id and source_environment_id; and an optional passphrase HTTP header, which, if provided, causes the engine to attempt to decrypt the document using the specified passphrase. The required force_overwrite parameter dictates how to handle conflicting objects. The environment_id parameter is necessary for all non-global objects that need to belong to an environment, and source_environment_id is used for On-The-Fly masking jobs.

Example CURL command:

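A minimal sketch, assuming export-document.json holds the response body saved from the /export call above; the hostname, token, passphrase, and environment ID are placeholders:

curl -X POST \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header 'Authorization: <authorization_token>' \
  --header 'passphrase: my-secret-passphrase' \
  --data @export-document.json \
  'http://masking-engine.example.com/masking/api/import?force_overwrite=true&environment_id=2'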

Learn more about how the Delphix Data Platform provides an enterprise-wide approach to data masking and data virtualization capabilities that can help sync, mask, and deliver your data securely and rapidly.