Data Masking Techniques: How Real Masking Solutions Work

Data masking replaces sensitive data with fictitious, but realistic data.

About Data Masking

"Data masking replaces sensitive data with fictitious, but realistic data." It's a seemingly simple definition of data masking, advanced by analysts, users, and vendors alike. But this common explanation belies significant complexity, which stems from the variety of techniques available to organizations seeking to protect confidential information. Some of these are true data masking techniques, while others are not. Let's unpack the concept by sorting through the different techniques and delineating what separates masking from other approaches to data security.

Data Masking Techniques vs. Other Approaches

Data Integrity

Data security approaches can first be examined along a dimension that considers how well the solution preserves the usability of the data for non-production use cases such as application development, testing, or analytics. When the solution is applied to sensitive data, do the resulting values look, feel, and operate like the real thing?

True data masking techniques such as shuffling (randomly switching values within a column) or substitution (a given value is mapped to an equivalent value in a secure lookup table) transform confidential information while preserving the integrity of the data. On the other hand, nulling out values, character scrambling, or data redaction ("X-ing" out characters or full values) may render transformed datasets useless to an end user. For example, data validation checks built into front-end systems may reject nulled or redacted data, preventing testers from verifying application logic.
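To make the contrast concrete, here is a minimal Python sketch of shuffling, substitution, and redaction applied to a column of names. The sample data, the lookup table, the key, and the function names are all hypothetical, and the keyed-hash approach to choosing a lookup entry is an assumption made for illustration; it is not a description of any particular product's algorithm.

```python
import hashlib
import hmac
import random

# Hypothetical sample column and lookup table, invented for illustration.
NAMES = ["Alice Smith", "Bob Jones", "Carol White", "Dan Brown"]
LOOKUP = ["Pat Green", "Sam Black", "Lee Gray", "Kim Stone", "Ray Woods"]


def shuffle_column(values, seed=None):
    """Shuffling: return the same values in a random order."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def substitute(value, key, lookup=LOOKUP):
    """Substitution: map a value to a lookup-table entry via a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return lookup[int.from_bytes(digest[:8], "big") % len(lookup)]


def redact(value):
    """Redaction, for contrast: X out every non-space character."""
    return "".join(" " if ch == " " else "X" for ch in value)


if __name__ == "__main__":
    key = b"demo-key"  # assumption: a real key would be generated and kept secret
    print(shuffle_column(NAMES, seed=1))         # still real-looking names
    print([substitute(n, key) for n in NAMES])   # realistic replacement names
    print([redact(n) for n in NAMES])            # no longer looks like a name
```

The shuffled and substituted columns still contain values that a name-validation check would accept, while the redacted column does not. Note that a plain shuffle can leave some values in their original position, so a production implementation would likely need additional safeguards.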


Reversibility

A second key characteristic that separates data masking techniques from alternative approaches is reversibility. Data masking techniques irreversibly transform data: Once data has been masked, the original values cannot be restored through a reverse engineering process. This characteristic makes data masking especially suitable for non-production use cases such as development and testing, in which end users have no need to see original values.
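One way a transform can be irreversible is by being many-to-one: if several original values produce the same masked value, the masked value alone cannot identify which original it came from. The sketch below illustrates this with a hypothetical lookup table that is deliberately smaller than the set of inputs. All names and data are invented, and this is an illustration of the principle under those assumptions rather than a complete account of how masking products achieve irreversibility.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical lookup table and inputs, invented for illustration.
LOOKUP = ["Pat Green", "Sam Black", "Lee Gray"]
ORIGINALS = [f"Customer {i}" for i in range(12)]


def mask(value, key):
    """Map a value to a lookup-table entry via a keyed hash (many-to-one)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return LOOKUP[int.from_bytes(digest[:8], "big") % len(LOOKUP)]


def preimages(values, key):
    """Group original values by the masked value they produce."""
    groups = defaultdict(list)
    for value in values:
        groups[mask(value, key)].append(value)
    return dict(groups)


groups = preimages(ORIGINALS, b"demo-key")
for masked, originals in groups.items():
    # Each masked value stands in for several originals, so it has no unique inverse.
    print(masked, "<-", len(originals), "original values")
```

With twelve inputs and only three possible outputs, at least one masked value must correspond to four or more originals, so there is no function that maps masked values back to a single original.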

This also makes data masking very different from encryption technologies, where reversibility is purposefully designed into the solution. Encryption relies on the availability of keys that allow authorized users to restore encoded values into readable ones. While encryption methods may be suitable for transmitting data, securing data at rest, or protecting the contents of mobile devices and laptops, they do not necessarily protect organizations from insiders or other actors with access to decryption keys, or from hackers who are able to crack encryption schemes.

Data Delivery

Next-generation data masking solutions can be integrated with data virtualization technologies to allow users to move data to downstream environments in minutes. The ability to leverage data virtualization is critical: Non-production data environments are continually provisioned and refreshed, making the ability to quickly move secure data of paramount importance. In contrast, other data security approaches (and legacy masking solutions) lack delivery capabilities, instead relying on slow batch processes to extract, transform, and load data into a non-production target.

Delphix Data Masking Software

Delphix Data Masking is a solution that gives businesses everything they need to continuously protect sensitive information. Delphix provides a masking solution with a variety of pre-defined algorithms (e.g. secure lookup, shuffling, segmented mapping) along with the ability to define custom masking algorithms or even leverage non-masking techniques such as redaction or tokenization.

Moreover, Delphix integrates its masking tool with data virtualization technology to address the two key challenges that security-minded organizations face: creating masked data, and then efficiently delivering it to end users. With a single software solution, Delphix allows companies to mask and deliver secure datasets in minutes--instead of days or weeks--to comply with regulations and safeguard against data breaches.