As companies contend with the threat of crippling data breaches that could result in lost records, reputational damage, and regulatory penalties, CISOs and CIOs will increasingly turn to a variety of technologies to protect their data. Most sensitive data resides in plentiful, unsecured non-production environments rather than in tightly protected production environments, and thus requires a different model of protection from the one many organizations are used to.
That's where data masking comes into the picture. Masking takes sensitive, personally identifiable information (social security numbers, credit card numbers, names, addresses, and so on) and replaces it with realistic but fictitious data. You may hear the terms de-identification, data obfuscation, or data scrambling used in place of data masking; regardless, the process is the same. Masking data before it is sent to downstream environments removes the sensitive information and shrinks the surface area of risk.
There are a variety of data masking tools or data obfuscation tools on the market, and it's worth discussing the evolution of these solutions over time to compare each tool properly. Each generation has improved on the one before, and it is important to consider how added features and functionality impact the security and usability of a company's sensitive data.
First up in the data masking tools comparison is custom scripting. A step above no security measures at all, homegrown scripts selectively mask or--more commonly--redact data before use in downstream environments. Though some protection is better than none, there are flaws associated with the use of custom scripting for data masking. First of all, it's an effort- and time-intensive process. Usually, a trained programmer has to write code, and for simplicity's sake, it will likely be code that redacts data rather than masking it--meaning that the resulting "masked" data isn't production-like. Consequently, testing on that data becomes a more difficult proposition, leading to bugs and production outages. Secondly, every time the database schema changes, the scripts have to change too, again costing time and money. If the scripts aren't updated, the data is left unmasked or testing is conducted on stale data.
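To make the difference between redaction and masking concrete, here is a minimal Python sketch. The function names and the sample social security number are illustrative assumptions, not taken from any particular tool; a real masking product would also guard against emitting values that happen to be valid.

```python
import random


def redact_ssn(ssn: str) -> str:
    """Redaction: replace every digit with 'X', keeping only the punctuation."""
    return "".join("X" if ch.isdigit() else ch for ch in ssn)


def mask_ssn(ssn: str, rng: random.Random) -> str:
    """Masking: replace every digit with a random digit, keeping the format."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in ssn)


rng = random.Random(0)  # seeded only so the sketch is repeatable
original = "123-45-6789"  # made-up sample value

redacted = redact_ssn(original)
masked = mask_ssn(original, rng)

print(redacted)  # XXX-XX-XXXX
print(masked)    # still shaped like NNN-NN-NNNN, with fictitious digits
```

The redacted value would fail any test that expects a numeric, correctly formatted identifier, while the masked value still looks like production data. That is the gap the paragraph above describes.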
Stored procedures are the next step in the evolution of data masking tools. While stored procedures eliminate some of the manual work that custom scripting necessitates, they are usually only useful for one database type at a time. That is, if a company has both Oracle and SQL Server databases, stored procedures written for Oracle cannot simply be reused on SQL Server, which makes it extremely difficult to mask both databases the same way. That incompatibility undermines referential integrity across different sources: data across the enterprise will not be masked consistently, which again will cause conflicts throughout the software development life cycle.
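One common way to keep masked values consistent across sources is to derive them deterministically from the original value and a shared secret, so the same input always produces the same masked output no matter which database it came from. The sketch below assumes a keyed hash (HMAC-SHA256); the key, the function name, and the two sample extracts are hypothetical and do not describe how any specific product works.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key belongs in a secrets store.
SECRET_KEY = b"example-key-not-for-production"


def mask_customer_id(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically map an identifier to a fictitious 9-digit identifier."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % 10**9
    return f"{number:09d}"


# Two made-up extracts standing in for, say, an Oracle and a SQL Server source.
crm_rows = [{"customer_id": "100234567", "plan": "gold"}]
billing_rows = [{"customer_id": "100234567", "balance": 42.50}]

masked_crm = [
    {**row, "customer_id": mask_customer_id(row["customer_id"])} for row in crm_rows
]
masked_billing = [
    {**row, "customer_id": mask_customer_id(row["customer_id"])} for row in billing_rows
]

# The same customer still joins across both masked data sets.
print(masked_crm[0]["customer_id"] == masked_billing[0]["customer_id"])  # True
```

Because the mapping runs outside either database, the same logic applies to every source. Note that reducing the hash to nine digits means two different inputs could in principle collide, so a production approach would need to handle uniqueness explicitly.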
Some data masking tools have tried to tackle the problem of masking large volumes of data. These Extract, Transform, Load (ETL) tools may be able to transform or otherwise obfuscate data as it moves from point A to point B, but they require a significant amount of time and infrastructure to do so. Because moving and transforming data from production to non-production environments is slow with ETL solutions, they are unwieldy and less likely to be used than other tools.
The final evolutionary step for data masking requires combining the masking piece with a data delivery piece. Virtual data masking solutions like Delphix not only mask data in a consistent, repeatable manner, they also can deliver that data and make it usable. This holds whether data is being masked or delivered on-premises, across data centers, or in the cloud, and whether that data originates from file systems or databases.
It's worth noting that dynamic data masking tools also exist. These solutions are a different breed, masking production data in real time. For most companies, masking and then provisioning data for non-production environments at a particular point in time is sufficient. Moreover, dynamic data masking solutions require a proxy component between the query and the response. This may require additional hardware, impose performance penalties, and introduce new vulnerabilities via the proxy component itself.
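The proxy idea can be illustrated with a small Python sketch: a layer sits between the caller and the data source and masks sensitive fields on the way out, depending on who is asking. Everything here (the role names, the `run_query` stand-in, the sample row) is a hypothetical simplification, not the design of any real dynamic masking product.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def run_query() -> List[Row]:
    """Stand-in for a production database query; the data is made up."""
    return [{"name": "Ada Example", "card_number": "4111111111111111"}]


def mask_card(card: str) -> str:
    """Hide all but the last four characters of a card number."""
    return "*" * (len(card) - 4) + card[-4:]


def masking_proxy(query: Callable[[], List[Row]], role: str) -> List[Row]:
    """Run the query, then mask the response unless the caller is privileged."""
    rows = query()
    if role == "privileged":
        return rows
    return [{**row, "card_number": mask_card(row["card_number"])} for row in rows]


print(masking_proxy(run_query, "support")[0]["card_number"])     # ************1111
print(masking_proxy(run_query, "privileged")[0]["card_number"])  # 4111111111111111
```

Every response passes through `masking_proxy`, which is why such a component can add latency and becomes a target in its own right: the underlying production data is never changed, only the view of it.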
Data Masking Tools Comparison Chart