GDPR Requirements for Data Masking
In a world of intellectual property theft, data breaches, and other cybercrimes, businesses are under intense pressure to protect sensitive data. In response to concerns from consumers, governments are creating regulations that require businesses to take appropriate care when handling personal data.
This has created opportunities for businesses to take advantage of technology solutions that help them meet the challenges presented by these new regulatory measures. One of these technologies is data masking—the ability to replace sensitive data with a non-sensitive equivalent while maintaining the quality and consistency needed to ensure that masked data is still valuable to operational analysts or software developers. Although this technology has existed for some time, the General Data Protection Regulation (GDPR), which becomes law in 2018, dramatically elevates its relevance and importance.
The GDPR sets strict limits on businesses that collect, use, and share data from European citizens. Companies—EU-based or otherwise—face new requirements that compel them to rethink their approaches to customer privacy and implement new protections. In fact, a new term, ‘pseudonymisation’, has been introduced to add legal definition around protecting personal data. Pseudonymisation is an umbrella term for approaches like data masking that aim to secure confidential information that directly or indirectly reveals an individual’s identity. The GDPR punishes businesses that fail to leverage appropriate protection measures—such as pseudonymisation technologies—as part of their overall security posture. The fine for non-compliance can be harsh: as much as 4% of global turnover, enough to jeopardize ongoing European operations for any business selling in the EU.
This paper examines the forthcoming changes to the GDPR, identifying the key requirements businesses need to understand, and delineating what must be done to satisfy them. It then goes on to highlight how recent innovations in data masking can ensure regulatory compliance while also eliminating complexity that stands in the way of business agility.
Phil Lee is a Partner in the Privacy, Security and Information Group at Fieldfisher, and runs its US Office in Silicon Valley, California. He holds CIPP(E) and CIPM status, and is a member of the IAPP’s Privacy Faculty.
Phil has particular specialisms in behavioural profiling and cookie regulation, e-marketing, and international data transfer strategies (including binding corporate rules). He has worked on numerous multi-jurisdictional data privacy projects across more than 80 countries. In addition to privacy and information law, Phil regularly advises on a wide variety of technology, social media, and e-commerce projects. Who’s Who Legal has said that Phil “ranks among the finest practitioners” on data privacy and online regulation.
Jes Breslaw is currently EMEA Director of Strategy at Delphix. He has held senior European roles at technology suppliers and integrators for 19 years. Jes began his career product-managing IBM hardware, then spent eight years working with security solutions, including CheckPoint Software and Cisco. Prior to joining Delphix, Jes worked at companies providing secure mobile solutions, first Workshare and then Accellion.
Europe’s New Data Privacy Laws
In December 2015, the European Union reached a deal on wide-ranging new rules that will significantly impact all businesses—whether in the EU or beyond—that collect, use, and share personal information about European citizens. The deal was the culmination of years of hard work by European politicians and legislators, and resulted in the European Commission, Parliament, and Council of the EU agreeing on the text of Europe’s new “General Data Protection Regulation,” the successor legislation to Europe’s aging “Data Protection Directive”.
But why should you care? To explain that, it’s necessary first to take a step back and consider how technology and law have evolved over the past 20 years. The story begins in 1995, when Europe adopted its current Data Protection Directive (Directive 95/46/EC, or the “Directive”)—the law that sets the rules throughout Europe governing how businesses may collect, use, and share individuals’ personal information. The current Directive dates from a time when few households owned computers (by way of anecdote, statistics from the US Census Bureau suggest that only around 30% of US households had a computer in 1995), and almost no one had Internet access; a time when there was no social media, no online banking, and no cloud computing.
It’s that same Directive, though, which continues to regulate the always-on, hyper-connected, Big Data world in which Europeans now live.
Technology had moved on, but the law had not. Recognizing the need for European data protection laws to keep pace with new technologies, in early 2012 the European Commission published proposals for a new data protection law—the “General Data Protection Regulation” (“GDPR”). The proposals were controversial, and heavily critiqued by all possible data stakeholders—national governments, global businesses, civil liberties organizations, the press, and others—each arguing from its own perspective that the proposals were either too prescriptive or too lax, too strict or not strict enough. Reaching consensus was not easy. Over the next four years, the GDPR became one of the most heavily debated legislative proposals ever in the European Union, attracting more than 3,000 amendments during its legislative passage.
Yet, despite these difficulties, all parties finally agreed to the text of the GDPR in December 2015, and the GDPR is expected to be adopted into European law in Q2 2016 (with full implementation planned for 2018). Among its controversial new requirements are provisions that the GDPR will apply to any business worldwide that offers goods and services to, or monitors the behaviour of, European citizens, and that businesses in breach of the GDPR can face stiff fines of up to 4% of annual worldwide turnover. With such significant business risks, data protection has grabbed press headlines and board-level attention like never before. Businesses everywhere are assessing their current data protection practices to ready themselves for when the new law takes effect in 2018.
Against this backdrop of changing laws and evolving risks, this paper explores how “pseudonymisation” technologies, such as Delphix’s data masking technology, can help businesses prepare for these changes and mitigate risk under the new law.
Pseudonymisation and the GDPR
What is Pseudonymisation?
European data protection laws protect “personal data”; data which is not “personal” is not subject to European data protection rules and can be used and shared freely by businesses. In the current law, “personal data” has a broad definition, and applies to any “information relating to an identified or identifiable natural person”, including where a person “can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity”.
The reference to “direct or indirect” identification has long been a point of consternation for businesses.
“Direct” identification clearly captures information that ‘obviously’ reveals a person’s identity, such as their name and contact details. But what about “indirect” identification? The position of European data protection authorities has been that data seemingly ‘anonymised’ (obfuscated) by removing individuals’ directly identifying details may still be personal data if the resultant dataset enables an individual to be “indirectly” identified.
This may be the case, for example, where a business poorly obfuscates its data by removing only customer names from its databases while still holding other details about account activity (such as which services customers use, payment records, and IP address information about the devices from which they access their online accounts). That remaining data may still be sufficient to enable the individual to be “indirectly” identified with relatively minimal effort.
To deal with this, the GDPR makes a distinction. Data that is truly anonymised (for example, aggregated, anonymised statistics from which no individual record can be recovered) is exempt from data protection rules. However, data that is hidden but has the potential to reveal identities, such as the example above, is classified as pseudonymised.
Under the current law, ‘pseudonymised data’ is not defined but is essentially treated identically to any other form of directly identifying personal data, meaning that even where a business has taken steps to scrub its data by using data masking or hashing technologies in the interests of privacy compliance, the scrubbed dataset may still be subject to the full weight of compliance regulation under the Directive.
The current law therefore has the unfortunate consequence that even businesses that try to be ‘good actors’ by correctly implementing data scrubbing techniques, such as masking or hashing, see no regulatory upside from their good behaviour. In turn, this disincentivises many businesses from expending the budget and effort necessary to implement these technologies, notwithstanding their clear benefit to data security.
By contrast, the GDPR recognises the need to promote ‘pseudonymisation’ and includes several provisions designed to do just that.
How Does Pseudonymisation Help Businesses to Comply with the GDPR?
Unlike the Directive, the new GDPR contains an express legal definition of ‘pseudonymisation’, describing it as: “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information, as long as such additional information is kept separately and subject to technical and organisational measures to ensure non-attribution to an identified or identifiable person”.
Put more simply, the GDPR explains that pseudonymised data is data held in a format that does not directly identify a specific individual without the use of additional information such as separately stored mapping tables.
For example, a record might read “User ABC12345” rather than “James Smith”; to identify “James Smith” from “User ABC12345”, there would need to be a mapping table that maps user IDs to user names. Where any such mapping information exists, it must be kept separately and subject to controls that prevent it from being combined with the pseudonymised data for identification purposes. Data masking and hashing are examples of pseudonymisation technologies.
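The separation described above can be sketched in a few lines of code. This is an illustrative example only, not any particular product’s implementation; the record layout and ID format are invented for the sketch:

```python
import secrets

def pseudonymise(records):
    """Replace each record's 'name' with an opaque user ID.

    Returns the pseudonymised records and the mapping table. Under the
    GDPR's definition, the mapping table is the 'additional information'
    that must be stored separately, under its own access controls.
    """
    mapping = {}   # must live apart from the pseudonymised data
    output = []
    for record in records:
        user_id = "User " + secrets.token_hex(4).upper()
        mapping[user_id] = record["name"]
        # Copy the record, swapping the real name for the opaque ID.
        output.append(dict(record, name=user_id))
    return output, mapping

customers = [{"name": "James Smith", "plan": "premium"}]
masked, mapping = pseudonymise(customers)
# masked[0]["name"] is now an opaque ID; only the separately held
# mapping table can link it back to "James Smith".
```

Without access to `mapping`, the masked records alone no longer attribute the data to a specific individual, which is exactly the property the GDPR’s definition turns on.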
Like the Directive, the GDPR still considers pseudonymised data to be personal data, with the consequence that European data protection rules will still govern the use and protection of pseudonymised data. Critically, though – and in very marked contrast to the Directive – the GDPR incentivises companies to pseudonymise their datasets at several different points.
These are described below.
Pseudonymisation as a Security Measure
Article 30 of the GDPR sets out the security requirements that businesses are expected to satisfy. It requires that businesses must implement “appropriate” technical and organisational measures to secure personal data, taking account of the risk presented to individuals if the security of that data were to be breached.
In this regard, the GDPR expressly says that businesses should consider implementing “as appropriate … the pseudonymisation and encryption of personal data.” While the law stops short of telling businesses they must implement pseudonymisation, the express reference to pseudonymisation in the security provisions of the GDPR is highly significant – indicating that, in the event of a security breach, regulators will take into consideration whether or not a business had implemented pseudonymisation technologies. Businesses that have not may therefore find themselves more exposed to regulatory action.
To reinforce this point, the introductory language to the GDPR says that businesses should consider “pseudonymising personal data as soon as possible” in order to satisfy requirements of data protection by design and by default. Put simply, the GDPR sees pseudonymisation as an important tool for achieving compliance with its requirements.
Pseudonymisation to Reduce Data Breach Reporting Burdens
Related to the above point, the GDPR introduces new mandatory data breach reporting rules. Businesses that suffer a data security incident will potentially find themselves compelled to notify their enterprise customers, their regulators and the individuals whose data have been compromised. Current data protection law contains no such requirements outside of specific regulated sectors (such as breach reporting rules for telcos and ISPs).
Any business that has experienced a data breach will know that, quite apart from the cost of re-securing the compromised data, data breaches attract very significant financial, reputational and resource costs. In the United States, which has had a long-standing data breach reporting regime, the Federal Trade Commission has imposed significant penalties for data security incidents, and businesses that have suffered a breach typically find themselves vilified both in the press and in class action lawsuits. The concern for many businesses, then, is whether the introduction of data breach reporting rules in the EU may result in the same types of harm suffered by businesses across the Atlantic in the US.
In terms of the specific rules it introduces, the GDPR sets an expectation that businesses must notify data protection authorities within 72 hours of becoming aware of a breach – a very short timescale for any material data security incident – and must inform the individuals affected without “undue delay.” However, the GDPR says that businesses do not need to notify data protection authorities if they can “demonstrate … that the personal data breach is unlikely to result in a risk for the rights and freedoms of individuals”.
On a similar note, it also says that businesses only need to inform affected individuals if the breach is likely to result in a “high risk” to their privacy – and that notification is not required if the business “has implemented appropriate technical and organisational protection measures … that render the data unintelligible to any person who is not authorised to access it”.
In short, if a data breach presents low risk to the individuals concerned, the GDPR’s breach notification requirements become more relaxed. Pseudonymisation, whether through masking, hashing or encryption, offers a clear means to reduce the risks to individuals arising from a data breach (e.g. by reducing the likelihood of identity fraud and other forms of data misuse), and is supported by the GDPR as a security measure as already described above.
As a consequence, businesses that have effectively pseudonymised their data may be exempt from notifying regulatory authorities and affected individuals in the event they suffer a data breach. Given the ever-increasing occurrence, and cost, of data breaches, this is a highly significant incentive for businesses to pseudonymise their datasets.
Pseudonymisation to Reduce Data Disclosure Burdens
One of the greatest compliance challenges under the current Directive concerns the “right of access”, which allows individuals to ask a business to provide them with a copy of any personal information processed about them. The business has to comply with this request within a very short timescale (typically just 40 days). In that time, it has to undertake extensive – and costly – efforts to search its systems for any personal information relating to that individual; remove any third-party personal information from materials identified for disclosure (for example, references to third parties in e-mails); consult with legal counsel to review the material to be disclosed for compliance and risk management purposes; and then deliver the information to the individual.
Data access requests are very commonly made in the context of litigious claims, by individuals seeking to get wider access to information than they would ordinarily be entitled to under normal litigation disclosure rules.
Individuals will continue to have a right of access to data under the GDPR. However, consistent with its approach to pseudonymisation on data breach issues, the GDPR appears to relax disclosure requirements in response to a data access request where data has been pseudonymised. It says that where the business can “demonstrate that it is not in a position to identify the data subject … Articles 15 to 18 [i.e. the right to access] do not apply except where the data subject, for the purpose of exercising his or her rights under these articles, provides additional information enabling his or her identification.”
This means that a business may not be obligated to include data that has been effectively pseudonymised when responding to data access requests from an individual. This is a particularly important benefit for large consumer-facing businesses, which may face large numbers of subject access requests from their customers at any given time.
Pseudonymisation to help Profiling Activities
A further key development in the GDPR is that the new law introduces a specific concept of “profiling”, defining it as
“any form of automated processing of personal data consisting of using those data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements”. The GDPR goes on to say that businesses should not make “decisions” about an individual if those decisions are solely based on automated processing, including profiling, unless one of certain specific legal criteria is met – typically requiring the individual’s “explicit consent”.
The rule only applies, however, if the profiling produces “legal effects” concerning the individual or “similarly significantly affects him or her”. The GDPR specifically mentions refusal of online credit applications and e-recruitment as two such examples of automated decision-making. One big question, though, is whether online profiling for the purposes of data analytics or targeted advertising is caught by this rule.
While the GDPR does not provide absolute clarity on this point, profiling data from which an individual’s directly identifying information has been removed through pseudonymisation significantly reduces any privacy impact on the individual, particularly when keeping in mind the GDPR’s overarching support of pseudonymisation. In view of this, online data analytics or targeted advertising practices based on pseudonymised data seem very unlikely to produce “legal effects” or “significantly affect” individuals – and, as a result, are unlikely to be subject to the explicit consent requirements for automated decision-making mandated by the GDPR.
The Risks of Non-Compliance with the GDPR
When it comes into effect, the GDPR will introduce severe penalties for businesses that are non-compliant. The GDPR creates a two-tier fining regime, indicating that certain breaches of the GDPR can attract fines of up to the greater of EUR10,000,000 or 2% of annual worldwide turnover (i.e. top line revenue) while other, more serious, breaches can attract total fines of up to the greater of EUR20,000,000 or 4% of annual worldwide turnover.
In assessing what fines to impose, data protection authorities may take account of “the technical and organisational measures” implemented by businesses – and use of pseudonymisation technologies will undoubtedly be an important consideration here.
Aside from the risk of fines, the GDPR also grants data protection authorities additional powers, including mandatory audit rights, and gives individuals the ability to bring legal claims (or have legal claims brought on their behalf by civil liberties organisations or similar) against non-compliant businesses.
The GDPR can therefore be thought of as introducing both a ‘carrot’ and a ‘stick’ approach to encouraging businesses to pseudonymise their data – a ‘carrot’ by virtue of expressly recommending pseudonymisation at specific points in the GDPR and reducing certain obligations on businesses that pseudonymise their data; and a ‘stick’ by threatening significant penalties for businesses that are non-compliant.
Pseudonymisation and Data Masking Technology
Data masking represents the de facto standard for achieving pseudonymisation, especially in so-called non-production data environments used for software development, testing, training, and analytics. By replacing sensitive data with fictitious yet realistic data, masking solutions neutralize data risk while preserving the value of the data for non-production use.
Alternative approaches such as encryption fall short in key respects. Chief among these is encryption’s vulnerability to identity breach, insider threats, and other scenarios in which actors obtain decryption keys: anyone with the right decryption keys can walk past encryption defences and gain access to sensitive data. In contrast, data masking irreversibly transforms sensitive data, eliminating risk from insider and outsider threats alike.
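The contrast can be illustrated with a minimal sketch. Real masking products use far richer generators (and keyed transforms to resist guessing), but the example below shows the two properties the text relies on: the substitute value is fictitious yet plausible, and the transformation keeps no key or mapping, so there is nothing for an attacker to steal and reverse. The name list and function are invented for illustration:

```python
import hashlib

# A pool of fictitious but realistic substitute values (illustrative only).
FAKE_NAMES = ["Alex Taylor", "Sam Jones", "Chris Patel", "Dana Lee"]

def mask_name(real_name: str) -> str:
    """Deterministically replace a real name with a fictitious one.

    Hashing the input to pick the substitute keeps masking consistent
    across data copies (the same input always masks the same way), while
    discarding the original and keeping no mapping makes the result
    irreversible -- unlike encryption, which a stolen key can undo.
    """
    digest = hashlib.sha256(real_name.encode()).digest()
    return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]

# Consistent across copies, but with no route back to the original.
assert mask_name("James Smith") == mask_name("James Smith")
```

Because every copy of the data masks “James Smith” to the same fictitious name, referential integrity survives for testing and analytics, yet there is no decryption operation an insider or attacker could perform.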
Pseudonymisation Requires a Data First Approach
While data masking provides organizations with a tool that fits key challenges emerging from the GDPR, businesses must apply it with a “data first” approach that involves greater awareness of how data changes and moves over time, and how to better control it. Specifically, businesses will be most effective in achieving pseudonymisation through masking if they address the following questions:
Where is Your Data?
Enterprises create many copies of their production environment for software development, testing, backup, and reporting. These environments can account for up to 90% of all data stored and are often spread out across multiple repositories and sites. Businesses that understand where their data resides—including sensitive data located in sprawling non-production environments—will be better equipped to allocate protective measures.
How do you Govern Your Data?
Very few organisations have a Chief Data Officer or Head of Data Protection. Even those that do may not have adequate control over how data is moved and manipulated because individual business units—each with their own administrators, IT architects, and developers—often define data-related processes at the project level, with little or no corporate policy enforced or even available. Businesses addressing the GDPR must take steps to regain data governance and introduce tools that drive greater visibility and standardization into processes such as data masking.
How do you Deliver Data?
Many existing approaches to delivering data are highly manual and resource-intensive, involving slow coordination across multiple teams. Adding pseudonymisation to already cumbersome data delivery processes only adds to this burden, and enterprises often end up abandoning efforts to make technologies like data masking work. To implement a technology like data masking effectively, businesses need not only to streamline data delivery, but also to ensure that masking is a repeatable and integrated part of the delivery process.
The GDPR: A Force for Positive Change
For many organizations, the GDPR clearly creates an imperative to evaluate and update how they store, manage, and secure data. And critically, the new regulation will also usher in a wave of IT innovation with the potential to not only ensure compliance and reduce the risk of data breach, but also to accelerate critical business initiatives.
Data Masking Using Virtual Data
For example, innovations that combine virtual data and data masking simplify the process of not only masking data, but also delivering masked data. Such a platform-based approach allows businesses to create and deliver lightweight virtual data copies in a fraction of the time and storage space consumed by regular physical copies. Virtual copies are stored, managed, and delivered from a single point of control to maximize data governance.
Moreover, data masking can be designed into the data delivery process such that virtual copies are automatically masked. The overall effect is that masked data is created and delivered much faster, facilitating GDPR compliance and accelerating processes that depend on secure data. Chief among these processes are software development, testing, and analytics projects that—now more than ever—determine how businesses compete and succeed, no matter the industry.
Data masking technologies have been around a long time. So why do so many companies fail to use them, or simply choose not to? The reason is that traditionally they have been highly manual, complex pieces of work. Dedicated individuals or teams carry them out, and each application must be worked on independently, forcing organisations to prioritise which datasets to mask and which to leave unprotected. The most recent Bloor Data Masking report gives the example of Oracle’s data masking, which “requires the use of the Oracle database (as well as a lot of IT skills)”. The problem with data masking isn’t the masking rules, but the delivery of the masked data. In fact, the Bloor report goes on to discuss how some of the standalone methods have become commoditised: “…many of the solutions on offer will be selected as much for the complementary capabilities that are offered as for the product’s pure masking capabilities”.
Gartner’s December 2015 report, Magic Quadrant for Data Masking Technology, Worldwide, provides an example of how data masking paired with the data virtualization capability of Delphix brings added benefit:
“Combining [data masking] with data virtualization saves time and storage; data is masked only once in the virtualized (shared) data and in any changed data, while retaining storage space savings. Data virtualization technology can also save time by keeping copies of the masked data and serving them by request.”
So what is Delphix, and how does it transform a process that’s traditionally slow, siloed, painful, and expensive into something automated, centralised, fast, and efficient? Delphix collects production data and then remains in sync with production, creating a near-live copy of the production data. Using this copy, Delphix creates complete and current ‘virtual’ copies of the data via self-service in minutes. You retain full control over all your production data. And because you’re working with only a single real copy as opposed to hundreds, you’ve dramatically reduced the attack surface. You also have full knowledge of where any virtual copies reside and who can access them, providing much-needed control and governance.
At the same time, a data masking policy can be set up beforehand, so whenever virtual copies are requested, the data is masked instantly. Data masking simply becomes part of the automated data delivery process. This allows data protection to be embedded within the entire life cycle of the technology, from the very early design stage, right through to its ultimate deployment, use and final disposal.
Repeatable and Secure Data Delivery
• Sync with production data source
• Provision a complete, virtual copy of production
• Automatically discover and mask sensitive data
• Distribute masked copies in minutes
• Provide data consumers with self-service access and control
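The delivery flow above can be sketched in code. This is a conceptual illustration only: the function names, policy format, and data shapes are invented for the sketch and do not represent Delphix’s actual API. The point it demonstrates is structural, namely that masking runs inside the delivery step, driven by a policy defined up front, so every copy handed to a consumer is masked by default:

```python
# Hypothetical masking policy, defined once in advance: column name -> rule.
MASKING_POLICY = {
    "name": lambda value: "MASKED NAME",
    "email": lambda value: "masked@example.com",
}

def provision_virtual_copy(production):
    """Stand-in for provisioning: a real platform shares storage blocks;
    here we simply copy the rows so production is never modified."""
    return [dict(row) for row in production]

def discover_sensitive_columns(rows, policy):
    """Treat a column as sensitive if the policy defines a rule for it."""
    if not rows:
        return []
    return [col for col in rows[0] if col in policy]

def deliver_masked_copy(production, policy=MASKING_POLICY):
    """Provision a copy, then mask it before any consumer sees it."""
    copy = provision_virtual_copy(production)
    for col in discover_sensitive_columns(copy, policy):
        for row in copy:
            row[col] = policy[col](row[col])  # masking is part of delivery
    return copy

prod = [{"name": "James Smith", "email": "james@corp.example", "plan": "premium"}]
masked = deliver_masked_copy(prod)
# Consumers receive masked rows; non-sensitive columns pass through intact.
```

Because the policy is applied at provisioning time rather than as a separate project per application, adding a new data consumer requires no extra masking effort.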
The EU GDPR strongly incentivises the pseudonymisation of all personal data. To address this, businesses need greater visibility and control over their data, coupled with tools that not only mask data, but also streamline and automate that process. Such an approach can help businesses:
– Take steps to protect personal data, in accordance with GDPR requirements.
– Avoid the need to report data breach incidents.
– Provide tools that enable their legal teams to identify, audit, and report on data.
– Reduce or eliminate the requirement to obtain consent for data profiling.
– Accelerate IT and business processes that depend on access to secure data.