Regulations Won't Kill AI -- Bad Data Will
This article originally appeared on Forbes.com as part of Delphix CTO Eric Schrock’s ongoing column.
Will regulations hurt AI?
In light of emerging data laws, this has become the prevailing question for much of the tech industry. While some regulations will certainly affect the development of artificial intelligence (AI), new legislation will change our approach to AI, not destroy it.
Many believe regulations will force businesses to lock down access to data, starving AI systems of the data they need for training and execution. Faced with an empty data pipeline, companies would have to either give up or rely on weak, vulnerable data to circumvent new data legislation. But AI innovation can thrive within the boundaries of today’s regulatory landscape. To do so, companies need to understand the nature of their data, use it appropriately without cutting off access, and improve how it flows within their organization.
Understand Your Data To Save AI
Companies have traditionally built teams, processes and infrastructure around applications, and those applications, like AI, are becoming increasingly data-dependent. As data crosses boundaries between users, developers and third-party sources to power new technologies, each handoff creates risk and leaves companies susceptible to security breaches. As a result, the way sensitive user information is shared and managed has become the biggest pain point for enterprises as they work to meet new regulations.
It may be tempting to stop the flow of data or severely restrict access to it, but this approach can be fatal. When you kill access to data, you kill the future of AI innovation. Instead, take the time to understand your data (its properties, constraints and security needs) so you can feed data-dependent technologies safely and efficiently while staying within data laws.
Here’s something to consider: If someone on your team asks for a particular dataset for an AI-focused project, you shouldn’t have to pull in the compliance team to determine whether the sample contains sensitive personal data about people in the EU that would run afoul of the General Data Protection Regulation (GDPR). This sort of clunky, bureaucratic process could add days or weeks to fulfilling the request. Instead, you should already know the attributes of your data, the role of the user and the nature of the request, so you can provide access quickly, with confidence and low overhead. It’s not about no access; it’s about the right access for the right people.
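To make the idea concrete, here is a minimal Python sketch of an access decision made from metadata alone. The `DatasetAttributes` and `AccessRequest` types, the policy tables, and the role and purpose names are all hypothetical; a real deployment would pull these attributes from its own data catalog and identity system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetAttributes:
    # Hypothetical catalog metadata, assumed to be recorded when the data is ingested.
    name: str
    contains_eu_personal_data: bool  # personal data about people in the EU (GDPR scope)

@dataclass(frozen=True)
class AccessRequest:
    user_role: str  # e.g. "data_scientist"
    purpose: str    # e.g. "model_training"

# Hypothetical policy tables; each organization would define its own.
KNOWN_PURPOSES = {"model_training", "analytics", "audit"}
RAW_ACCESS = {("privacy_officer", "audit")}  # (role, purpose) pairs cleared for unmasked data

def decide_access(ds: DatasetAttributes, req: AccessRequest) -> str:
    """Return "grant", "grant_masked" or "deny" using metadata alone,
    so routine requests never wait on a manual compliance review."""
    if req.purpose not in KNOWN_PURPOSES:
        return "deny"  # unrecognized purpose: escalate to a human instead
    if not ds.contains_eu_personal_data:
        return "grant"  # nothing GDPR-sensitive in this dataset
    if (req.user_role, req.purpose) in RAW_ACCESS:
        return "grant"  # role explicitly cleared for this purpose
    return "grant_masked"  # everyone else gets a masked copy rather than no access

if __name__ == "__main__":
    customers = DatasetAttributes("customers", contains_eu_personal_data=True)
    print(decide_access(customers, AccessRequest("data_scientist", "model_training")))
    # grant_masked
```

The design choice worth noting is the default: a request that fails the raw-access check falls through to a masked copy rather than a refusal, which mirrors the point that the goal is the right access, not no access.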
Challenges Don’t Necessitate Defeat
GDPR has already claimed its first casualties, such as a promising blockchain service that shut down because of GDPR’s strict rules on storing sensitive user information. Similar stories will undoubtedly continue to surface as companies grapple with compliance. With experts projecting that AI will be one of the technologies hardest hit by data restrictions, the road ahead will be bumpy for machine learning, but not impassable.
When we peel back the layers of compliance, it all leads back to one key thing: the need to mitigate risk by masking user data before it is ever fed into AI systems or accessed by developers and engineers. Most people are familiar with obfuscating names, addresses and credit card numbers, but the problem is not so black and white. What is private to one person may not be to another. Statistical analysis can reveal private information about otherwise masked data; a record stripped of its name, for instance, can often be re-identified from its ZIP code, birth date and gender. Data is also becoming increasingly varied and complex, spanning voice, video and genetic sequences. Mitigating risk requires first understanding that risk, and then assessing your risk tolerance as data flows within the enterprise and beyond. Only then will you be able to maintain the integrity of your data, sustain the technologies that depend on it and avoid becoming another data-related tragedy. Thriving under security-centric policies is a matter of adopting solutions that protect data as soon as it enters your system.
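As a rough illustration of masking at the point of entry, the sketch below pseudonymizes direct identifiers with a keyed hash (HMAC-SHA256 from Python's standard library) and masks card numbers down to their last four digits. The field names, the key handling and the 16-character token length are assumptions made for this example, not a recommended production scheme.

```python
import hashlib
import hmac

# Hypothetical key; a real system would load it from a secrets manager, never source code.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

# Hypothetical field classification for this example schema.
DIRECT_IDENTIFIERS = {"name", "email", "address"}
CARD_FIELDS = {"card_number"}

def pseudonymize(value: str) -> str:
    """Keyed hash (HMAC-SHA256): the same input always yields the same token, so joins
    across tables still work, but without the key nobody can test guesses against it."""
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def mask_card(number: str) -> str:
    """Keep only the last four digits, e.g. '4111 1111 1111 1111' -> '************1111'."""
    digits = [c for c in number if c.isdigit()]
    return "*" * max(len(digits) - 4, 0) + "".join(digits[-4:])

def mask_record(record: dict) -> dict:
    """Mask a single record before it reaches a training pipeline or a developer."""
    masked = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            masked[field] = pseudonymize(str(value))
        elif field in CARD_FIELDS:
            masked[field] = mask_card(str(value))
        else:
            masked[field] = value  # quasi-identifiers such as zip pass through unchanged
    return masked

if __name__ == "__main__":
    row = {"name": "Ada Lovelace", "email": "ada@example.com",
           "card_number": "4111 1111 1111 1111", "zip": "02139", "birth_year": 1990}
    print(mask_record(row))
```

Note that `zip` and `birth_year` pass through untouched: the direct identifiers are gone, but the quasi-identifiers that enable statistical re-identification remain, which is why masking names and card numbers alone is not the whole answer.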
Regulations Will Beget Creativity
Here’s the silver lining: from GDPR to California’s tough new data privacy law, the California Consumer Privacy Act (CCPA), regulations will sharpen our collective skills, as designers, developers and engineers are pushed to adopt forward-thinking techniques that keep them compliant without sacrificing innovation. Restrictions on data usage and testing will ultimately lead companies to pursue their core business objectives in a more secure, privacy-centric way.
We’ve already seen the adoption of techniques like differential privacy, which Apple, a Delphix customer, uses to obscure user activity data before it ever reaches its servers. Google is rolling out enhancements to its Data Loss Prevention (DLP) API that include not just novel masking, such as image redaction, but also analyses like k-anonymity to mitigate the risk of information leakage through statistical analysis. Even AI approaches are getting in the game, such as Generative Adversarial Privacy, which uses adversarially trained machine learning models to learn privatization mechanisms directly from the data.
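Two of the textbook techniques named above can be sketched in a few lines. The first function measures k-anonymity, the size of the smallest group of records that share the same quasi-identifier values; the second implements classic randomized response, a simple local differential privacy mechanism in which each user perturbs their own answer before it leaves the device. This is an illustrative sketch only, not Apple's or Google's implementation, and the function names and sample data are hypothetical.

```python
import math
import random
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share identical
    quasi-identifier values. k == 1 means at least one record is unique and
    therefore at risk of re-identification. Assumes records is non-empty."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def keep_probability(epsilon: float) -> float:
    # Probability of reporting the true answer; the ratio p / (1 - p) equals e^epsilon.
    return math.exp(epsilon) / (math.exp(epsilon) + 1)

def randomized_response(truth: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true yes/no answer with probability keep_probability(epsilon),
    otherwise report its opposite. This satisfies epsilon-local differential privacy."""
    return truth if rng.random() < keep_probability(epsilon) else not truth

def estimate_rate(reports, epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 'yes' answers from noisy reports.
    E[observed] = (1 - p) + rate * (2p - 1), solved here for rate."""
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

if __name__ == "__main__":
    people = [
        {"zip": "02139", "age_band": "30-39"},
        {"zip": "02139", "age_band": "30-39"},
        {"zip": "02140", "age_band": "40-49"},
    ]
    print(k_anonymity(people, ["zip", "age_band"]))  # 1: the 02140 record is unique

    rng = random.Random(42)
    truths = [i < 3000 for i in range(10_000)]  # 30% of users truly answer "yes"
    reports = [randomized_response(t, 1.0, rng) for t in truths]
    print(round(estimate_rate(reports, 1.0), 3))  # close to 0.3 in aggregate
```

A smaller `epsilon` means more noise per answer and stronger privacy; the aggregate estimate stays usable because the noise averages out over many users. `estimate_rate` requires `epsilon > 0`, since at zero every report is a coin flip and carries no signal.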
Most companies are still struggling to implement even rudimentary data masking, but these approaches show what’s in store as companies come to grips with the new reality. These techniques will undoubtedly make it more complex to deliver data to the applications that depend on it, but companies shouldn’t compromise innovation to meet regulatory requirements.
Data laws and AI can coexist, given a privacy-first approach to development and a deeper understanding of how data flows within your enterprise. In fact, increased governance will likely have a positive impact on the future of AI, forcing us to become more creative and adopt a new perspective that balances data access with privacy.