Whether it’s during Covid-19 or not, security threats lurk around every corner. Here are top data security tips for appdev teams to mitigate the risk of data loss and avoid being shut down by digital attacks while working remotely.
Apr 21, 2020
In the wake of the Covid-19 pandemic, businesses have found themselves in the middle of the world’s largest work-from-home experiment—essentially running a virtual company. In fact, Gartner predicts more than 70 percent of companies plan to permanently shift to more remote work post-coronavirus.
This new reality is exposing the limits of our digital infrastructure to handle such a massive workplace transformation. The long-term feasibility of remote work depends on workers being able to do everything from their home desk, on their own device, at a pace on par with their work environment, all without compromising their data.
This shift is putting security measures to new and extreme tests, especially as more and more hackers take advantage of the moment to scale up attacks. In many ways, digital attacks are nearly as dangerous and catastrophic as physical ones, because digital technology underpins core societal functions such as health care, transportation, and financial services.
“The amount of risk is at an all-time high,” Chris Hertz, CRO at DivvyCloud, said in a recent interview with Digital Trends. “If I were a cybersecurity professional, I would not be sleeping right now. It’s a staggering problem.”
Throughout my career, I’ve helped hundreds of customers modernize their IT and lay the groundwork for sound data security practices. Here are three data security tips for appdev teams to confidently build and deliver new products and services to customers while working remotely.
In many shops, test data is still a straight copy of production data. It’s a common practice, but it creates risk with devastating consequences, as Uber found out the hard way. Since lower environments are typically less protected, it’s easy to see how data leakage can occur and how a company can suffer the wrath of regulations like GDPR. The problem gets amplified when people change their work location, start using their own devices, or need to use their own networks.
If we apply the concept of Zero Trust to the data in our Dev and Test environments, we see that we need to protect the data itself as well as the endpoints where we distribute it. How can we do that? The data must stay protected regardless of where we access it, we need to stop data leakage as soon as we detect it, and our response speed to data threats can’t be compromised when we change location, device, or network.
Done properly, data masking transforms data so that the original values are unrecoverable from the masked output, while preserving enough business value to keep it usable for testing and data analytics. If we distribute only masked data with preserved business value to less secure environments, our data becomes effectively “black”: malicious actors simply don’t know the data is masked because it looks real, so the protection is opaque to them.
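As a sketch of what “unrecoverable but still usable” can look like, here is a minimal, hypothetical example of deterministic pseudonymization in Python. The key and function names are illustrative assumptions, not any vendor’s API: the same input always masks to the same realistic-looking value, so joins and analytics keep working, but the original can’t be derived without the secret key.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; in practice this lives in a
# secrets manager and is never shipped with the masked data.
SECRET_KEY = b"rotate-me"

def mask_email(email: str) -> str:
    """Deterministically pseudonymize an email while keeping a realistic shape.

    Same input -> same masked value (joins and analytics still work),
    but the original address cannot be recovered without the key.
    """
    local, _, _domain = email.partition("@")
    digest = hmac.new(SECRET_KEY, local.encode(), hashlib.sha256).hexdigest()[:10]
    return f"user_{digest}@example.com"

masked = mask_email("jane.doe@acme.com")
assert masked == mask_email("jane.doe@acme.com")  # deterministic
assert "jane" not in masked                        # original value is gone
```

Real masking tools cover many more data types (names, national IDs, dates) and often use format-preserving transforms, but the core property is the same: one-way, deterministic, realistic-looking output.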
Then, if we can virtualize masked copies of data, creating thin, agile clones of datasets, we can revoke those datasets from one or many endpoints almost instantaneously. Regardless of how fragmented your infrastructure is and regardless of the size of the datasets in your data fabric, you have a single point of control that can plug a leak right now by revoking the datasets in their entirety.
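To make the “thin clone with a point of control” idea concrete, here is a toy control-plane sketch in Python; every class and method name is hypothetical and not tied to any product. Clones share an immutable base snapshot and keep only a small copy-on-write delta, which is why revocation is a metadata flip rather than a terabyte-scale delete.

```python
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    """An immutable masked snapshot whose blocks are shared by every clone."""
    snapshot_id: str
    blocks: dict  # block_id -> data

@dataclass
class Clone:
    """A thin clone: a reference to the base plus a small copy-on-write delta."""
    endpoint: str
    base: Snapshot
    delta: dict = field(default_factory=dict)  # only locally changed blocks
    revoked: bool = False

    def read(self, block_id):
        if self.revoked:
            raise PermissionError("dataset revoked")
        return self.delta.get(block_id, self.base.blocks.get(block_id))

class ControlPlane:
    """Single point of control over every clone of a snapshot."""
    def __init__(self):
        self.clones = []

    def provision(self, endpoint, snapshot):
        clone = Clone(endpoint, snapshot)
        self.clones.append(clone)
        return clone

    def revoke_all(self, snapshot_id):
        # One metadata pass cuts off every endpoint, whatever the data size.
        for c in self.clones:
            if c.base.snapshot_id == snapshot_id:
                c.revoked = True

plane = ControlPlane()
snap = Snapshot("masked-v1", {"row-1": "user_ab12@example.com"})
dev = plane.provision("dev-laptop-1", snap)
qa = plane.provision("qa-vm-7", snap)
assert dev.read("row-1") == qa.read("row-1")  # blocks are shared, not copied
plane.revoke_all("masked-v1")                 # one call plugs the leak
```

The design point is that the cost of revocation scales with the number of clones, not with the size of the underlying data.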
Having the ability to rapidly update everyone’s dataset means that an attack utilizing bad data can rapidly be repaired at all entry points—without compromising the code/feature delivery chain in the IT shop.
Datasets are large files, and like code, they often need patching. For instance, forgetting to update data masking rules in test datasets for the new, much broader protections the CCPA places on categories like non-public education data or biometric data can be just as damaging as forgetting them in prod. With more people at home, this threat surface grows significantly: everyone needs their own copies of datasets, and more people are clogging up the network pipe. I’ve already heard of one company banning video on Zoom because it was swamping their VPN.
While you can sometimes “patch” those datasets quickly with a simple tweak, other times you can’t. As the number of distributions increases, so does the cost of repair. Since most datasets, whether databases or application files, tend to be large, their size and multiplicity, combined with overloaded network pipes, creates a pretty large gap that a malicious actor might be able to squeeze through.
When there is a known data leakage, we want to revoke the bad data as soon as possible, but also get everyone back to work as soon as possible. As before, it’s crucial not only that we can revoke datasets quickly, but that we can re-distribute updated datasets quickly as well. That means that once the data error has been patched, it’s dead simple to provision updated datasets back out to all of the end users in just a few minutes.
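Under the same thin-cloning assumptions, re-provisioning a patched dataset can be sketched as repointing each endpoint’s clone at the new snapshot, which is why refreshing hundreds of endpoints can take minutes instead of days. All names here are illustrative:

```python
# Hypothetical sketch: once a patched snapshot exists, re-provisioning is a
# per-endpoint metadata swap rather than a full data copy.

def refresh_endpoints(endpoints, old_snapshot, new_snapshot):
    """Repoint every endpoint on the bad snapshot at the patched one."""
    refreshed = []
    for ep in endpoints:
        if ep["snapshot"] == old_snapshot:
            ep["snapshot"] = new_snapshot  # thin-clone repoint, not a copy
            refreshed.append(ep["name"])
    return refreshed

eps = [{"name": "dev-1", "snapshot": "v41"},
       {"name": "qa-2",  "snapshot": "v41"},
       {"name": "dev-3", "snapshot": "v40"}]
print(refresh_endpoints(eps, "v41", "v42"))  # → ['dev-1', 'qa-2']
```

Because only metadata moves, the repair cost stays flat as the number of distributed copies grows.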
Suppose a data bandit gets in, surgically steals data, and updates the logs to cover their tracks. Even if we had features to rapidly revoke and repair datasets, how can we figure out what happened and how can we prove it? Having a continuous stream of the changes of a dataset recorded in a way that is separate from the underlying dataset itself means that you have a reliable way to reproduce the exact state of a dataset—even a large dataset—in short order.
If we can reproduce the exact state of a dataset just prior to a compromise, we can preserve as much good data as possible. And if the record of changes is protected from the malicious actor, it contains positive proof of what happened. We can trace their actions, and they will be unable to deny them (meaning they can’t repudiate it).
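One way to get both properties, reproducing past state and proving tampering, is a hash-chained change stream stored apart from the dataset itself. The following is a minimal illustrative sketch, not a production audit log: each record embeds the hash of the previous one, replay rebuilds the state at any point in time, and any rewrite of history breaks the chain.

```python
import hashlib
import json

def record(log, change):
    """Append a change, chaining it to the hash of the previous record."""
    prev = log[-1]["hash"] if log else "0" * 64
    entry = {"change": change, "prev": prev}
    entry["hash"] = hashlib.sha256(
        json.dumps({"change": change, "prev": prev}, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)

def verify(log):
    """Recompute the chain; any edited record breaks every hash after it."""
    prev = "0" * 64
    for e in log:
        expected = hashlib.sha256(
            json.dumps({"change": e["change"], "prev": prev}, sort_keys=True).encode()
        ).hexdigest()
        if e["hash"] != expected:
            return False
        prev = e["hash"]
    return True

def replay(log, upto):
    """Rebuild the dataset state from the first `upto` changes."""
    state = {}
    for e in log[:upto]:
        key, value = e["change"]
        state[key] = value
    return state

log = []
record(log, ("user:1", "alice"))
record(log, ("user:2", "bob"))
assert verify(log)
assert replay(log, 1) == {"user:1": "alice"}  # state just before change 2
log[0]["change"] = ("user:1", "mallory")      # attacker rewrites history...
assert not verify(log)                        # ...and the chain exposes it
```

Keeping this stream on infrastructure the attacker can’t reach is what turns it from a convenience into evidence.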
Finally, many times, the art of investigation leads us down avenues where we might need to recreate many different points in time to figure out where the problem actually occurred. Without thin cloning, this is often just impossible. Imagine trying to spin up 10 different versions of a massive SAP landscape without thin-cloning. That could be a 12-month operation.
This sudden shift to remote work can expose or exacerbate vulnerabilities in data security. We need to shore up the weak links with a variety of tactics: to avoid data breaches, to keep our software feature delivery pipeline from slowing down, and to prevent delayed responses that could let sophisticated data attacks go unproven and uncorrected.
In a future dominated by remote work, data in the hands of so many users at so many endpoints will face much higher scrutiny. We will need to work hard to decouple the speed and security of our software feature delivery pipeline from the speed and security of our work environment.
Data will be ever more like code, and thus subject to the same kinds of attacks and the same need for patching and rapid response. Now more than ever, we will need the tools to investigate, re-create, and remediate those attacks, in a posture that recognizes that threats like zero-day attacks are increasingly likely for data just as they are for code.