Building a Better SAP Masking Solution: Part One
Over the last few years, we have seen a sharp increase in the number of organizations interested in accelerating the development cycle of their packaged applications (like SAP ERP). This is driven by the immense pressure organizations face to deliver functionality to market faster while at the same time complying with regulations like GDPR.
Most of our customers use a combination of the virtualization, self-service, and masking functionality of our platform to deliver test data to whoever needs it in a secure, safe, and easy manner. A common struggle we heard across the SAP market was that organizations had trouble securing their sensitive data before sharing it with downstream environments. There was simply no great solution out there that did what they really needed it to do. As we dug in, we sought clarity on three particular questions (the first two of which we will explore in this post, and the last of which we will explore in Part Two of this series):
- The Issues: Why is it a struggle to mask SAP data?
- Ideal Solution: What would the ideal SAP masking solution look like?
- Building It: How can we improve our platform to fill the masking needs of our SAP customers?
As we began our research, three big struggles, each pointing to an attribute of an ideal solution, became apparent.
“Where is our Sensitive Data?”
It immediately became apparent that companies with SAP applications were struggling to determine where sensitive data resided in their SAP systems. Why? There are two particular reasons. The first is pure quantity. A typical SAP application comes out of the box with more than 100,000 tables. With many of these tables having 10 or more fields, that is over one million types of data that you must classify as sensitive or not. To complicate matters further, the standard naming convention for SAP tables and fields can be very hard to understand.
For example, in many databases if you had a column that held names of cities, you may name it “City_Names.” In SAP, a common column name for data that may contain the names of cities is “ORT01.”
This unique naming structure makes it very hard for automatic sensitive-data identification software (like Delphix's data profiling service) to scan through the 100,000 tables and discover which fields are sensitive. These tools frequently work by searching for column names containing "city," "city name," "city_name," and so on. It also makes it especially hard for individuals to manually go through the tables and identify which data is sensitive and in need of masking. With this discovery, the first attribute of an ideal SAP masking solution became clear. The solution must allow customers to identify where and what their sensitive SAP data is in a very easy and fast way.
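To make the problem concrete, here is a minimal sketch of why name-pattern profiling misses SAP fields. The regex rules and the small SAP field dictionary below are illustrative assumptions, not Delphix's actual profiling logic.

```python
import re

# Name-pattern rules a generic profiler might use to find city columns
CITY_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"city", r"town")]

def pattern_match(column_name):
    """Return True if the column name matches a generic city pattern."""
    return any(p.search(column_name) for p in CITY_PATTERNS)

# A domain-aware profiler can instead look cryptic SAP field names up
# in a dictionary of known meanings (a tiny illustrative subset here).
SAP_FIELD_MEANINGS = {
    "ORT01": "City",
    "NAME1": "Name",
    "STRAS": "Street",
}

def sap_lookup(column_name):
    """Return the known meaning of an SAP field name, or None."""
    return SAP_FIELD_MEANINGS.get(column_name.upper())

print(pattern_match("City_Names"))  # True  -- the generic rule works
print(pattern_match("ORT01"))       # False -- the SAP name slips through
print(sap_lookup("ORT01"))          # City  -- a dictionary lookup catches it
```

The takeaway: column-name heuristics that work on conventionally named schemas contribute almost nothing on SAP's 100,000-plus cryptically named tables, so an ideal solution needs SAP-specific knowledge built in.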
“Ugh… I Broke It”
Now let's say that somehow you were able to identify the hundreds or maybe thousands of sensitive fields in your SAP system. There is yet another problem we started to hear about. When companies tried to actually mask some of their SAP data, they kept breaking the entire application or could not get certain transactions to succeed. That's not what you want after spending weeks, and in some instances months, identifying what data you needed to mask.
So what went wrong? It turns out that SAP applications perform a high number of checks at the application layer rather than at the database layer. For example, a table tracking user login sessions may be checked against another table that lists valid user login names. If you mask one table but not the other, or mask them in different ways, suddenly all kinds of errors and failed transactions occur. With this discovery, the second attribute of an ideal SAP masking solution became clear. The solution must recommend masking methods that, when applied, do not break transactions or the application itself.
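One common way to keep cross-table references intact is deterministic masking: the same input always produces the same masked output, so a value masked in one table still matches the same value masked in another. The sketch below assumes hypothetical user and session tables and a simple hash-based masking function; it illustrates the consistency property, not any specific product's algorithm.

```python
import hashlib

def mask_username(username, secret="demo-key"):
    """Deterministically mask a username: same input, same masked output."""
    digest = hashlib.sha256((secret + username).encode()).hexdigest()
    return "USER_" + digest[:8].upper()

# Hypothetical tables: a user master list and a session log that refers to it
valid_users = ["JSMITH", "MDOE"]
login_sessions = [("JSMITH", "2024-01-05"), ("MDOE", "2024-01-06")]

# Mask both tables with the same deterministic function
masked_users = [mask_username(u) for u in valid_users]
masked_sessions = [(mask_username(u), ts) for u, ts in login_sessions]

# Every masked session still points at a valid masked user,
# so an application-layer validity check would still pass.
assert all(u in masked_users for u, _ in masked_sessions)
```

If the two tables were masked independently (say, with random substitution in one and hashing in the other), the session rows would reference usernames that no longer exist, which is exactly the class of failure described above.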
“Why is it so Slow?”
Okay, so you have gotten this far. After a ridiculous amount of work and effort, you know what all the sensitive data is, and you know which masking algorithms to apply without breaking the application. You click run and wait. One hour passes. Then three. Then a day. Then three days. FINALLY, the masking is complete and you have a masked version of your SAP data. What we kept hearing was that this is just too slow. With companies under immense pressure to move and iterate faster, days or weeks is way too long to wait for a masked copy of your SAP data.
With this discovery, the third and final attribute of an ideal SAP masking solution became very clear. The solution must be able to mask many TBs of data fast!
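One reason deterministic, stateless masking helps with speed is that it parallelizes naturally: because each row is masked independently, a large table can be split into chunks and masked concurrently. The sketch below is a toy illustration of that chunking structure; the chunk size, worker count, and thread-based pool are assumptions (a real implementation would more likely use process-level or database-side parallelism for CPU-bound work).

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def mask_value(value):
    """Deterministically mask one value (no state shared between rows)."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def mask_chunk(rows):
    """Mask one chunk of rows; chunks are independent of each other."""
    return [mask_value(r) for r in rows]

def mask_table(rows, chunk_size=2, workers=4):
    """Split a table into chunks and mask the chunks concurrently."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    masked = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the output rows stay in order
        for result in pool.map(mask_chunk, chunks):
            masked.extend(result)
    return masked

rows = ["JSMITH", "MDOE", "ACHEN", "BPATEL"]
# Parallel masking produces the same result as masking sequentially
assert mask_table(rows) == [mask_value(r) for r in rows]
```

The key property is in the assertion: because masking carries no cross-row state, splitting the work changes nothing about the output, only the wall-clock time.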
Next time, we will explore how and what we built to help SAP users with these struggles.