Data Access for All

What really matters is access to data, and anything that stands in the way is the enemy.

Eric Schrock

Aug 22, 2017

Phil Wainewright wrote of the Delphix Data Platform:

Delphix provides a ‘data anti-gravity platform’. By creating a lighter, virtualized copy of the data that stays in sync, Delphix allows data to sidestep the physical constraints that weighs it down.

Data gravity is a powerful concept, and it’s true that the physicality of data and its associated costs — storage, bandwidth, performance — limit the velocity of innovation in the enterprise. But there’s a reason that DataOps has a broader definition:

DataOps is the alignment of people, process, and technology to enable the rapid, automated, and secure management of data. Its goal is to improve outcomes by bringing together those that need data with those that provide it, eliminating friction throughout the data lifecycle.

At Delphix we’ve been living and breathing DataOps for years, and just like Phil we first focused on what was near and dear to our product — the “weight” of data. But as we talked to more customers, analysts, and industry leaders, what really mattered to them was access to data, and anything that stood in the way of that access was the enemy. One CIO, for example, described data in her enterprise as a river — and the need to open tributaries to feed into that flow and enable access for everyone.

So if data gravity is only one source of friction, what else inhibits our ability to access data? This infographic puts it in context, but here we want to dig into the “supply side” of data friction.

[Infographic: data friction]

1. Cost and Complexity

The first is the familiar home of data gravity. As data grows in size, so do the resources required to store, serve, and move it. While storage costs have shrunk and networking speeds have grown, they have not kept pace with data growth. This applies not just to the size of production data, but to the total cost of data, including the ever-expanding demand for non-production copies.

Complexity is cost’s evil sibling. There are more clouds, more data sources, and more data tools than ever before. Gone are the days of building all your applications on top of your on-premises Oracle databases. Now you might have IoT sensor data in a hosted Hadoop cluster, application data in Cassandra spread across multiple AWS regions, and your customer information in an on-premises PostgreSQL database. Responding to the same requests from consumers across all these data repositories requires specialized skills, tools, and integration that dramatically increase the cost of serving data consumers.

2. Security and Governance

More than 9 billion personal records have been lost or stolen in the last five years, with the average data breach costing $3.62 million in 2017. Locking down access inhibits innovation, and using synthetic data isn’t much better.

Most companies turn to data masking to de-identify sensitive data, but quickly find that transforming data is not the hard part; rather, it’s continuously delivering secure data to everyone who needs it. This is not a data gravity problem, but a process and technology problem. Companies must be able to proactively identify insecure data as it evolves within their enterprise, define masking rules to continuously de-identify data, and then make that secure data available across environments and users.
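To make the idea of "masking rules" concrete, here is a minimal sketch of rule-based masking in Python. It is an illustration only, not how any particular product works: the column names, the `MASKING_RULES` table, and the helper functions are all hypothetical. The email rule is deterministic (the same input always yields the same pseudonym), which is one common way to keep masked copies consistent across environments.

```python
import hashlib
import re


def mask_email(value: str) -> str:
    """Replace an email with a deterministic pseudonym (hypothetical rule)."""
    # Hash the normalized address so the same person maps to the same
    # pseudonym in every masked copy. A real deployment would likely add
    # a secret key or salt; that is omitted here for brevity.
    digest = hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}@example.com"


def mask_digits(value: str) -> str:
    """Redact every digit while preserving the original formatting."""
    return re.sub(r"\d", "X", value)


# Hypothetical rule table: column name -> masking function.
MASKING_RULES = {
    "email": mask_email,
    "ssn": mask_digits,
}


def mask_record(record: dict) -> dict:
    """Return a copy of the record with every rule-covered column masked."""
    masked = {}
    for column, value in record.items():
        rule = MASKING_RULES.get(column)
        # Columns without a rule, and missing values, pass through unchanged.
        masked[column] = rule(value) if rule and value is not None else value
    return masked


if __name__ == "__main__":
    row = {"id": 1, "email": "Ada@Example.com", "ssn": "123-45-6789"}
    print(mask_record(row))
```

Defining the rules once and applying them every time data is delivered is what turns masking from a one-off transformation into a repeatable process. Discovering which columns need a rule in the first place is a separate step that this sketch does not cover.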

3. People and Process

The fragmentation of data, spurred by the rise of data complexity, leads to data silos:

A data silo is a repository of fixed data that remains under the control of one department and is isolated from the rest of the organization, much like grain in a farm silo is closed off from outside elements. Data silos can have technical or cultural roots.

Multiple silos mean multiple teams, skill sets, and processes required to manage data. And cross-organizational capabilities like data governance and integration become orders of magnitude more difficult.

Overcoming these silos requires understanding why they formed in the first place. Data operators and consumers tend to view each other as “the enemy”, but correlation is not causation. The common enemy is in fact data friction caused by the changing dynamics of software development. But when organizations can’t solve data friction, teams turn to data silos and shadow IT as a way to meet their needs while avoiding the most difficult demands of the business.


DataOps as a Solution

Data friction is a pernicious enemy, but the battle for data access is not a one-time event. To borrow explicitly from Jez Humble of DevOps fame:

DataOps is not a goal, but a never-ending process of continual improvement.

Overcoming these sources of friction requires constant iteration across several key dimensions:

  • Reducing the total cost of data by making it fast and efficient to deliver data, regardless of source or consumer. Automation and tooling are critical.

  • Integrating security and governance into a seamless data delivery process. This requires integrated masking, but also a governance platform and process to ensure the right rules and access controls are in place.

  • Breaking down silos between people and organizations. This starts with the organizational change to bring people together into one team, but requires technology change to provide self-service data access and control.

As you look to open up access to data within your organization, remember to identify and attack all sources of friction. The DataOps movement is growing, and the ecosystem of tools is emerging — seek out DataOps solutions that work for you. We’ll be here to help.