How to Find the Right DataOps Platform

Tools alone won’t make you a DataOps aficionado, but here are three questions to ask when you’re looking to adopt the right technology to support your data teams on the DataOps journey.

DevOps, which started as a grassroots movement, has become the status quo for organizations looking to be more agile and deliver high-quality products and services faster, ultimately enabling businesses to accelerate past their competition. While DevOps has played a major role in automating infrastructure and the software development lifecycle, the critical element missing from the innovation pyramid is data delivery, and that’s where DataOps comes in.

Many enterprise data teams still struggle to provision a new environment in less than a day. Forty-seven percent of global enterprises say it takes four to five days to provision a new data environment, according to research by leading advisory firm 451 Research. Even when your infrastructure and your SDLC tooling are fast, that data lag will drag you down.

As companies grow more proficient at automating their infrastructure, teams can spin up and tear down compute, storage, and network environments in minutes instead of weeks or months using tools such as Ansible, Chef, and Puppet. Organizations have also invested heavily in agile and DevOps, building test automation into their development pipelines to push releases out faster with tools like Git, Jenkins, Maven, and Docker.

Similarly, DataOps has now emerged as the final foundational piece for disruption. DataOps is a collaborative data management practice that improves the efficiency and quality of data use across the enterprise by aligning people, process, and technology. Looking specifically at the technology, here are three questions every IT leader should ask to find the right DataOps platform:

1. Can the platform deliver data from any data source to any stakeholder in an automated manner?

Applications typically draw on multiple data sources, and the SDLC workflow should mirror that reality across AppDev, QA, staging, and production environments. Your solution must be able to deliver data from all production data sources, in a consistent manner, to each key stakeholder. The platform should also provide a consistent set of abstractions that work the same way regardless of data source or operating context (on-prem versus public cloud), enabling repeatable, integrated, and unified workflows from multiple data sources to every user.
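As a rough illustration, here is a minimal Python sketch of what such a source-agnostic abstraction could look like. The names (DataSource, snapshot, provision, refresh_environment) and the toy PostgresSource backend are hypothetical, not any vendor’s actual API:

```python
from typing import Protocol

# Hypothetical sketch of a source-agnostic interface. The point: one
# abstraction, many backends, so downstream workflows never branch on
# source type or on-prem vs. cloud.
class DataSource(Protocol):
    def snapshot(self) -> str:
        """Capture a point-in-time copy of production data; return its ID."""
        ...

    def provision(self, snapshot_id: str, target_env: str) -> None:
        """Deliver that snapshot into a dev, QA, or staging environment."""
        ...

class PostgresSource:
    """One concrete backend; an OracleSource or S3Source would plug in the same way."""
    def snapshot(self) -> str:
        return "pg-snap-001"

    def provision(self, snapshot_id: str, target_env: str) -> None:
        print(f"provisioning {snapshot_id} into {target_env}")

def refresh_environment(source: DataSource, target_env: str) -> None:
    # Identical workflow for every source that implements the interface.
    source.provision(source.snapshot(), target_env)

refresh_environment(PostgresSource(), "qa")
```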

2. Can the platform automate data discovery and masking of sensitive information across the enterprise?

An impactful DataOps solution should be able to profile, highlight, and then mask data from any data source, so that sensitive data is never exposed in lower-tier environments while the business value of the data is preserved. Masked data should be realistic but fictitious: feasible to test against, yet worthless to thieves and hackers. The resulting masked values must remain usable for non-production use cases, which means you can’t simply turn names into random strings of characters.
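To make the “realistic but fictitious” idea concrete, here is a minimal Python sketch of deterministic name masking. The name pools and salt are made up for illustration; a real platform would use far richer dictionaries and format-preserving rules:

```python
import hashlib

# Illustrative pools of plausible but fictitious replacement values.
FAKE_FIRST = ["Avery", "Jordan", "Morgan", "Riley", "Quinn", "Casey"]
FAKE_LAST = ["Hale", "Mercer", "Bennett", "Ramos", "Okafor", "Lindqvist"]

def mask_name(real_name: str, salt: str = "per-project-secret") -> str:
    """Deterministically map a real name to a realistic fake one.

    The same input always yields the same output, so joins and
    referential integrity across masked tables still hold, but the
    original value cannot be read back out of the result.
    """
    digest = hashlib.sha256((salt + real_name).encode()).digest()
    first = FAKE_FIRST[digest[0] % len(FAKE_FIRST)]
    last = FAKE_LAST[digest[1] % len(FAKE_LAST)]
    return f"{first} {last}"

print(mask_name("Alice Smith"))  # realistic fake name, stable across runs
print(mask_name("Alice Smith"))  # same value again: masking stays consistent
```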

3. Can the platform provide personal data environments to end users and then let them manipulate those environments?

The platform should rapidly provide personal data environments without heavy storage overhead (for example, via thin clones that share blocks rather than full copies), along with advanced data manipulation controls such as bookmark, rewind, reset, and branch. Individual users, including QA engineers, testers, and developers, should be able to collaborate easily by sharing a bookmark and building a library of bookmarks for multiple workflows.
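The sketch below models that workflow in plain Python. The DataEnvironment class and its bookmark, rewind, and branch methods are hypothetical stand-ins for a platform’s virtualization engine, where each state is a cheap copy-on-write reference rather than a full data copy:

```python
from dataclasses import dataclass, field

# Hypothetical model of a personal data environment: states are cheap
# references (think copy-on-write snapshots), not full data copies.
@dataclass
class DataEnvironment:
    name: str
    state: str = "prod-baseline"
    bookmarks: dict = field(default_factory=dict)

    def bookmark(self, label: str) -> None:
        """Save a named pointer to the current state for later reuse or sharing."""
        self.bookmarks[label] = self.state

    def rewind(self, label: str) -> None:
        """Return the environment to a previously bookmarked state."""
        self.state = self.bookmarks[label]

    def branch(self, new_name: str) -> "DataEnvironment":
        """Fork an independent environment from the current state."""
        return DataEnvironment(new_name, self.state, dict(self.bookmarks))

# A tester reproduces a failing run, bookmarks it, and shares the branch.
env = DataEnvironment("qa-regression")
env.bookmark("before-bad-migration")
env.state = "after-bad-migration"      # simulate a destructive test step
env.rewind("before-bad-migration")     # reset instantly, no re-provisioning
dev_env = env.branch("dev-debugging")  # a developer debugs from the same point
```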

You’re only truly able to disrupt and innovate faster than competitors when all three layers are automated: infrastructure, the software development lifecycle, and data. Only then is an enterprise equipped to deliver applications in a truly continuous manner and bring its product offerings to market faster.