Application Development

Applying DevOps Test Data Management to Artificial Intelligence and Machine Learning

DevOps test data management quickly prepares data to be used by AI/ML models in the most efficient way possible

Jason Axelrod

Apr 26, 2024

It’s tough to think of a recent technology that has so thoroughly captured the attention of the public and the business world alike as artificial intelligence (AI) and machine learning (ML), collectively known as AI/ML. Models like ChatGPT and DALL-E 2 continue to make headlines, with the public exploring AI/ML applications from conducting symphonies to devising workout plans. At the same time, many enterprises are already leveraging AI/ML — an IBM study found that in 2022, 35% of companies were using AI, while another 42% were exploring its potential use. The study also found that large companies were 100% more likely to use AI than small companies.

The increasing adoption of enterprise AI/ML presents an exciting use case for DevOps test data management (DevOps TDM), a critical set of software development processes that offer the most effective way to create, manage, and deliver test data to DevOps teams. As a successor to traditional test data management (legacy TDM) practices that have been employed for decades, DevOps TDM accommodates key recent innovations and challenges in ways that legacy TDM cannot. By adopting DevOps TDM, enterprises can accelerate AI/ML-enabled applications’ time-to-market while also ensuring that the data used to train their models is secure and compliant with global data privacy regulations.

Legacy TDM Versus DevOps TDM

Test data management (legacy TDM) is a set of processes that creates, manages, and delivers test data to development teams. While legacy TDM has been a longstanding staple of efficient software development, it’s traditionally been viewed within software departments as a back-office function: important, yet not worth tampering with. 

Compared to legacy TDM, software development methodologies such as Agile or DevOps are far more visible. These methodologies have vastly accelerated software development over the past few decades, allowing businesses to move far more quickly than they did when TDM first originated. At the same time, the data involved in TDM has faced new challenges, such as increasingly sophisticated cyberattacks, a growing set of global data privacy regulations, and soaring data volumes.

Despite the increased acceleration of software development and the growing nuances of data management, businesses still have multiple IT teams using the same manual, high-touch processes to handle TDM as they did in decades prior. This significantly hinders a business’s ability to minimize its time-to-market. TDM must evolve to keep up with software development and modern data challenges — which is where DevOps TDM comes in.

DevOps TDM accommodates these recent innovations and issues in ways that legacy TDM cannot. DevOps TDM primarily utilizes three technologies to succeed: application programming interfaces (APIs), data masking, and data virtualization. APIs allow DevOps TDM tools to be triggered, monitored, modified, and stopped from other tools such as internal developer platforms, IT service management applications, and DevOps pipelines. Data masking replaces sensitive data values such as personally identifiable information (PII) with fictitious but realistic equivalents. Data virtualization creates extremely lightweight instances of data for expedited distribution that retain the characteristics of the original versions of data. 
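To make the masking step concrete, here is a minimal, hypothetical sketch (not any particular vendor’s implementation): hashing each sensitive value to pick a fictitious replacement keeps the mapping deterministic, so the same input always masks to the same output and joins across tables still line up.

```python
import hashlib

# Illustrative pool of fictitious names; real masking tools draw from
# much larger lookup tables.
FAKE_NAMES = ["Alex Smith", "Jordan Lee", "Sam Rivera", "Casey Kim"]

def mask_name(real_name: str) -> str:
    """Deterministically map a real name to a fictitious one.

    Hashing makes the mapping consistent: the same input always masks
    to the same output, preserving referential integrity across tables.
    """
    digest = hashlib.sha256(real_name.encode("utf-8")).hexdigest()
    return FAKE_NAMES[int(digest, 16) % len(FAKE_NAMES)]

# Mask the sensitive field while leaving non-sensitive values intact.
record = {"customer": "Jane Doe", "order_total": 129.99}
masked = {**record, "customer": mask_name(record["customer"])}
```

Because the mapping is repeatable, a customer masked in one table resolves to the same fictitious name in every other table, so masked datasets remain usable for testing and training.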

When these three technologies work in unison on an automated basis, they produce lightweight, compliant, and secure data in real time that developers can quickly obtain as needed and feed into whatever tools require it. This makes DevOps TDM an invaluable asset for fast-paced application development and for training AI/ML models on an accelerated basis.
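As a rough illustration of the three pieces working in unison, the sketch below chains masking, virtualization, and API delivery; every name here is hypothetical and stands in for what a real TDM platform would do, not any product’s actual API.

```python
def mask_record(record: dict) -> dict:
    """Stand-in for data masking: replace the sensitive 'email' field
    with a fictitious value."""
    return {**record, "email": "user@example.com"}

def virtualize(records) -> tuple:
    """Stand-in for data virtualization: expose a lightweight,
    read-only view rather than a full physical copy."""
    return tuple(records)

def deliver_via_api(dataset: tuple) -> dict:
    """Stand-in for an API call that hands the dataset to a consumer,
    such as an AI/ML training job."""
    return {"status": "delivered", "rows": len(dataset)}

production = [{"email": "jane@corp.com", "spend": 42.0}]
masked_view = virtualize(mask_record(r) for r in production)
response = deliver_via_api(masked_view)
```

The point of the composition is that each stage is automatable: once masking, virtualization, and delivery are functions behind APIs, a pipeline can invoke the whole chain without manual hand-offs.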

AI/ML’s Voracious Data Needs Overwhelm Legacy TDM

One of the most exciting characteristics of AI/ML is the numerous applications that can benefit from it. Businesses have realized this benefit and are diversifying their uses of AI, embedding an average of nearly four AI capabilities within at least one business unit, according to a McKinsey study. Current common AI use cases include process automation, leveraging customer insights to advise decision-making, bolstering customer service, and improving supply chain operations, per The Wharton School of the University of Pennsylvania.

All of these functions require an immense amount of quality data. Every AI/ML model is “trained” using data that the business must supply and feed into its models. Unlike other software, AI algorithms also require distinct datasets for training and for testing, which forces developer teams to split existing databases into separate subsets.
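For instance, splitting one existing dataset into distinct training and testing subsets can be as simple as a seeded shuffle and cut. This is a generic sketch, not a feature of any TDM product:

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle reproducibly, then cut into training and testing sets."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hold out 20% of a 100-record dataset for testing.
train, test = train_test_split(range(100))
```

Seeding the shuffle matters in practice: a reproducible split lets teams rerun training and evaluation on exactly the same subsets.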

The demand for AI training and testing data has put pressure on IT teams to acquire high-quality data quickly. Because this data is exposed to the same compliance and security risks as data for any other use case, development teams’ TDM approaches to AI must keep data agile without sacrificing security.

When legacy TDM processes were first implemented, AI/ML was in its early stages. Today, TDM processes such as manual data handling and the coordination of multiple teams are rapidly growing outdated and inefficient, especially given the speed with which businesses would prefer to train and test their AI/ML models. As a result, many AI/ML models are often starved for fresh data, forcing developers to use stale data to train and test them.

How DevOps TDM Helps AI/ML

Unlike legacy TDM, DevOps TDM quickly prepares data to be used by AI/ML models in the most efficient way possible. The three key components of DevOps TDM work in unison to achieve this goal. Virtualizing data allows consistent, quick access to that data (and its various versions) in near-real time. Data masking allows AI/ML test data to be rendered secure and compliant. Automated APIs enable the consistent, efficient delivery of that data to various locations. And with all of these improvements, the data that AI/ML models use remains production-grade.

The implementation of DevOps TDM in AI/ML model training reduces wait states immensely. It also leads to significant cost savings. Data masking avoids cyber risk-related penalties, such as compliance fines and the costs of remediating a data breach. Data virtualization lowers the infrastructure costs associated with housing a data lake, while also saving on energy. Efficient, automated processes reduce maintenance and development expenses. And the time savings make developers more productive, which reduces employee turnover and attrition-related expenses.

Accelerate Your Enterprise’s AI/ML Adoption by Implementing DevOps TDM

AI/ML is poised to achieve staggering growth before the end of the decade. According to Fortune Business Insights, the AI and ML markets were respectively valued at $387.45 billion and $21.17 billion in 2022. By 2029, the AI and ML markets are expected to grow to $1,394.30 billion and $209.91 billion respectively. By then, enterprise AI/ML use cases will have expanded, with many more types of businesses adopting the technologies. 

By 2030, every company will be an AI company. Adopting DevOps TDM is essential to maintaining competitive advantage and differentiation, and it will only grow more important as companies rely ever more heavily on test data to improve their technology portfolios.

The Delphix Data Platform provides a comprehensive DevOps TDM solution that allows organizations to obtain fast, compliant data for testing application releases, modernization, cloud adoption, and AI/ML programs. Contact our team today to learn more about how Delphix can support your AI/ML initiatives.