Data Problems Block AI/ML Initiatives. Here’s How to Fix Them

Nearly 90% of AI/ML projects never make it to production. Learn about the top data challenges data scientists are facing in 2021.

Sharon Bell

Mar 02, 2021


In 2020, CIOs and IT teams rushed to build digital capabilities to meet the surge of online demand caused by the pandemic. In 2021, established industry leaders continue to feel the urgency to transition to digital as consumers demand improved online experiences.

Artificial intelligence (AI) and machine learning (ML) can dramatically shorten the time it takes to understand and adapt to customer needs, but only if the data is available for building and testing models. Accelerating the pace of digital innovation and AI/ML requires lots of data from core business systems and customer-facing applications. Most established companies have large volumes of data, but as a recent survey shows, slow data delivery often cancels out the speed that data should provide.

A new survey by Pulse suggests that data is a roadblock to AI/ML projects if the right data management and automation tools are not in place. While many of these initiatives are still at an early stage, more companies are looking to apply AI/ML to optimize operations, increase performance, and provide differentiated customer experiences.

In fact, AI/ML is a priority going into 2021 for the majority of IT executives. But the size and sprawl of application data is a huge challenge, because data resides in many different places, including customer-facing applications (48%), ERP systems (19%), and financial applications (19%).

Survey findings show four of the top five blockers to implementing AI/ML initiatives involve data. They include:

Data accuracy (54%)
Data access (44%)
Protecting personal and sensitive data (43%)
The time it takes to refresh data in models (36%)

To overcome these challenges, automation is key to removing the manual data delivery, refresh, and security processes that block innovation. A programmable data infrastructure allows data delivery and management to be automated through APIs. The characteristics of a programmable data infrastructure include:

API data access and refresh
Automated discovery and masking of sensitive data for compliance risk mitigation
Immutable data time machine for a continuous record of source data changes that delivers near real-time data, plus historical data
Versioning of source and training data for concept drift analysis
API-first approach to integrate data operations with AI/ML tools
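To make the "automated discovery and masking of sensitive data" item more concrete, here is a minimal Python sketch of the general idea. It is not any vendor's implementation. It assumes records arrive as Python dictionaries, uses a few hypothetical regex rules (`PATTERNS`) to flag columns that contain sensitive values, and then replaces those values with salted SHA-256 tokens. The function names `discover`, `mask_value`, and `mask_records`, as well as the salt, are illustrative assumptions.

```python
import hashlib
import re

# Hypothetical detection rules: simple regex patterns for a few common
# sensitive-data types. Real discovery tools use far richer rule sets.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def discover(records):
    """Return {column: set of sensitive-data types found in that column}."""
    found = {}
    for row in records:
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # only string values are scanned in this sketch
            for kind, pattern in PATTERNS.items():
                if pattern.search(value):
                    found.setdefault(column, set()).add(kind)
    return found


def mask_value(value, salt="demo-salt"):
    """Deterministically replace a value with a salted hash token.

    The same input always maps to the same token, so joins across
    masked tables still line up. The salt here is a placeholder.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "MASKED_" + digest[:12]


def mask_records(records, sensitive_columns):
    """Return copies of the records with sensitive columns masked."""
    masked = []
    for row in records:
        new_row = dict(row)  # copy so the source records stay untouched
        for column in sensitive_columns:
            value = new_row.get(column)
            if isinstance(value, str):
                new_row[column] = mask_value(value)
        masked.append(new_row)
    return masked


# Illustrative usage with made-up customer records.
customers = [
    {"id": 1, "email": "ana@example.com", "ssn": "123-45-6789", "plan": "pro"},
    {"id": 2, "email": "li@example.org", "ssn": "987-65-4321", "plan": "basic"},
]
sensitive = discover(customers)
print(sensitive)  # {'email': {'email'}, 'ssn': {'ssn'}}
masked = mask_records(customers, sensitive.keys())
print(masked[0])
```

Deterministic tokens are one design choice among several: they preserve referential integrity for model training, at the cost of needing the salt kept secret. A production system would also cover many more data types and data sources than this sketch does.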

For example, one of the world’s top engineering firms in the oil and gas industry is using programmable data infrastructure (PDI) to deliver AI-driven insights and solutions across its global plant facilities. The company’s goal is to strengthen risk management, improve operational efficiency, and support real-time decision-making and execution.

With programmable data infrastructure, this billion-dollar business is efficiently sourcing data spread across disparate systems and locations, spanning the globe from North and South America to Africa, the Middle East, and Asia. Teams are able to import training data and deploy machine learning models in the cloud. PDI also lets the firm continuously deliver fresh data to a virtual database in near real time, giving users flexible access to primary data for AI, sourced from their most critical business systems.

Historically, the larger the company, the slower the pace of change. While disruptive companies, like those born in the cloud, often have the advantage of adopting new technology faster, incumbents have the massive volumes of data needed to feed AI/ML initiatives and the resources to hire teams of data scientists to make the most of that data.

This changes the playing field: larger companies that adopt AI/ML faster can not only retain current customers but also pivot more quickly to new ways of interacting with supply chains and partners. Even when the world reopens, digital-first will prevail, so investments in data automation for AI/ML initiatives will continue to pay off.
