Application Development

Redundant Data May Be Hurting Both Your Bottom Line and the Environment

Updating test data management processes can reduce a business’s storage footprint, IT budget spend, and greenhouse gas emissions

Jason Axelrod

Feb 13, 2024

How many copies of test data are currently floating around your organization’s non-production environments? 

That’s not always an easy question to answer, as not all copies of test data within enterprises are monitored or even accounted for. In fact, many businesses maintain non-production environments in which a striking 90% of the data is redundant, according to Delphix. These unaccounted-for data copies can leave enterprises vulnerable to cybertheft: non-production environments are often less secure than production environments, making them treasure troves for hackers seeking to steal customer data.

Redundant copies of test data floating around an enterprise can also incur millions of dollars in needless storage fees while contributing thousands of pounds of carbon dioxide (CO2) and other greenhouse gas emissions. This data sprawl often results from outdated test data management practices. Updating test data management (TDM) practices is a practical and efficient way for businesses to minimize redundant data, which in turn slashes IT spending and bolsters sustainability efforts.

The Costs of Redundant Storage

For many businesses, redundant data copies make up a sizable portion of the total data footprint. On average, only about 15% of a business’s data is business-critical, while approximately 33% is redundant, obsolete, or trivial, according to enterprise technology market research firm Vanson Bourne.

The cost of storing this redundant data adds up. It costs an average of $5 million to store 1 petabyte (1,000 terabytes, or 0.001 exabytes) of data, per data management firm Veritas. So if a business holds 1 petabyte of data and roughly a third of it is redundant, it’s wasting about $1.6 million storing data it doesn’t need.

These costs only increase for businesses that use dedicated remote or on-premises data storage, sometimes known as the private cloud. Powering just one server rack in a data center for a year can cost companies up to $31,988, according to data center infrastructure management firm Nlyte Software. Many enterprises run hundreds to thousands of server racks, driving annual power costs well into the millions or even tens of millions of dollars.
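As a rough illustration of how these figures compound, here is a sketch in Python using the Veritas, Vanson Bourne, and Nlyte numbers above; the rack count is a hypothetical input, and real costs vary widely by organization:

```python
# Rough cost illustration using the figures cited above: ~$5M to store 1 PB
# (Veritas), ~33% of data being redundant/obsolete/trivial (Vanson Bourne),
# and up to ~$32K/year to power one server rack (Nlyte). Estimates only.

COST_PER_PB_USD = 5_000_000        # storage cost per petabyte
REDUNDANT_FRACTION = 0.33          # share of data that is redundant, obsolete, or trivial
POWER_COST_PER_RACK_USD = 31_988   # upper-bound annual power cost for one rack

def wasted_storage_cost(total_petabytes: float) -> float:
    """Estimated dollars spent storing redundant, obsolete, or trivial data."""
    return total_petabytes * COST_PER_PB_USD * REDUNDANT_FRACTION

def annual_rack_power_cost(rack_count: int) -> float:
    """Estimated annual power cost for a private data center of a given size."""
    return rack_count * POWER_COST_PER_RACK_USD

print(f"1 PB of data -> ~${wasted_storage_cost(1):,.0f} spent on redundant storage")
print(f"500 racks    -> ~${annual_rack_power_cost(500):,.0f} per year in power alone")
# 1 PB  -> ~$1,650,000 (the ~$1.6M figure above)
# 500 racks -> ~$15,994,000 per year
```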

The Bigger Picture of Sustainability

Redundant data eats up more than just storage dollars; it also incurs major environmental costs. 

For context, it takes about 30 kilowatt-hours (kWh) of energy per day to power the average house, yet transferring a single gigabyte of data consumes between 3.1 and 7 kWh of energy, or roughly 5.05 kWh on average, per data from the American Council for an Energy-Efficient Economy and a Carnegie Mellon University study. That works out to about 4.81 lbs (2.18 kg) of carbon dioxide (CO2) equivalent per gigabyte, according to the U.S. Environmental Protection Agency (EPA).

These numbers soar at enterprise scale, where data estates can easily span thousands of terabytes. Based on the metrics above, provisioning a 10 TB (10,000 GB) database consumes about 50,500 kWh of electricity, which equates to 48,162 lbs, or 24.08 tons (21,846 kg), of CO2 equivalent.
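Here is a back-of-the-envelope check of that math, sketched in Python; the per-gigabyte average and the emissions factor are approximations derived from the figures above, and actual values vary by infrastructure and grid mix:

```python
# Back-of-the-envelope energy and emissions estimate for provisioning a data
# copy, using the averages cited above. The 5.05 kWh/GB figure and the
# emissions factor are approximations; real numbers depend on the grid.

KWH_PER_GB = 5.05           # midpoint of the 3.1-7 kWh/GB range
LBS_CO2E_PER_KWH = 0.9537   # approximate grid-average factor implied by the figures above
LBS_PER_KG = 2.20462

def provisioning_footprint(gigabytes: float) -> tuple[float, float]:
    """Return (energy in kWh, emissions in lbs CO2e) for provisioning a copy."""
    energy_kwh = gigabytes * KWH_PER_GB
    emissions_lbs = energy_kwh * LBS_CO2E_PER_KWH
    return energy_kwh, emissions_lbs

kwh, lbs = provisioning_footprint(10_000)  # a 10 TB database
print(f"~{kwh:,.0f} kWh, ~{lbs:,.0f} lbs ({lbs / LBS_PER_KG:,.0f} kg) of CO2e")
# ~50,500 kWh, ~48,162 lbs (~21,846 kg) of CO2e, in line with the figures above
```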

It can be tough to visualize the impact of 48,162 lbs of CO2 equivalent on our environment. For comparison, per EPA data, provisioning a 10 TB database produces the same greenhouse gas emissions as driving 56,003 miles (90,128 km) in a gasoline-powered passenger vehicle, or driving 4.9 such vehicles for one year. It is also equivalent to burning 24,471 pounds (11,099 kg) of coal, powering 4.3 homes with electricity for one year, consuming 50.5 barrels of oil, or using 1,004 propane cylinders for home barbecues. Offsetting that same amount of emissions would require recycling 946 trash bags (about 7.8 tons, or 7.07 metric tons) of waste rather than sending it to a landfill.

Outdated TDM: The Culprit Behind Redundant Test Data

Tracking down redundant copies through data storage audits can get both expensive and time-consuming. Updating outdated TDM practices within your IT organization is a much easier, quicker, and more lucrative way to reduce your carbon footprint and storage budget than hauling those 946 bags of recycling.

Effective TDM has long been essential for enterprises to release reliable software into production. Yet while DevOps and automation accelerate software development cycles by the day, many businesses still have multiple teams handling TDM with the same manual, high-touch processes they used decades ago.

However, adopting a highly evolved set of practices known as DevOps test data management (DevOps TDM) can accelerate businesses’ TDM processes, allowing them to catch up with the rest of the development pipeline. DevOps TDM accomplishes this primarily through one of its core technologies: data virtualization.

The Promise of Data Virtualization

One of the most inefficient parts of working with test data is the common practice of circulating physical data copies across non-production environments. Data virtualization combats this inefficiency by creating extremely lightweight copies of data that retain the characteristics of the originals. It does so by interfacing with the blocks of physical data stored on a given server and creating virtual database “pointers” to those blocks. So instead of circulating physical copies of databases, teams can distribute virtual databases to developers and testers quickly, efficiently, and on demand.

Compared to physical databases, virtual databases can be provisioned and dismantled far more easily and quickly. With the right approach, a virtual database can be used throughout a development project, with its data refreshed from the original physical database as new data materializes. To support proper testing, virtual databases remain fully readable and writable, giving developers the same production-grade utility they would get from the production data sources.
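A highly simplified sketch of the underlying idea is shown below in Python. It is a conceptual illustration of block pointers and copy-on-write, not Delphix’s actual implementation, and all class and method names are invented for the example:

```python
# Conceptual sketch of data virtualization: a virtual database is just a map
# of pointers to shared data blocks on a source snapshot, and only blocks a
# tester actually writes get their own private copy (copy-on-write).

class BlockStore:
    """Physical storage: a pool of data blocks, each stored once."""
    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self._next_id = 0

    def put(self, data: bytes) -> int:
        block_id = self._next_id
        self.blocks[block_id] = data
        self._next_id += 1
        return block_id


class VirtualDatabase:
    """A lightweight copy: a mapping of logical positions to block IDs."""
    def __init__(self, store: BlockStore, pointers: dict[int, int]) -> None:
        self.store = store
        self.pointers = dict(pointers)  # tiny compared to the data itself

    def read(self, position: int) -> bytes:
        return self.store.blocks[self.pointers[position]]

    def write(self, position: int, data: bytes) -> None:
        # Copy-on-write: only modified blocks consume new physical space.
        self.pointers[position] = self.store.put(data)

    @classmethod
    def provision_from(cls, source: "VirtualDatabase") -> "VirtualDatabase":
        # Provisioning a new test copy is just duplicating the pointer map.
        return cls(source.store, source.pointers)


# Usage: one physical snapshot, many near-free test copies.
store = BlockStore()
snapshot = VirtualDatabase(store, {i: store.put(b"prod block %d" % i) for i in range(3)})
test_copy = VirtualDatabase.provision_from(snapshot)
test_copy.write(1, b"masked test value")    # private to this copy
assert snapshot.read(1) == b"prod block 1"  # the source stays untouched
```

In this toy model, provisioning or dismantling a test copy touches only the pointer map, which is why virtual copies can be created and refreshed in minutes while the physical blocks are stored once.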

The transient nature of virtual data holds a number of benefits over “physical” data. Because virtual copies are just pointers to data stored on a separate server, they dramatically reduce a business’s storage footprint compared to physical copies. Even a large fleet of virtual copies consumes far less space than a handful of physical ones.

To illustrate, virtualizing a 10 TB (10,000 GB) database yields a roughly 50 GB virtual copy, a size reduction of approximately 99.5%. Those storage savings add up: an IDC survey of Delphix customers found that using virtualization as part of a DevOps TDM solution let companies reduce their data footprint by 82% on average, from about 1,593 terabytes to 281 terabytes.
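The percentages follow directly from those figures; here is a quick check in Python:

```python
# Storage-savings arithmetic using the figures cited above (illustrative).

def reduction_pct(before_gb: float, after_gb: float) -> float:
    """Percentage reduction in storage footprint."""
    return (1 - after_gb / before_gb) * 100

print(f"{reduction_pct(10_000, 50):.1f}%")   # 10 TB -> ~50 GB virtual copy: 99.5%
print(f"{reduction_pct(1_593, 281):.1f}%")   # IDC survey average: ~82.4%
```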

Delphix’s Continuous Data solution, offered as part of the Delphix Data Platform, leverages data virtualization to decrease data footprints by 10x, while accelerating provisioning by 100x. It also provides a comprehensive set of APIs, CLIs, and UIs to manage all data operations in any environment. Reach out to us for more information on how Delphix can help transform your business’s data operations.