With our latest release, the Compliance Engine, Delphix now supports deployment into Amazon Web Services (AWS) and other private/public cloud environments. In this blog series, I will talk about my experiences with AWS and discuss how Delphix can help customers move into AWS and make the most out of their existing AWS deployments.
AWS is the second major platform we deploy into, after ESX, and we wanted to understand the performance characteristics of the AWS ecosystem. In this blog post, I will talk about experiments I ran on AWS, specifically to evaluate the block-level storage options available.
IO Report Card
Delphix deploys as a virtual appliance and relies on underlying storage to sustain the aggregated load from the databases virtualized by an engine. We developed a tool called the "IO Report Card" to evaluate the storage provisioned to Delphix. The report card, based on the open-source filesystem benchmark FIO, runs synthetic workloads and generates letter grades based on latency and throughput characteristics. Here is a sample report card from one of our customers who uses an All Flash Array for their Delphix Engines, and here is one from a customer who uses a Tier II storage array. Note that the latter array has a tier of Flash drives backed by high-capacity spinning disks. The report card is run prior to every Delphix deployment to ensure that the storage provisioned is adequate.
Delphix deploys as an EC2 instance on AWS, using Elastic Block Store (EBS) volumes for storage. AWS provides three options for EBS volumes:
- Standard
- General Purpose (GP) SSD
- Provisioned IOPS
Standard volumes come with 100 IOPS per volume. GP volumes provide up to 3 IOPS/GB of storage and can sustain bursts of 3,000 IOPS per volume. Provisioned IOPS volumes can sustain up to 30 16KB IOPS per GB per volume. GP volumes were added only recently (in the middle of my experiments) and very little is known about their characteristics, which is why I spend a disproportionate amount of ink on that option.
AWS' IOPS SLAs are actually throughput guarantees. If we provision 1,000 IOPS, we are guaranteed 16MB/sec of throughput, since EBS counts IOPS in 16KB units. Issuing smaller IOs can yield higher effective IOPS, as those IOs may subsequently be coalesced.
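To make that accounting concrete, here is a small sketch of the arithmetic (my own helper functions for illustration, not an AWS API):

```python
# EBS counts provisioned IOPS in 16KB units, per the SLA described above.
SLA_IO_SIZE_KB = 16

def guaranteed_throughput_mb(provisioned_iops):
    """Throughput (MB/s) guaranteed by a provisioned IOPS figure,
    using 1 MB = 1000 KB."""
    return provisioned_iops * SLA_IO_SIZE_KB / 1000

def effective_iops(provisioned_iops, io_size_kb):
    """Approximate IOPS achievable at a smaller I/O size, assuming
    EBS coalesces small I/Os up to the 16KB accounting unit."""
    return int(provisioned_iops * SLA_IO_SIZE_KB / max(io_size_kb, 1))

print(guaranteed_throughput_mb(1000))  # 1,000 provisioned IOPS -> 16.0 MB/s
print(effective_iops(1000, 8))         # 8KB I/Os -> up to 2000 effective IOPS
```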
I used three Delphix engines with the following configurations:
- Standard
- GP SSD (9K IOPS)
- Provisioned IOPS (10K IOPS)
EBS volumes have a first-access penalty. It is preferable to initialize all the blocks, using dd or a similar utility (for example, reading every block once with something like `dd if=/dev/xvdf of=/dev/null bs=1M`), to produce predictable IO performance.
The full report cards can be found here: Standard Volumes, General Purpose SSD and Provisioned IOPS. Most of the results were inline with expectations and matched the SLAs. In this post, I'll discuss a few interesting observations from these experiments.
One of the most important workloads for evaluating IO performance for a database application is 8KB random reads. In this test, the tool generates 8KB reads to random offsets across all the devices using 16 threads. The charts below are snippets from the IO Report Card results. The 'latency' chart shows the average latency and the 95th percentile latency against the grade scale. The grades are based on 95th percentile latency, and anything above 14ms gets a 'D'. The 'histogram' chart shows the latency histogram, while the 'scale' chart shows how the 95th percentile latency scales as load increases from 16 to 32 threads.
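As a rough sketch of how such a grading scheme works (only the 14ms 'D' cutoff comes from the report card; the other thresholds here are illustrative assumptions):

```python
import random

def p95(latencies_ms):
    """95th percentile latency from a list of per-IO latencies (ms)."""
    s = sorted(latencies_ms)
    return s[int(0.95 * (len(s) - 1))]

def grade(p95_ms):
    # Only the 14ms 'D' boundary is from the report card; the rest of
    # the scale is a guess for illustration.
    if p95_ms < 2:
        return "A"
    if p95_ms < 5:
        return "B"
    if p95_ms < 14:
        return "C"
    return "D"

# Simulated 8KB random-read latencies from a fast device
sample = [random.uniform(0.5, 3.0) for _ in range(1000)]
print(grade(p95(sample)))
```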
Figure 1: Standard Volumes, 8KB Random Reads
Figure 2: General Purpose SSD Volumes, 8KB Random Reads
Figure 3: Provisioned IOPS Volumes, 8KB Random Reads
In this world, you get what you pay for. -Kurt Vonnegut, Cat's Cradle.
As expected, the report card grades for Standard volumes were not very impressive. Provisioned IOPS are essentially latency guarantees for a given number of IOs. Since I did not 'provision' any IOPS, I expected the latency to be sub-par, and that is what we see: average latency of ~30ms, with outliers around 90ms. It is clear that this storage cannot support most database workloads.
Standard volumes can deliver higher random read IOPS if smaller block requests (4KB, 8KB) are issued that can be coalesced downstream by EBS.
Provisioned IOPS volumes show a grade similar to a high-end flash array, with 95th percentile outliers under 3ms. The latency histogram shows an even more impressive percentage of responses under 2ms. The load scaling graph also shows good scaling as the load doubles.
When adequate IOPS are provisioned, Provisioned IOPS Volumes show performance characteristics similar to an all flash array.
General Purpose SSD volumes' results were mixed. Average latency of 7ms is in line with a typical Tier II SAN or NAS storage. But the histogram shows a bi-modal distribution: one cluster around 1ms and another around 20ms. This indicates either storage tiering or some kind of throttling. The working set size of my workload is 256GB, and the histogram shows that only around 50% of the total responses are within 2ms. Given this data, we may get good performance for small or lightly loaded databases. But for larger databases, or databases with IO-intensive workloads, it is difficult to get good, consistent performance from this storage option.
Given their price point, GP SSDs are a viable option for lightly loaded databases.
DSS or Batch Workload
Figure 4: Read Throughput vs. Load
Data analytics, reporting, and batch workloads are typically throughput bound: they require large data movement. To evaluate the performance of such workloads, the IO Report Card generates 1MB read requests to sequential offsets. The tool scales load by increasing the thread count from 4 to 64. The chart below shows the total throughput observed as load increases. Given that Standard volumes are backed by spinning disks, I expected them to fare better on sequential workloads, and that is what we see here. It is surprising that GP volumes are not able to sustain higher sequential throughput; in all these cases, the workloads were bound by IOPS. Note that the Provisioned IOPS volumes are able to drive ~300MB/sec of throughput, almost 2x their SLA.
For GP and Provisioned IOPS Volumes, SLAs hold for either sequential or random workloads.
The report card uses small (1KB) writes to mimic a typical log writer and large (128KB) writes to mimic a log writer under heavy load that is batching writes. In the large writes test, all the volumes were throughput limited and the results matched the SLAs. The attached report cards show the latency distribution for the three volume types.
Figure 5: Latency Histogram for 1KB Sequential Writes on Standard Volumes
Small block write tests were also IOPS bound, but showed an interesting difference in latency distribution between Standard and GP volumes. Standard volumes fare well on small block writes: many small requests are coalesced into a single large block request, producing higher effective IOPS. Also, since these requests are all to sequential offsets, spinning disks fare well on them. It is also possible that the Standard volumes have a write-back cache layer cushioning the writes before they reach the spinning disks, though we do not have enough information to confirm that.
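A back-of-the-envelope sketch of why coalescing helps (the 128KB merge ceiling is my assumption for illustration, not a documented EBS limit):

```python
import math

def coalesced_requests(num_writes, write_kb, max_request_kb=128):
    """How many back-end requests N sequential small writes could collapse
    into, if a cache layer merges adjacent writes up to max_request_kb."""
    per_request = max_request_kb // write_kb  # small writes per merged request
    return math.ceil(num_writes / per_request)

# 10,000 x 1KB sequential writes could collapse into just 79 x 128KB writes,
# a ~128x reduction in the IOPS consumed.
print(coalesced_requests(10_000, 1))
```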
Figure 6: Latency Histogram for 1KB Sequential Writes, on GP SSD Volumes
For GP volumes, the latency distribution is bi-modal. Around 60% of writes were serviced in under 1ms, while the remaining 40% took upwards of 10ms. This again points to the burst-capacity SLAs for GP volumes: only a portion of the writes were serviced by fast storage; the others queued up once the burst could no longer be sustained. If your workload is especially write intensive, you are in luck. EBS volumes are fairly generous in terms of write performance per purchased capacity, and I repeatedly got higher throughput than provisioned.
Why are GP Volumes sometimes slow compared to Standard Volumes?
After digging into the data a little, and after a few extra runs, I realized the problem with GP volumes is bursty IOPS. GP volumes are built to sustain bursts of 3,000 IOPS per volume, so with three devices we should have 9,000 IOPS total. But that SLA is only for 'burst capacity', meaning a volume will sustain those IOPS only for short bursts. In fact, my experiments showed that the IOPS start dropping off after a few minutes of load. The sustained IOPS SLA for GP volumes is actually 3 IOPS/GB of data, in our case ~2,250 16KB IOPS. During my tests, I observed that the initial tests get higher IOPS and better latency; the IOPS then plateau at 2,250. It is not clear how many bursts can be sustained, or for how long.
GP volumes sustain 3,000 IOPS per volume for bursts of under ~5 minutes; after that, they revert to 3 IOPS/GB.
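These SLAs are easy to model. The sketch below assumes three 250GB volumes, an inference that matches the ~2,250 sustained IOPS figure above:

```python
def gp_iops(volume_gb, num_volumes, bursting):
    """GP SSD IOPS SLA: bursts of 3,000 IOPS per volume,
    sustained 3 IOPS per GB."""
    if bursting:
        return 3000 * num_volumes
    return 3 * volume_gb * num_volumes

# Assumed setup: three 250GB GP volumes (750GB total)
print(gp_iops(250, 3, bursting=True))   # burst: 9000 IOPS
print(gp_iops(250, 3, bursting=False))  # sustained: 2250 IOPS
```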
The latest release of Delphix deploys in AWS, and I wanted to understand the performance characteristics of the storage (EBS) options available there. I ran synthetic workloads on the three types of EBS volumes. Here are some observations I can draw from this work.
- DB applications running IO-intensive workloads will see sub-par IO performance on Standard volumes.
- General Purpose SSD volumes have good price/performance and are an excellent option for small, low-load databases. But they are inadequate for large (> 250GB) databases or IO-intensive applications.
- For OLTP or DSS style workloads, it is preferable to use volumes with Provisioned IOPS.
The following formula shows how many provisioned IOPS your application requires. The components can be obtained from an AWR report.

Optimal IOPS = physical read total IO requests + physical write total IO requests + redo writes
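As a worked example, here is the formula as a small helper. The AWR numbers below are hypothetical, purely for illustration; each component is taken as a per-second rate from the report:

```python
def optimal_iops(phys_read_reqs, phys_write_reqs, redo_writes):
    """Provisioned IOPS needed, from per-second AWR rates of
    'physical read total IO requests', 'physical write total IO requests',
    and 'redo writes'."""
    return phys_read_reqs + phys_write_reqs + redo_writes

# Hypothetical AWR rates: 1500 reads/s, 400 writes/s, 100 redo writes/s
print(optimal_iops(1500, 400, 100))  # -> 2000 provisioned IOPS
```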
After talking to an EBS expert at the AWS pop-up loft, I realized the general practice for DB applications on AWS is to 'maximize Provisioned IOPS' based on capacity. Customers typically end up provisioning as many IOPS as their storage allows (at 30 IOPS/GB). The main reason for this is, of course, predictable performance, but also that migrating data from a Standard or GP volume to a Provisioned IOPS volume is tedious and requires downtime. It is recommended to provision the necessary IOPS prior to deployment. In my next post, I will talk about the next phase of my work on Delphix performance in AWS.
P.S.: I understand there are unanswered questions in the data I collected. I will continue to analyze it, and I welcome and appreciate any feedback fresh eyes can provide.