Optimize Cost Efficiency for Storage on AWS Batch Workloads


AWS Batch is an Amazon Web Services (AWS) service that runs hundreds of thousands of batch computing jobs on the AWS platform. By intelligently provisioning compute resources based on workload volume and resource demands, AWS Batch reduces the costs associated with batch processing.

However, striking the right balance between performance and cost efficiency for these batch workloads is inherently complex, especially since higher performance generally means more servers, which in turn drives up storage costs. While high performance is essential to process batch jobs quickly and efficiently, it is equally vital for companies to streamline resource utilization and curb unnecessary expenses.

In this blog post, I delve into the complexities of optimizing storage for AWS Batch workloads. I’ll share the strategies and insights that enabled us to significantly enhance performance, boost cost efficiency, and slash our company’s Amazon Elastic Block Store (EBS) expenses by more than 50%.


The Challenge of Optimizing Scale in Batch Workloads

Although AWS Batch can effortlessly handle a range of intensive computational tasks on a large scale, using EBS for short-lived storage requirements during batch processing can be extremely inefficient. The challenge intensifies when running a high volume of EC2 instances, each of which incurs its own EBS volume costs.

AWS Batch automatically scales your compute resources based on the volume and complexity of your jobs. However, this can sometimes lead to over-scaling or under-scaling of EC2 instances, depending on the nature of the jobs. Over-scaling leads to unnecessary costs, while under-scaling results in performance issues. Since each compute node typically uses its own EBS volumes, the cost of overprovisioned storage across EC2 instances can significantly inflate the overall expense of running batch workloads.
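The per-node EBS allocation comes from the launch template that the Batch compute environment references. As a minimal sketch of pinning that allocation down with boto3 (the template name, device, size, and volume type below are illustrative assumptions, not values from our setup):

```python
import boto3

ec2 = boto3.client("ec2")

# Launch template that right-sizes the root EBS volume for Batch nodes.
# The size and type here are placeholders; tune them to your workload.
response = ec2.create_launch_template(
    LaunchTemplateName="batch-right-sized-storage",  # hypothetical name
    LaunchTemplateData={
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/xvda",  # root device on Amazon Linux 2
                "Ebs": {
                    "VolumeSize": 100,  # GiB, instead of a blanket multi-TB default
                    "VolumeType": "gp3",
                    "DeleteOnTermination": True,
                },
            }
        ]
    },
)
print(response["LaunchTemplate"]["LaunchTemplateId"])
```

Pointing the compute environment at a template like this makes every instance it launches inherit the smaller volume; the hard part, as we’ll see, is knowing what size to put there.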

As I started searching for potential solutions to these challenges, I found several AWS services readily available that are designed to help optimize storage for AWS Batch. Among them is Amazon Elastic File System (EFS), a fully managed, scalable shared file system that automatically grows and shrinks as files are added and removed. The other native option is EBS, AWS’s block storage service for EC2.

After thoroughly evaluating each storage service, I chose the block storage option, EBS. The first challenge I saw was that our batch processing works in bursts, making it unclear how much storage capacity should be provisioned per EC2 instance. After a week of usage, I checked the utilization of the volumes and saw that even though it tends to reach 100% for short periods during batch processing, it averages around 30%. In our case, that meant 4 TB per instance × 1,000 instances, which resulted in millions of dollars wasted every month. It became clear that I needed a more innovative and comprehensive solution. That was when my research brought me to Zesty Disk.
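For anyone who wants to reproduce that utilization check: capacity usage is measured inside the instance, so it has to be shipped out, typically by the CloudWatch agent, which (when configured to collect disk metrics) publishes disk_used_percent under the CWAgent namespace. A rough sketch of pulling a week of datapoints with boto3, with the instance ID and dimension values as placeholder assumptions:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Assumes the CloudWatch agent on each instance publishes disk metrics
# to the CWAgent namespace; the dimension values below are placeholders.
now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="CWAgent",
    MetricName="disk_used_percent",
    Dimensions=[
        {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
        {"Name": "path", "Value": "/"},
        {"Name": "fstype", "Value": "xfs"},
    ],
    StartTime=now - timedelta(days=7),
    EndTime=now,
    Period=3600,  # one datapoint per hour
    Statistics=["Average", "Maximum"],
)

points = stats["Datapoints"]
if points:
    avg = sum(p["Average"] for p in points) / len(points)
    peak = max(p["Maximum"] for p in points)
    print(f"avg utilization: {avg:.0f}%  peak: {peak:.0f}%")
```

An average far below the peaks, as we saw, is the signature of paying around the clock for capacity that is only needed in bursts.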


Zesty Disk: A Comprehensive Solution for Storage Management

Zesty Disk is a next-generation auto-scaler for cloud block storage volumes. It has the unique ability to shrink and expand storage in real time, boosting IOPS performance and reducing costs by up to 65%. This is accomplished by converting large file system volumes into a virtual disk composed of several smaller volumes, which facilitates the file system’s automatic expansion and contraction:

Figure 1: Zesty Disk’s expand and shrink technology


Essentially, Zesty Disk decouples the file system from the underlying infrastructure, replacing the EBS file system with a virtual disk that adapts to the data load. This is a key feature of Zesty Disk, as it supports application stability by ensuring sufficient disk space at all times.
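Zesty’s implementation is proprietary, but the expand half of the mechanic resembles something you could script yourself on Linux: attach another small volume and grow an LVM-backed file system online. A hypothetical sketch, where the instance ID, device slot, and LVM names are all assumptions (the boto3 calls would run from a controller, the LVM commands on the instance itself, as root):

```python
import subprocess

import boto3

ec2 = boto3.client("ec2")

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder
AZ = "us-east-1a"                    # must match the instance's AZ
DEVICE = "/dev/sdf"                  # assumed free device slot
VG = "data"                          # hypothetical LVM volume group
LV = "/dev/data/scratch"             # hypothetical logical volume

# 1. Create a small gp3 volume and attach it to the instance.
vol = ec2.create_volume(AvailabilityZone=AZ, Size=50, VolumeType="gp3")
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=INSTANCE_ID, Device=DEVICE)

# 2. On the instance: fold the new volume into the volume group, then
#    grow the logical volume and its file system in one step (-r).
subprocess.run(["pvcreate", DEVICE], check=True)
subprocess.run(["vgextend", VG, DEVICE], check=True)
subprocess.run(["lvextend", "-r", "-l", "+100%FREE", LV], check=True)
```

Shrinking is the much harder direction, since data must first be migrated off a volume before it can be detached, and doing that safely under live load is where a purpose-built product does the heavy lifting.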

Zesty Disk automatically adjusts your storage capacity to match real-time application needs, effectively eliminating the risks associated with over-provisioning and under-provisioning resources. This ensures the highest level of utilization possible. By doing this, Zesty Disk facilitates cost-optimization by ensuring that users pay only for the storage capacity they actually use.

In addition to its dynamic storage management capabilities, Zesty Disk maximizes EBS volume performance. It does so by leveraging multiple smaller storage volumes, each with its own burst capacity.

Let’s zoom in on that last point I made, as it is crucial to understanding the scope of what Zesty Disk can do for you. In traditional storage setups, you might have a single large volume handling all your data operations. However, this can lead to bottlenecks and performance issues as the volume of data increases. With Zesty Disk, your data operations are spread across multiple volumes. This not only distributes the load but also allows each volume to use its own burst capacity, significantly improving overall performance.
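To put rough numbers on that, take gp2 volumes, where baseline performance is 3 IOPS per GiB (minimum 100, maximum 16,000) and volumes of 1,000 GiB or smaller can burst to 3,000 IOPS on credits. A quick back-of-the-envelope comparison, with the sizes chosen purely for illustration:

```python
# gp2 performance model: baseline = 3 IOPS per GiB (min 100, max 16,000);
# volumes of 1,000 GiB or smaller can additionally burst to 3,000 IOPS.
def gp2_baseline_iops(size_gib: int) -> int:
    return min(max(100, 3 * size_gib), 16_000)

def gp2_burst_iops(size_gib: int) -> int:
    base = gp2_baseline_iops(size_gib)
    return max(base, 3_000) if size_gib <= 1_000 else base

single = [1_000]      # one 1,000 GiB volume
striped = [250] * 4   # four 250 GiB volumes, same total capacity

for name, volumes in (("single", single), ("striped", striped)):
    base = sum(gp2_baseline_iops(s) for s in volumes)
    burst = sum(gp2_burst_iops(s) for s in volumes)
    print(f"{name}: baseline {base} IOPS, burst {burst} IOPS")

# single: baseline 3000 IOPS, burst 3000 IOPS
# striped: baseline 3000 IOPS, burst 12000 IOPS
```

Same total capacity, four times the aggregate burst ceiling.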

This comparison between traditional storage and storage with Zesty is illustrated below:


Figure 2: The impact of Zesty Disk on IOPS performance


In the next section, I’ll delve into a real-world use case that demonstrates the effectiveness of Zesty Disk in managing storage for AWS Batch workloads.

How Zesty Disk Transforms Storage Management

Our company uses AWS Batch to launch EC2 clusters that run our machine learning algorithms. The constant flow of data creates the challenge of processing a large volume of ingested data in a short period of time. Although traffic volume fluctuated, the overall volume trended upward, regularly reaching maximum capacity.

Since manually expanding capacity was not feasible given the sheer number of individual servers, I wanted to reduce our EBS costs by keeping volume sizes to a minimum throughout the data ingestion cycle. Overall, net EBS usage sat at 30%-40%, generating significant unnecessary expenditure.

To address these challenges, I deployed Zesty Disk’s virtual file system. This solution automatically expanded our EBS volumes in real time to meet our data ingestion needs. For each of our 70,000 machines, a behavioral profile was created from the usage metrics and metadata of its EBS volumes. Zesty Disk’s AI model analyzed this data and executed changes according to the established profile. A buffer, usually 10%-15% of capacity and sized from previous trends, was continuously maintained above actual usage so the machines never ran out of capacity.
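The model itself is Zesty’s, but the buffer idea is easy to sketch. A naive, hypothetical version of the sizing decision, with thresholds and helper names invented for illustration:

```python
# Hypothetical illustration of buffer-based capacity sizing; not Zesty's
# actual model. The idea: keep provisioned capacity a small, trend-derived
# margin above observed usage instead of provisioning for the worst case.
def target_capacity_gib(used_gib: float, recent_peak_gib: float,
                        buffer_ratio: float = 0.15) -> float:
    """Provision for the recent peak plus a 10%-15% safety buffer."""
    needed = max(used_gib, recent_peak_gib)
    return needed * (1 + buffer_ratio)

def should_resize(provisioned_gib: float, target_gib: float,
                  tolerance: float = 0.05) -> bool:
    """Only act when the gap exceeds a tolerance, to avoid resize churn."""
    return abs(provisioned_gib - target_gib) / provisioned_gib > tolerance

# Example: 1,200 GiB in use, recent peak of 1,500 GiB, 4,000 GiB provisioned.
target = target_capacity_gib(1_200, 1_500)
print(round(target), should_resize(4_000, target))  # 1725 True -> shrink
```

The real system layers per-machine trend prediction on top of a rule like this; the sketch just shows why utilization can stay high without risking a full disk.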

The results were remarkable. Before Zesty, I was pre-allocating a standard 1-4 TB per EC2 instance, with a net usage of only 30%-40% of the available capacity. Once Zesty was deployed, our provisioned volume size was cut in half, sometimes more, and our EBS cost was cut by 50% along with it. This smaller EBS consumption footprint not only made our company more cost-efficient but also reduced the total cost of the AWS services required and, notably, minimized the company’s carbon footprint.

A New Era of Cloud Optimization

Managing storage for AWS Batch workloads presents a unique set of challenges. Balancing performance and cost efficiency, avoiding over-provisioning and under-provisioning, and managing the complexities of using EBS volumes are all critical aspects that must be addressed.

Zesty Disk offers a powerful solution that not only improves performance and cost-efficiency but also simplifies the management of storage resources. Whether you are a developer, scientist, or engineer, adopting Zesty Disk for your AWS Batch workloads can lead to significant benefits.

As you navigate the digital transformation era, solutions like Zesty Disk will be crucial in helping your business optimize its cloud resources.

Start your journey with Zesty Disk today and revolutionize how you manage storage for your AWS Batch workloads.