Finding the Right Balance Between Cost and Performance in Your Public Cloud


With 95% of organizations reporting cloud cost optimization as a top priority in 2023, it’s clear that businesses are struggling to reduce cloud spend, despite its growing importance in the midst of a difficult economy.

As our organizations attempt to get a better handle on cloud budgets, we DevOps engineers are tasked with keeping track of varying costs for different cloud services, configurations, usage patterns, and more to ensure our cloud usage is as cost efficient as possible. To make matters even more difficult, many cloud cost optimization techniques can decrease performance in specific use cases.

For example, compute optimized instances typically come with less memory than general purpose instances with the same number of CPUs, making them cheaper but not faster. Reserved Instances can save you money if you know how much capacity you need and for how long. Spot Instances save money on workloads that are interruptible and not time critical, but they can be reclaimed at short notice, so they are a poor fit for mission critical workloads, where interruptions would hurt availability and performance.

The few times I was tasked with cost optimization, I started reviewing our workloads and realized we could scale down quite a bit. But hitting that sweet spot—where you’re cutting costs without slowing your systems down too much—is no trivial task. Understanding the tradeoffs between cost and performance and following best practices to optimize that monthly infrastructure bill is a crucial skill that is difficult to master.

Here are some best practices I’ve learned on the job.

Cost-Performance Tradeoffs

When attempting to optimize cloud infrastructure costs and performance, businesses often encounter tradeoffs that require careful consideration. This involves striking the right balance between user experience and controlling the cloud resources utilized to achieve the level of performance your customers expect.

Let’s take a look at some of the cost-performance tradeoffs organizations may face.

Resource Allocation

Cloud infrastructure relies on CPU, memory, storage, and network resources. Important questions to consider when it comes to optimal allocation include:

  • What type of application is deployed?
  • How well is the instance tuned for the application and its environment?
  • How often are instance families or tiers upgraded and updated?

For instance, running a blockchain on a high-end CPU rather than a GPU may cost less in the short term but be far more costly in the long run, since you’d have to provision a high CPU count to keep up with the chain’s demands. Some blockchain networks, on the other hand, use hashing algorithms that don’t work well with GPUs. So optimizing for a specific resource is very much application dependent.

Scaling of Resources

With the cloud, you can rent resources for very short periods to scale your infrastructure up and down depending on the current demand. But you must ensure you’re using reasonable metrics for scaling. Simply checking if the CPU load is over capacity isn’t enough; you also need to check how much over capacity it is and for how long.

The first time I naively added auto-scaling to a system, it started a second instance once a certain number of requests had been exceeded, in order to ensure quick and reliable performance. But ultimately, the traffic spikes only utilized the new instance to about 20% of its capacity, while the costs for running the application server had doubled.
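Here is a minimal sketch of a scaling check that looks at both how far over capacity the system is and for how long. It assumes CPU utilization is sampled at a fixed interval; the function name, thresholds, and sample values are hypothetical and not tied to any provider's API.

```python
from typing import Sequence


def should_scale_out(
    cpu_samples: Sequence[float],
    threshold: float = 80.0,
    min_overage: float = 10.0,
    min_consecutive: int = 5,
) -> bool:
    """Return True only if the most recent `min_consecutive` samples all
    exceed `threshold` by at least `min_overage` percentage points."""
    if min_consecutive <= 0 or len(cpu_samples) < min_consecutive:
        return False
    recent = cpu_samples[-min_consecutive:]
    return all(sample >= threshold + min_overage for sample in recent)


# A short spike: one sample far above the threshold, then back to normal.
short_spike = [55, 60, 97, 62, 58, 61]
# Sustained overload: the last five samples are all well above the threshold.
sustained = [60, 92, 95, 93, 96, 94]

print(should_scale_out(short_spike))  # False
print(should_scale_out(sustained))    # True
```

With a rule like this, a single burst of traffic does not start a second instance, while an overload that persists across several samples does.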

Selecting Data Centers and Connectivity

AWS’s most prominent data centers usually have the lowest service costs and better connectivity and infrastructure support. In some cases, however, a country may have regulations that require you to keep its citizens’ data within its borders, so you may simply be left with no choice.

Latency is another constraint: if you’re hosting a streaming or gaming service, your choice of regions is limited to those close to your customers’ locations.

I’ve ended up having to migrate data between regions due to these regulations. This wasn’t fun, nor was it cheap. Moving data between regions can be costly, complex, and time consuming. For example, AWS doesn’t support the same services in all regions, and pricing can vary between regions. So moving to another region could increase your costs or require replacing certain services.

Service-Level Agreements

Service-level agreements (SLAs) are the promises a cloud provider makes about service availability in their contract. These promises don’t guarantee that the service will actually be available for the indicated uptime; they only mean the provider will reimburse you if availability drops below a specific threshold.

This means if you pay $100 per month for an instance that makes you $10,000 a month, you might have a problem: with 5% downtime, you get $5 back, not the $500 in revenue you lost.

Still, SLA figures are no guarantee, and any credit should be weighed against your actual losses, which you can calculate once you get your bill.
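The arithmetic behind the example above can be sketched as follows. It assumes, as the example does, that the credit is proportional to downtime and that revenue is lost evenly over time; real SLAs often define credit tiers instead, so treat the function and its parameters as hypothetical.

```python
def sla_gap(monthly_bill: float, monthly_revenue: float, downtime_fraction: float):
    """Compare a bill-proportional credit with the revenue lost to downtime.

    Assumption: both the credit and the lost revenue scale linearly with downtime.
    """
    credit = monthly_bill * downtime_fraction
    lost_revenue = monthly_revenue * downtime_fraction
    return credit, lost_revenue, lost_revenue - credit


credit, lost, gap = sla_gap(monthly_bill=100, monthly_revenue=10_000, downtime_fraction=0.05)
print(f"credit ${credit:.2f}, lost revenue ${lost:.2f}, uncovered ${gap:.2f}")
# credit $5.00, lost revenue $500.00, uncovered $495.00
```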

Cost and Performance Optimization Strategies

Let’s look at optimization strategies for the different resource types.

CPU and Memory

If you’re dealing with CPU or memory-intensive tasks, like video processing, scientific simulations, or machine learning, you’ll want to choose compute or memory-optimized instances to maximize your savings. Compute optimized instances come with less memory than their general purpose counterparts (e.g., m5.large instances come with more memory than c5.large instances), making them a little cheaper.

If your workload is ARM compatible, it might be worth checking out AWS’ Graviton2 instances, which AWS says deliver up to 40% better price performance than comparable x86 instances.


Storage

If your applications are I/O bound for storage reads and writes, choosing a storage-optimized instance is the way to go. These instances come with high-performance local SSDs.

Assuming storage space is your concern and you want to ensure you have enough to accommodate peaks in demand while not overpaying, I’d recommend using a service that can automatically scale the storage capacity up and down according to real-time demand. This way, you always have the exact amount of storage you need to keep applications performant and can avoid paying for storage you’re not using.

Network Bandwidth

When hosting a web application or video streaming, use an instance type with high network bandwidth to avoid slow network performance and a degraded user experience. AWS offers instance types with increased network performance, among them M6in, C6in, and R6in. An M6in instance is around 45% more expensive than a plain M5 general purpose instance with a regular network connection.

Operating System and Software Requirements

Be sure to select the instance type that supports your workload’s OS and related software. If your business is running a legacy application that only works with a specific software or operating system version, choosing an instance type that does not support this version will lead to compatibility issues and prevent the application from running correctly. It may even lead to performance degradation due to environmental inconsistency.

Using Auto-Scaling

Auto-scaling is a powerful feature of EC2 but should be used with caution, as mentioned before.

When I implement auto-scaling these days, I look at the numbers first to check the following factors:

  • How fast can my instances start and stop?
  • Does the system have short spikes?
  • Does the system require more instances over a prolonged period?
  • What resources limit the system, and are they always the same?


Starting an instance that takes minutes to get up won’t help with short spikes. And if your excess load is only 10 to 20% over your current instance’s capacity, starting a second instance of the same type would solve your performance issues but also double the cost, while the new instance would sit 80 to 90% idle.

Sometimes it’s better to use a bigger instance type to begin with so you never need to scale up. You can also configure instances of different types. For example, if your first instance is of type m5.8xlarge but the others you start when capacity is exceeded are just of type m5.2xlarge, the scaling can be executed in smaller increments.
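To make the increments concrete, here is a rough sketch comparing how much of an added instance the overflow load would use. It assumes load can be expressed in vCPUs and spreads linearly across instances, which is a simplification; the helper name is hypothetical. The vCPU counts are those of m5.8xlarge (32) and m5.2xlarge (8).

```python
def added_instance_utilization(load_vcpus: float, base_vcpus: int, added_vcpus: int) -> float:
    """Fraction of the added instance that the overflow load actually uses."""
    overflow = max(0.0, load_vcpus - base_vcpus)
    return min(1.0, overflow / added_vcpus)


BASE = 32   # m5.8xlarge vCPUs
SMALL = 8   # m5.2xlarge vCPUs

load = BASE * 1.2  # assumed load: 20% over the first instance's capacity

same_size = added_instance_utilization(load, BASE, BASE)
smaller = added_instance_utilization(load, BASE, SMALL)
print(f"second m5.8xlarge: {same_size:.0%} used")  # 20% used
print(f"added m5.2xlarge:  {smaller:.0%} used")    # 80% used
```

Under these assumptions, the smaller instance absorbs the same overflow while leaving far less paid capacity idle.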

Implementing Monitoring

I follow a top-down approach. I tag all my resources in a way that makes sense for my project or organization. That way, I know who is responsible for each expense and where to start with optimizing costs. AWS Cost Explorer is great for this.
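As an illustration of why tags matter, this sketch groups spend by a tag value. It does not use the Cost Explorer API; the line items, resource names, and the `team` tag key are made-up sample data standing in for a billing export.

```python
from collections import defaultdict

# Hypothetical line items, e.g. taken from a billing export.
line_items = [
    {"resource": "i-web-1", "cost": 310.0, "tags": {"team": "web"}},
    {"resource": "i-web-2", "cost": 295.0, "tags": {"team": "web"}},
    {"resource": "db-main", "cost": 820.0, "tags": {"team": "data"}},
    {"resource": "vol-old", "cost": 45.0, "tags": {}},
]


def cost_by_tag(items, tag_key, untagged_label="untagged"):
    """Sum cost per value of `tag_key`, largest spender first."""
    totals = defaultdict(float)
    for item in items:
        owner = item["tags"].get(tag_key, untagged_label)
        totals[owner] += item["cost"]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


for owner, total in cost_by_tag(line_items, "team"):
    print(f"{owner:10s} ${total:8.2f}")
```

Anything that lands in the untagged bucket is spend nobody is accountable for yet, which is often a good first place to look.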

Once I know where to start, I consult AWS Trusted Advisor, which recommends any best practices I might still need to implement for my workload. While it’s not applicable for all types of workloads, AWS Trusted Advisor has helped me find “easy wins” for cutting costs without negatively impacting performance. Finally, I look closer at the biggest spenders with Amazon CloudWatch. Sometimes the issue is an oversized instance. In this case, I look at the numbers. If I see the instance isn’t appropriately utilized, I scale it down.

It’s crucial to know the instance types of your provider, their tiers, and the exact utilization. On AWS, scaling between tiers of the same instance type usually doubles or halves properties like CPUs and memory with every step. If your workload uses 6 of 8 CPUs, scaling one tier down would only give you 4 CPUs, which isn’t enough. The memory allocation would be halved too, so you need to keep memory usage in mind. An instance comparison tool can be a huge help here, by allowing you to enter your utilization and see if there are any instances available that fit your workload better.
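The doubling and halving between sizes can be sketched with a small lookup. The table lists vCPUs and memory for a few m5 sizes; the function name and the optional `headroom` parameter are assumptions for illustration, and a real comparison would cover more families and include pricing.

```python
from typing import Optional

# vCPUs and memory (GiB) for a few sizes of the m5 family; each step doubles both.
M5_SIZES = [
    ("m5.large", 2, 8),
    ("m5.xlarge", 4, 16),
    ("m5.2xlarge", 8, 32),
    ("m5.4xlarge", 16, 64),
]


def smallest_fit(used_vcpus: float, used_mem_gib: float, headroom: float = 0.0) -> Optional[str]:
    """Return the smallest size covering observed usage plus headroom, or None."""
    need_cpu = used_vcpus * (1 + headroom)
    need_mem = used_mem_gib * (1 + headroom)
    for name, vcpus, mem_gib in M5_SIZES:  # ordered from smallest to largest
        if vcpus >= need_cpu and mem_gib >= need_mem:
            return name
    return None


print(smallest_fit(used_vcpus=6, used_mem_gib=10))  # m5.2xlarge (4 vCPUs is too few)
print(smallest_fit(used_vcpus=3, used_mem_gib=20))  # m5.2xlarge (16 GiB is too little)
print(smallest_fit(used_vcpus=3, used_mem_gib=12))  # m5.xlarge
```

The second call shows the memory constraint at work: the CPU usage alone would fit one size down, but the halved memory would not.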

In other cases, the numbers look like the instance is fully utilized, but the cause is a memory leak or a misconfiguration. That’s why knowing what workloads you’re running is key.

Mastering the Cost-Performance Balancing Act

Finding the perfect balance between cost and performance isn’t easy. I’ve struggled with this more often than I would have liked throughout my engineering career. Wanting to deliver a smooth experience is what I strive for in every project, but I also don’t want to break the bank.

Theoretical optimums aren’t sustainable and often aren’t even required. Sure, getting your response times way below 100 ms is nice, but if your users are happy with 100 to 200 ms and you can save money that way, why do it any other way?

That’s why measuring what each application needs is vital. I always ask: Is the workload I/O or CPU bound? How much storage could it need? Does it have load or traffic spikes, or periods where it needs more of a specific resource?

Depending on how much time you have at your disposal, automating some of that work can do wonders for your stress levels. Auto-scaling and automated optimization services can help you find that coveted sweet spot and ensure you always have the resources you need to remain performant, but no more than necessary, so that you stay cost-efficient too. Automation can respond to changes faster than humans, which helps you keep performance high and costs low around the clock.



