Why Testing in EKS Can Drain Your Budget
If you’ve managed a testing environment in Amazon EKS, you already know how easily costs add up. I’ve been there: clusters spun up for a quick test and then left running overnight, oversized nodes consuming far more CPU and memory than needed, and load balancers forgotten after a test run, racking up unnecessary charges.
Testing environments are essential, but if left unchecked, they quickly become one of the biggest sources of cloud waste.
The Challenges of Testing Environments in EKS
So, why do Kubernetes testing environments burn through budget?
- Always-on clusters: Test environments often run 24/7, even when no tests are running.
- Over-provisioned nodes: Engineers tend to request larger instance types just in case, even when smaller ones would work.
- Orphaned resources: Unused volumes, load balancers, and IP addresses continue generating costs long after tests finish.
- Inefficient scaling: Many teams use static resource allocation instead of autoscaling, leading to wasted compute power.
The good news? AWS provides all the tools you need to fix these inefficiencies—you just need the right strategy. Here’s how to optimize your EKS testing environment to keep your cloud costs in check without sacrificing performance.
1. Right-Size Your Worker Nodes (Instead of Guessing)
A huge mistake in Kubernetes testing environments is running worker nodes that are way too large for the workload. Many teams deploy c5.4xlarge or r5.8xlarge instances without checking if their test workloads actually need that much compute or memory.
Example: Finding the Right Instance Type for Testing
Instead of choosing large instance types blindly, follow these steps:
Analyze your workload usage using AWS Compute Optimizer:
- Go to AWS Compute Optimizer.
- Select your EKS worker nodes.
- View recommendations on downsizing instances based on actual CPU and memory usage.
Test with smaller instances:
- If Compute Optimizer suggests m5.large instead of r5.4xlarge, create a test node group with the smaller instance (see the eksctl sketch after this list) and run sample workloads.
- Use kubectl top pods to see real CPU/memory usage.
Gradually scale up if needed:
- Start small and scale up as necessary instead of defaulting to large nodes.
- Adjust using Karpenter, which dynamically selects the right instance type based on real usage (see the autoscaling section below).
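To trial the smaller instance type suggested by Compute Optimizer, you can add a second node group alongside your existing one. Here’s a minimal eksctl sketch; the node group name small-test-nodes and the sizes are placeholders:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-cluster
  region: us-east-1
managedNodeGroups:
  # Trial node group for validating the smaller instance type
  - name: small-test-nodes
    instanceType: m5.large
    desiredCapacity: 2
    minSize: 1
    maxSize: 3
Run your usual test suite against this node group, compare kubectl top output, and only then retire the oversized one.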
💡 Savings estimation: Switching from an r5.4xlarge (about $1.00/hour) to an r5.large (about $0.12/hour) for 5 test nodes saves roughly $0.88/hour per node, or over $3,000 per month.
2. Use Spot Instances for Non-Critical Tests
EC2 Spot Instances can save up to 90% compared to On-Demand pricing. But many teams avoid them because of potential interruptions.
The truth? For most test workloads, interruptions don’t matter—you can retry failed tests or use a mix of Spot and On-Demand instances for reliability.
Example: Running Test Workloads on Spot Instances
Create a Spot-backed managed node group with mixed instance types (eksctl provisions the underlying Auto Scaling Group for you):
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-cluster
  region: us-east-1
managedNodeGroups:
  - name: test-nodes
    instanceTypes: ["m5.large", "m5.xlarge"]
    spot: true
    minSize: 2
    maxSize: 10
- This creates an EKS-managed node group with Spot Instances, keeping at least 2 always available but scaling up to 10 when needed.
Use a Pod Disruption Budget (PDB) to keep enough test pods running while Spot nodes are drained:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: test-pdb
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: test-runner
- Ensures that at least 50% of test pods remain available even if Spot Instances are interrupted.
💡 Savings estimation: Using Spot Instances for test runs instead of On-Demand can cut costs by 70–90% per instance.
3. Enable Kubernetes Autoscaling
Static clusters waste money because they don’t scale down once your tests finish. Rather than maintaining a fixed set of nodes, leverage Kubernetes autoscalers like Karpenter and Cluster Autoscaler (CAS) to dynamically adjust resources based on actual usage.
Autoscaling with Karpenter
Karpenter is an open-source autoscaler built specifically for Kubernetes that rapidly adjusts your cluster’s compute capacity. It automatically adds nodes when workloads demand more resources and removes nodes to save costs when those resources are no longer needed. With Karpenter, your infrastructure is both agile and cost-effective, responding quickly to workload changes without manual intervention.
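As a rough sketch of what this looks like in practice, a NodePool for test workloads might constrain instance choices and consolidate idle nodes. The manifest below uses the Karpenter v1 NodePool API (field names vary between Karpenter versions) and assumes an EC2NodeClass named default already exists:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: test-pool
spec:
  template:
    spec:
      requirements:
        # Let Karpenter pick among modest instance sizes for test pods
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default   # assumed to be defined separately
  limits:
    cpu: "64"           # hard ceiling on total test-node capacity
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m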
Autoscaling with Cluster Autoscaler (CAS)
Another powerful solution for Kubernetes autoscaling is the Cluster Autoscaler (CAS). CAS integrates directly with popular cloud providers like AWS, GCP, and Azure. It monitors pods that fail to schedule due to insufficient resources and dynamically adjusts the number of nodes to meet these requirements. This ensures you only pay for the resources your workloads actively need, significantly reducing wasted expenditure.
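A minimal install sketch on EKS using the upstream Helm chart (value keys come from the kubernetes/autoscaler chart and may differ between chart versions; your node groups’ Auto Scaling Groups also need the k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/test-cluster tags so CAS can discover them):
# Install Cluster Autoscaler from the upstream Helm chart
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=test-cluster \
  --set awsRegion=us-east-1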
Automate Pod Scaling
Beyond managing nodes, it’s important to autoscale workloads themselves. Kubernetes provides built-in Horizontal Pod Autoscaler (HPA) to scale pods based on resource usage or custom metrics. Setting appropriate minimum and maximum replicas ensures resources scale automatically based on actual workload demands, greatly reducing cloud waste.
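Here’s a sketch of what that looks like, assuming a hypothetical test-runner Deployment and the autoscaling/v2 API:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-runner-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-runner   # hypothetical Deployment running test workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas above ~70% average CPU
Note that utilization-based scaling requires metrics-server and CPU requests on the target pods (see the next section).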
4. Set Resource Requests and Limits to Avoid Over-Provisioning
Without explicit requests and limits, the scheduler has to guess how much each test pod needs, which usually means running bigger (and more) nodes than your tests require.
Example: Optimizing Resource Requests
apiVersion: v1
kind: Pod
metadata:
  name: test-runner
spec:
  containers:
    - name: test-runner
      image: mytestimage
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          memory: "512Mi"
- Requests: Allocates minimum needed resources for predictable scheduling.
- Limits: Caps memory so a runaway test can’t hog the node; CPU is left unlimited here so the pod can still burst into spare capacity.
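If pods are created by many different pipelines, a LimitRange on the test namespace can apply sensible defaults to any container that omits its own requests or limits. A sketch, where the namespace name testing and the values are placeholders:
apiVersion: v1
kind: LimitRange
metadata:
  name: test-defaults
  namespace: testing       # placeholder test namespace
spec:
  limits:
    - type: Container
      defaultRequest:       # applied when a container sets no requests
        cpu: "250m"
        memory: "256Mi"
      default:              # applied when a container sets no limits
        cpu: "500m"
        memory: "512Mi"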
💡 Real savings: Avoids excessive resource allocation, which means you don’t pay for unused CPU/memory.
5. Shut Down Test Environments When Not in Use
If your test cluster keeps running overnight or outside working hours when no one is testing, you’re literally burning money.
Example 1: Scaling the node group to zero on a schedule
These scheduled actions target the node group’s Auto Scaling group: the first scales it to zero at 00:00, the second brings it back up at 06:00.
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name "YOUR-ASG-NAME" \
--scheduled-action-name "StopEKS" \
--recurrence "0 0 * * *" \
--min-size 0 --desired-capacity 0 --max-size 0
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name "YOUR-ASG-NAME" \
--scheduled-action-name "StartEKS" \
--recurrence "0 6 * * *" \
--min-size 1 --desired-capacity 2 --max-size 3
Example 2: Recreating the cluster only during work hours using eksctl
These cron entries delete and recreate the whole cluster so it only exists between 8 AM and 8 PM:
# Create cluster at 8 AM
0 8 * * * eksctl create cluster --name test-cluster --region us-west-2 --nodegroup-name test-nodes --nodes 2
# Delete cluster at 8 PM
0 20 * * * eksctl delete cluster --name test-cluster --region us-west-2
💡 Real savings: A 6-hour nightly shutdown can reduce monthly EKS costs by up to 25%.
6. Clean Up Unused EBS Volumes, Logs, and Snapshots Using Lifecycle Policies
Unused resources like EBS volumes, logs, and snapshots quickly add hidden costs. Teams often overlook these “orphaned resources” that linger after test runs.
Example: Automating Cleanup with Lifecycle Policies
Set up AWS lifecycle policies using AWS Data Lifecycle Manager.
- Automatically delete snapshots after a defined retention period (e.g., 30 days).
- Schedule regular cleanup tasks to remove unused EBS volumes attached to terminated instances.
- Configure CloudWatch log expiration policies to remove test logs after a specific timeframe (see the CLI sketch below).
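Two CLI sketches with placeholder names: setting CloudWatch log retention in a single call, and creating a Data Lifecycle Manager snapshot policy from a prepared policy document and execution role.
# Expire test logs after 7 days (log group name is a placeholder)
aws logs put-retention-policy \
  --log-group-name /aws/eks/test-cluster/cluster \
  --retention-in-days 7
# Create a snapshot lifecycle policy (policy.json and the role ARN are placeholders)
aws dlm create-lifecycle-policy \
  --description "Expire test snapshots" \
  --state ENABLED \
  --execution-role-arn arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole \
  --policy-details file://policy.json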
💡 Real savings: Regular cleanup can reduce your storage costs by 20–40% monthly.
7. Run Short-Lived Tests on AWS Fargate Instead of EC2 Instances
For quick tests that run just a few minutes or hours, AWS Fargate is ideal—no need to manage underlying EC2 infrastructure.
Example: Deploying Tests on Fargate
Set up Fargate profiles on EKS:
- Configure specific namespaces for short-lived testing workloads (see the eksctl sketch after this list).
Schedule your test workloads:
- Deploy jobs or cron jobs that spin up pods on-demand, run the tests, then terminate automatically without lingering infrastructure.
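A sketch of creating such a profile with eksctl, assuming the test-cluster cluster and the fargate-tests namespace used in the Job below:
# Create a Fargate profile that matches the fargate-tests namespace
eksctl create fargateprofile \
  --cluster test-cluster \
  --name test-profile \
  --namespace fargate-tests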
Here’s an example Job that runs in the fargate-tests namespace (matched by the Fargate profile) and uses a nodeSelector to ensure its pods run on Fargate:
apiVersion: batch/v1
kind: Job
metadata:
  name: short-lived-test-job
  namespace: fargate-tests
spec:
  template:
    metadata:
      labels:
        app: short-lived-test
    spec:
      containers:
        - name: test-container
          image: mytestimage
          command: ["/bin/sh", "-c", "run-tests.sh"]
      restartPolicy: Never
      nodeSelector:
        eks.amazonaws.com/compute-type: fargate
💡 Savings estimation: Fargate ensures you only pay for exactly the compute time you use, eliminating idle resource costs.
8. Set Up AWS Savings Plans for Predictable Tests (e.g., Nightly Runs)
For predictable, scheduled testing workloads like nightly test suites, AWS Savings Plans offer significant discounts compared to on-demand pricing.
Example: Leveraging AWS Savings Plans
Analyze historical usage patterns using AWS Cost Explorer.
- Identify consistent workloads that run regularly (see the CLI sketch after this list).
Purchase Savings Plans based on predictable usage:
- Commit to a certain compute usage over 1- or 3-year terms.
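For the analysis step, the same data can also be pulled from the CLI. A sketch with placeholder dates that sums monthly EC2 compute spend:
# Monthly EC2 compute cost for a sample quarter (dates are placeholders)
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-04-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Compute Cloud - Compute"]}}'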
💡 Savings estimation: Savings Plans can cut your predictable compute costs by up to 72% compared to On-Demand rates.
Small Tweaks, Big Savings
Testing in EKS is necessary, but it shouldn’t be wasteful. By right-sizing nodes, using Spot Instances, enabling autoscaling, setting resource limits, and scheduling shutdowns, you can cut costs without sacrificing efficiency.