Karpenter Cost Optimization: A Practical Guide
Practical techniques for teams running Karpenter in production

Kubernetes cost optimization is not a single setting or a one-time task. It is about finding a sustainable balance between performance, utilization, and operational overhead. The most efficient clusters are not the cheapest ones on paper: they are the ones where applications run well, infrastructure is fully utilized, and the team is not constantly firefighting.

This guide covers six concrete techniques you can apply with Karpenter today, along with an honest look at where manual optimization runs out of road.


1. Control Disruption with Budgets and Schedules

Most cost savings from Karpenter come from consolidation: detecting underutilized or empty nodes and removing them. The tradeoff is that consolidation is inherently disruptive. Pods get evicted and rescheduled, and nodes disappear. Without guardrails, this can destabilize production.

The consolidation policy you probably have in place already looks something like this (the karpenter.sh/v1 API renames the value to WhenEmptyOrUnderutilized):

consolidationPolicy: WhenUnderutilized

But Karpenter also supports disruption budgets, which most teams never configure. These let you cap how much disruption happens at once and block it entirely during sensitive windows.

A practical configuration might look like:

  • Limit disruption to 10% of nodes at any given time
  • Define a protected window (for example, Monday to Friday, 9am to 5pm) during which zero nodes can be removed
  • Allow more aggressive consolidation during off-peak hours

The result is a system that continuously reclaims waste without putting production at risk during working hours. If your consolidation policy does not include a disruption budget, adding one is the lowest-risk improvement you can make today.
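As a rough sketch, the rules above could be written as a disruption block on a NodePool. This assumes the karpenter.sh/v1 API (where the policy value is WhenEmptyOrUnderutilized). The consolidateAfter value and the exact window are illustrative, and cron schedules are evaluated in UTC, so shift the hour to match your time zone.

```yaml
# Fragment of a NodePool spec (assumes karpenter.sh/v1); values are illustrative.
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 5m          # assumed settling time before a node is considered
  budgets:
    # At most 10% of this pool's nodes may be voluntarily disrupted at once.
    - nodes: "10%"
    # Starting 09:00 UTC on weekdays, allow zero disruptions for 8 hours.
    - nodes: "0"
      schedule: "0 9 * * mon-fri"
      duration: 8h
```

When several budgets are active at the same time, Karpenter applies the most restrictive one, so the zero-node rule overrides the 10% cap while the window is open. Raising the first percentage is the simplest way to make off-peak consolidation more aggressive.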


2. Diversify Instance Types: More Than You Think You Need

Karpenter optimizes placement by choosing from the instance types you have made available. The narrower that selection, the less room it has to maneuver.

Many production clusters are locked to a very specific instance type, a c5.xlarge or similar, because it matches the workload profile well. That is reasonable, but it creates two problems:

  • During high-demand periods (think Black Friday or major launches), AWS capacity for popular instance types can be constrained. Smaller regions and AZs are especially prone to this.
  • With a limited selection, bin packing efficiency suffers. Karpenter cannot make good placement decisions when its options are restricted.

A more resilient approach is to allow a range of compatible instance types across families (C, M, R), accept one or two older generations as fallback, and mix spot and on-demand instances within the same node pool.

On spot instances specifically: if you are already running on-demand, adding spot to the mix can cut costs meaningfully. The key is building your applications to handle interruptions gracefully, with proper shutdown handling, retry logic, and no sticky sessions. When a node pool allows both capacity types, Karpenter can fall back to on-demand automatically when spot capacity is unavailable.
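One way to express that flexibility is through NodePool requirements. The fragment below is a sketch that assumes the AWS provider's well-known node labels. The generation threshold is an example rather than a recommendation, so adjust it to whatever your workloads have been tested on.

```yaml
# Fragment of a NodePool's spec.template.spec (AWS provider labels assumed).
requirements:
  # Allow compute-, general-purpose-, and memory-optimized families.
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["c", "m", "r"]
  # Accept generation 5 and newer (example threshold).
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["4"]
  # Mix spot and on-demand in the same pool.
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
```

Leaving out an instance-size requirement keeps the pool open to every size in those families, which gives Karpenter the widest set of options for packing pods.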

One nuance worth knowing: Karpenter consolidation and bin packing are not the same thing. Bin packing places pods optimally at scheduling time. Consolidation cleans up after the fact. If you have access to the kube-scheduler configuration (managed Kubernetes services generally do not expose this), you can tune bin packing directly. Otherwise, consolidation is your lever, and it works well, just retroactively.
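For self-managed control planes, that tuning might look like the sketch below. It uses the upstream scheduler's NodeResourcesFit plugin with the MostAllocated strategy, which scores fuller nodes higher. It assumes you can pass a configuration file to kube-scheduler, and the equal weights are illustrative.

```yaml
# Scheduler configuration file; only usable where you control kube-scheduler.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated   # prefer nodes that are already well utilized
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```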


3. Account for DaemonSet Overhead in Your Capacity Planning

DaemonSets are easy to overlook in cost analysis, but they have a direct and consistent impact: every node you launch runs every DaemonSet, regardless of node size or workload.

A typical EKS cluster carries a default set that might include aws-node, kube-proxy, the EBS CSI driver, Fluent Bit, a CloudWatch agent, and monitoring agents like Datadog. In aggregate, this overhead can consume 1–2 GiB of memory and a noticeable share of CPU on every node in your cluster.

This has two implications:

  • Smaller nodes pay a higher overhead tax. A DaemonSet consuming 500m CPU matters much more on a 2-core node than a 32-core one.
  • DaemonSets are effectively priced per node. If you have 200 nodes, you are paying for 200 instances of each DaemonSet. Reducing node count through better consolidation directly cuts this cost.

Two practical improvements: first, audit whether every DaemonSet in your cluster actually needs to run on every node. Monitoring agents often do not need to run on ephemeral spot nodes. Storage drivers only matter on nodes that mount those volumes. Use node selectors, node affinity, or taints to scope DaemonSets to the nodes that need them. Second, factor DaemonSet overhead into capacity planning, and make sure each DaemonSet declares realistic resource requests. Karpenter can only account for overhead that is declared; if requests are missing or understated, it will select node sizes that appear sufficient but leave less headroom than expected.
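As an illustration of the first point, a DaemonSet can be kept off spot nodes with a node affinity rule on the capacity-type label that Karpenter applies to the nodes it launches. This is a sketch for a hypothetical monitoring agent; whether an agent can safely skip spot nodes depends on your observability requirements.

```yaml
# Fragment of a DaemonSet's spec.template.spec (hypothetical monitoring agent).
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # Skip nodes that Karpenter labeled as spot capacity.
            - key: karpenter.sh/capacity-type
              operator: NotIn
              values: ["spot"]
```

Note that NotIn also matches nodes that do not carry the label at all, so nodes created outside Karpenter would still run the agent.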


4. Adopt Graviton Incrementally

AWS Graviton (ARM-based) instances offer up to 20% cost savings compared to equivalent x86 instances, with comparable or better performance for most workloads. For many teams, this is the highest-leverage cost change available without any architectural rework.

The main obstacle is image compatibility. Most container images are built for x86 by default. An x86-only image on an ARM node typically fails at container start with an “exec format error”, or runs slowly under emulation if the node has been set up for it. The solution is multi-architecture images, which is less complicated than it sounds.

Docker Buildx makes this a single build command:

docker buildx build --platform linux/amd64,linux/arm64 -t registry/app:latest --push .

When you push a multi-arch image to ECR (or most other registries), the registry stores architecture-specific manifests under a single tag. When a node pulls the image, it automatically receives the build that matches its architecture. You do not need separate tags or separate pipelines, just one build step that targets both platforms.

A safe rollout approach: start by adding a weighted ARM node pool alongside your existing pools. Use taints and tolerations to route compatible workloads there initially. Once you have confirmed stability, gradually shift the weight toward ARM. This mirrors a standard canary deployment, applied to node architecture rather than application code.
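A minimal sketch of such a pool, assuming the karpenter.sh/v1 API and an existing EC2NodeClass named default. The pool name, weight, and taint key are hypothetical placeholders.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: arm-canary              # hypothetical name
spec:
  weight: 10                    # pools with higher weight are tried first; raise over time
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default           # assumed existing node class
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
      taints:
        # Hypothetical taint: only workloads that tolerate it land on ARM nodes.
        - key: example.com/arm64-canary
          effect: NoSchedule
```

Workloads opt in by adding a matching toleration for example.com/arm64-canary. Once the taint is no longer needed, removing it lets any multi-arch workload use the pool.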


5. Reduce Cross-AZ Traffic Costs

Cross-availability-zone data transfer is a cost that builds up steadily. Most teams focus on cross-region traffic (which is expensive and visible) but pay less attention to cross-AZ traffic within the same region, which can add up just as fast for data-intensive workloads.

The goal is not to eliminate multi-AZ deployments. Running across multiple AZs is important for resilience. The goal is to be intentional about placement rather than letting pods land wherever the scheduler puts them.

Two useful techniques:

  • For tightly coupled services (a backend and its database, for example), use pod affinity rules to prefer co-placement in the same AZ. Keep topology spread constraints for their actual job, which is spreading replicas across zones.
  • Use affinity weights rather than hard rules, so pods can still be scheduled across AZs when capacity requires it, but will be co-located when possible.
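A soft co-location rule of this kind could be sketched as follows. The app label is a hypothetical placeholder for whatever label your database pods carry, and the weight is illustrative.

```yaml
# Fragment of a backend Deployment's spec.template.spec.
affinity:
  podAffinity:
    # "preferred" makes this a weighted hint rather than a hard requirement.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: orders-db    # hypothetical label on the database pods
          # Co-locate at zone granularity rather than on the same node.
          topologyKey: topology.kubernetes.io/zone
```

Using the zone as the topology key keeps the two services in the same AZ without forcing them onto the same node.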

A practical note on AZ naming: AWS maps AZ letters (us-east-1a, us-east-1b, etc.) differently across accounts. The zone you call “us-east-1a” is not the same physical location as your colleague’s “us-east-1a.” Use AZ IDs (use1-az1, use1-az2) when you need to coordinate placement across account boundaries.


6. Set Capacity Limits on Node Pools

Karpenter responds fast, which is one of its strengths. It can begin launching new nodes within seconds of a pod going pending. That responsiveness becomes a liability if something goes wrong: a misconfigured resource request, a memory leak, or a sudden spike can cause rapid uncontrolled scaling before anyone notices.

Node pool limits cap the total compute that Karpenter can provision for a given pool:

limits:
  cpu: "1000"
  memory: 1000Gi

Think of these the way you think about pod resource limits: not as a constraint on normal operation, but as a safety boundary that prevents runaway scaling. Set limits that reflect your actual capacity needs with reasonable headroom, and treat a limit hit as an alert worth investigating rather than something to raise reflexively.


Where Karpenter Optimization Runs Out

The techniques above can meaningfully reduce your infrastructure spend. But after you have applied them, most teams hit a ceiling that node-level optimization alone cannot break through.

The gaps tend to be consistent:

  • Pod resource requests are usually set at deployment time and rarely revisited. If your requests do not reflect actual usage, Karpenter is making placement decisions based on stale data, and you are either over-provisioning or running with insufficient headroom.
  • Consolidation is reactive. It cleans up waste after the fact but does not prevent it from forming in the first place.
  • Cloud commitments (Reserved Instances, Savings Plans) require forecasting usage accurately. Most teams manage these manually, which means they are either under-committed (leaving savings on the table) or over-committed (paying for capacity they no longer use).
  • Karpenter does not manage your horizontal scaling policy. HPA and KEDA configuration lives separately, and mismatches between scaling thresholds and actual demand are common.

Closing these gaps requires visibility into real workload behavior and the ability to adjust requests, replicas, and commitments continuously, not just at deploy time.


Going Further with Zesty

Zesty is built to close the gaps described above. The platform continuously rightsizes pod resource requests based on actual usage, aligns cloud commitments with real demand, and accelerates cold starts for spot instances through pre-cached, hibernated nodes (FastScaler), so workloads that were previously too slow to start on spot can now benefit from spot pricing.

If you want to see how this looks in your specific environment, schedule a call.