Karpenter has become the autoscaler of choice for many Kubernetes platform teams, and for good reason.

It provisions nodes in real time, selects efficient instance types, consolidates underutilized capacity, and removes much of the operational burden associated with traditional node groups.

On paper, the value proposition is compelling: the right nodes, at the right time, for the right price.

Many teams expect that adopting Karpenter will significantly reduce their infrastructure costs, yet often the cloud bill barely changes.

If you’ve automated scaling but haven’t seen meaningful savings, the issue usually isn’t Karpenter itself. More often, it is a combination of request inflation, bin-packing inefficiencies, and configuration friction that quietly erodes the expected benefits.

Let’s walk through where the money actually goes and how to optimize it.


First things first: Is Karpenter actually saving you money?

Before diving into root causes, it helps to check a few simple signals.

Signs your cluster is optimized

  • Node CPU utilization typically above ~50%
  • Node counts drop during off-peak hours
  • Spot usage increases after migrating to Karpenter
  • Frequent node consolidation events

Signs savings are limited

  • Nodes show very high allocation but low actual usage
  • Node counts remain similar day and night
  • Many large instances with low CPU utilization
  • Consolidation rarely occurs

If you observe the second pattern, the causes below are usually responsible.


Karpenter’s strength and blind spot

Karpenter is excellent at optimizing infrastructure supply.

It reacts quickly to pending pods, selects efficient instance types, and consolidates nodes when possible.

However, Karpenter is intentionally literal.

It provisions capacity based on what Kubernetes workloads request, not what they actually use.

This design is powerful, but it exposes a costly reality in many clusters.

If workloads request more than they need, Karpenter will faithfully provision infrastructure to satisfy those requests.


The “Request vs. Reality” gap

One of the most common sources of waste is simple: workloads ask for far more resources than they consume.

If a service requests 2 vCPUs but consistently uses 0.2, Karpenter will still provision capacity for the full request.

From the scheduler’s perspective, the node is full.
From the cloud provider’s perspective, the instance is mostly idle.
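
Here is a minimal sketch of what that gap looks like in a pod spec; the workload name, image, and numbers are illustrative, not taken from a real cluster:


  # Hypothetical deployment: each replica reserves 2 vCPUs,
  # while observed usage hovers around 0.2 vCPU.
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-api
  spec:
    replicas: 3
    selector:
      matchLabels:
        app: example-api
    template:
      metadata:
        labels:
          app: example-api
      spec:
        containers:
          - name: app
            image: example/api:latest   # placeholder image
            resources:
              requests:
                cpu: "2"                # what the scheduler reserves per replica
                memory: 4Gi
              # Observed usage is closer to 200m CPU and ~800Mi memory,
              # so a rightsized request might be cpu: 500m, memory: 1Gi.

Karpenter sizes nodes for the requested 6 vCPUs across these replicas, even though actual usage is closer to 0.6.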

How to detect it

Compare requested resources with actual usage.

Key metrics:


  kube_pod_container_resource_requests
  container_cpu_usage_seconds_total

Example approach:

Compare real CPU usage with total requested CPU over time, for example as a ratio:


  sum(rate(container_cpu_usage_seconds_total[5m]))
    /
  sum(kube_pod_container_resource_requests{resource="cpu"})

A result far below 1 means workloads are requesting much more CPU than they use.

Classic waste signal

  • Nodes show ~90–95% allocated
  • Instance CPU utilization sits around ~10–20%

This is ghost capacity, and it is one of the biggest hidden cost drivers in Kubernetes environments.



Jagged headroom and bin-packing losses

Even with accurate requests, inefficient sizing can create stranded capacity.

Karpenter tries to solve a complex bin-packing problem, but real workloads often produce what we call jagged headroom: small unusable gaps that accumulate across the cluster.

Example

Node capacity:


  16 GB memory

Scheduled workloads:


  Pod A: 7 GB
  Pod B: 7 GB

Remaining memory:


  2 GB

If the next workload requests 3 GB, the scheduler cannot place it on this node.
A new node must be created even though free capacity still exists.

Across many nodes, these gaps accumulate and increase cluster size.

What helps

Standardizing workload “T-shirt sizes” significantly improves packing efficiency.

Example sizes:


  1 GB
  2 GB
  4 GB
  8 GB

Consistent sizing reduces fragmentation and improves node utilization.
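
As a sketch, a workload's requests snapped to the nearest tier might look like this; the values are illustrative, and the rule is to pick the smallest tier that covers observed peak usage:


  # Container fragment using the 4 GB tier: observed peak was ~3.2 GB,
  # so the next tier up (4Gi) is chosen instead of an arbitrary 3.5Gi.
  resources:
    requests:
      cpu: "1"
      memory: 4Gi
    limits:
      memory: 4Gi   # matching memory limit keeps packing predictable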


Over-restrictive NodePools (“Legacy Thinking”)

Another common issue is overly tight instance constraints.

Karpenter’s power comes from its ability to choose across hundreds of instance types.

However, many teams unintentionally limit this flexibility by carrying over rules from Auto Scaling Group days.

Waste signals

  • Restricting to specific instance families (for example, only m5 or c5)
  • Avoiding newer generations
  • Limited Spot diversification

The opportunity

Broadening NodePool requirements allows Karpenter to:

  • Access better price-performance ratios
  • Increase Spot availability
  • Improve consolidation outcomes

In many environments, simply relaxing these constraints unlocks immediate savings.
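
As a sketch, a deliberately broad NodePool on AWS might look like the following, assuming the karpenter.sh/v1 API and an existing EC2NodeClass named default:


  apiVersion: karpenter.sh/v1
  kind: NodePool
  metadata:
    name: general-purpose
  spec:
    template:
      spec:
        nodeClassRef:
          group: karpenter.k8s.aws
          kind: EC2NodeClass
          name: default
        requirements:
          - key: karpenter.sh/capacity-type
            operator: In
            values: ["spot", "on-demand"]   # diversify across Spot and On-Demand
          - key: karpenter.k8s.aws/instance-category
            operator: In
            values: ["c", "m", "r"]         # broad categories, not a single family
          - key: karpenter.k8s.aws/instance-generation
            operator: Gt
            values: ["4"]                   # allow newer generations

With a search space this wide, Karpenter can pick the cheapest instance that satisfies pending pods instead of being boxed into one family.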


The hidden EBS “tax”

Compute is not the only place where costs accumulate.

Storage attached to nodes is often quietly oversized.

Every node Karpenter launches includes a root volume. If the default image uses a 100 GB disk but workloads require far less, the cluster pays for unused storage on every node.

Example math


  100 nodes × 100 GB = 10 TB provisioned

Actual usage may only be a small fraction of that.

Practical optimizations

  • Right-size the root volume baseline
  • Prefer gp3 over gp2
  • Monitor image and runtime storage growth before increasing disk

This is frequently overlooked and can become a meaningful cost driver.
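
On AWS, the root volume is controlled by the EC2NodeClass. A minimal sketch with a right-sized gp3 root volume might look like this; the IAM role, discovery tags, and 40 GB size are illustrative assumptions:


  apiVersion: karpenter.k8s.aws/v1
  kind: EC2NodeClass
  metadata:
    name: default
  spec:
    amiSelectorTerms:
      - alias: al2023@latest               # assumes the AL2023 AMI family
    role: KarpenterNodeRole                # hypothetical IAM role name
    subnetSelectorTerms:
      - tags:
          karpenter.sh/discovery: my-cluster   # hypothetical discovery tag
    securityGroupSelectorTerms:
      - tags:
          karpenter.sh/discovery: my-cluster
    blockDeviceMappings:
      - deviceName: /dev/xvda
        ebs:
          volumeSize: 40Gi                 # instead of a 100 GB default
          volumeType: gp3                  # cheaper per GB than gp2
          encrypted: true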


Is your cluster actually elastic?

One of the simplest ways to validate optimization is the off-peak test.

Check node counts during low-demand windows such as overnight.

If capacity remains high when workloads drop, something is preventing consolidation.

Common blockers include:

  • Pod Disruption Budgets (PDBs), as illustrated after this list
  • Local storage dependencies
  • Overuse of protection labels
  • Conservative consolidation policies
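
For instance, a PDB that allows zero voluntary disruptions pins its pods in place, so Karpenter can never drain the nodes they run on; the names below are illustrative:


  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: example-api-pdb
  spec:
    maxUnavailable: 0    # permits no voluntary evictions; blocks node drain
    selector:
      matchLabels:
        app: example-api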

A particularly important setting is:


  consolidationPolicy: WhenEmptyOrUnderutilized

Without it, nodes are removed only when completely empty, which limits potential savings.
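
In context, the setting lives under the NodePool's disruption block. A fragment, with an illustrative consolidateAfter value:


  # NodePool fragment: consolidate underutilized nodes, not only empty ones.
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m    # how long a node must be reclaimable before acting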


Where workload intelligence helps

At this point many teams discover an important pattern.

Karpenter optimizes infrastructure supply, but most waste originates from workload demand.

Requests, storage allocation, and scaling buffers are often configured statically. Over time they drift far from real usage patterns.

Closing this gap requires systems that continuously observe workload behavior and adjust infrastructure decisions accordingly.

Examples of capabilities include:

  • automatic pod rightsizing
  • dynamic storage allocation
  • reduced scaling headroom while maintaining fast reaction times

Workload-aware platforms such as Kompass by Zesty introduce this layer on top of Kubernetes.


How Zesty complements Karpenter

When combined with Karpenter, Zesty aligns workload demand with infrastructure supply.

Pod Rightsizing

Zesty continuously analyzes real-time usage and automatically adjusts pod requests.

Impact:

  • Eliminates request inflation
  • Enables smaller, cheaper nodes
  • Improves bin-packing efficiency
  • Requires no developer intervention

FastScaler

FastScaler maintains a pool of hibernated capacity that can reactivate in under 30 seconds.

This enables teams to operate with minimal headroom while maintaining rapid scale responsiveness.

Additional benefits:

  • Faster reaction than typical cold starts
  • Built-in Spot fallback safety
  • Increased confidence running critical workloads on Spot
  • Potential compute savings

The bottom line

Karpenter delivers infrastructure agility.

However, many Kubernetes cost inefficiencies originate above the infrastructure layer.

Request inflation, fragmented resource sizing, restrictive NodePools, and static storage allocations all contribute to wasted capacity.

When infrastructure optimization and workload intelligence operate together, clusters can achieve:

  • higher utilization
  • improved reliability
  • meaningful and sustained cost reduction

If your cluster scales beautifully but your bill has not moved, the next place to look is not node provisioning. It is the workloads themselves.