Amazon Elastic Kubernetes Service (EKS) makes running Kubernetes a much smoother experience, but upgrading an EKS cluster can be anything but smooth. An upgrade touches both the control plane (the brain of your cluster) and the data plane (the worker nodes actually running workloads). Each component can break things if not handled properly.
Karpenter, the open-source node autoscaler from AWS, brings efficiency to node provisioning and scaling. But because it interacts directly with your workloads and AWS infrastructure, any EKS version change can affect how Karpenter behaves.
The reality: upgrading is unavoidable, but with the right plan you can do it without breaking production. In this guide, we’ll walk through the process end-to-end so you can upgrade smoothly and confidently.
Why you should upgrade your EKS cluster
Upgrading your cluster is not about chasing shiny features. It’s about avoiding risk and keeping your environment stable.
- Security patches: Kubernetes releases regularly fix vulnerabilities. Falling behind means exposing workloads to known exploits.
- Feature parity: APIs evolve. Some get deprecated. Others unlock new functionality. If you’re on an older version, you risk being blocked by unsupported APIs when deploying new workloads.
- AWS support window: EKS supports each Kubernetes version for a limited window, roughly 14 months of standard support, followed by paid extended support. Once your version ages out of standard support, staying patched costs extra, and clusters that reach the end of extended support are upgraded automatically whether you are ready or not.
- Performance gains: Upgrades often bring improvements in scheduling, scalability, and networking performance.
Delaying upgrades piles up technical debt. The further behind you fall, the riskier and more complex the eventual upgrade becomes.
Common issues during upgrades
Understanding the usual pain points will help you anticipate them.
- Control plane vs. data plane: The control plane upgrades first, but worker nodes must be updated separately. Running old nodes with a new control plane can lead to unexpected behavior.
- Version skew: Components like kubelet, kube-proxy, and the AWS VPC CNI plugin must remain compatible with the control plane. A mismatch is a classic cause of broken networking or pods stuck in Pending.
- API deprecations: Each Kubernetes release removes APIs. For example, PodSecurityPolicy was deprecated in 1.21 and removed in 1.25. Workloads still using it will fail on upgrade.
- Karpenter quirks:
  - Node provisioning logic may change between Karpenter versions. If you haven’t validated compatibility with the target Kubernetes release, Karpenter may fail to launch nodes or launch them with an incompatible AMI or configuration.
  - When nodes recycle, running workloads can be evicted. Without disruption budgets, you risk downtime.
Knowing these pitfalls upfront is the best way to avoid firefighting later. Next, let’s map out how to prepare properly.
Planning your upgrade
Preparation separates smooth upgrades from war stories.
Pre-upgrade checklist
- Audit your cluster for deprecated APIs:
```bash
kubectl get --raw /metrics | grep deprecated
```
- Review workloads and Helm charts for incompatible resources.
- Validate operators and controllers against the target Kubernetes version. Check release notes from vendors.
- Confirm Karpenter’s supported versions. The Karpenter docs clearly state compatibility with Kubernetes releases.
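If you're not sure which Karpenter release you're currently running, a quick way to check is to read the controller image tag. A minimal sketch, assuming Karpenter was installed with the standard Helm chart into a namespace named karpenter:

```bash
# Print the Karpenter controller image; the tag is the installed version.
kubectl get deployment -n karpenter karpenter \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```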
Staging environment
- Mirror production as closely as possible. Run the upgrade process there first to find breakages without risking real workloads.
Backup strategy
- Ensure you have etcd snapshots if using self-managed control planes.
- Export cluster manifests and custom resources so you can recover quickly.
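A minimal sketch of a manifest export, assuming you want quick YAML dumps rather than a full backup tool such as Velero; adjust the resource list to what your cluster actually runs:

```bash
# Dump common workload and config resources across all namespaces.
kubectl get deploy,sts,ds,svc,ing,cm,secret --all-namespaces -o yaml > workloads-backup.yaml

# Dump CRD definitions, then the instances of each custom resource type.
kubectl get crd -o yaml > crds-backup.yaml
for crd in $(kubectl get crd -o jsonpath='{.items[*].metadata.name}'); do
  kubectl get "$crd" --all-namespaces -o yaml > "backup-${crd}.yaml"
done
```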
Capacity planning
- Plan for temporary double usage. When old nodes are drained, new nodes spin up to replace them, meaning you need enough AWS quota and budget for overlap.
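One way to sanity-check headroom before you start is to look up your On-Demand vCPU quota for the region. A sketch assuming On-Demand standard instances, whose quota code is L-1216C47A (the A, C, D, H, I, M, R, T, and Z families):

```bash
# Current regional vCPU limit for running On-Demand standard instances.
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --query 'Quota.Value'
```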
Once these basics are covered, you’re ready to start the upgrade with confidence.
Step by step: executing the upgrade with minimal disruption
Here’s the process that experienced practitioners follow to keep systems online.
Step 1: Upgrade the control plane
- Use the AWS console, CLI, or IaC (Terraform, CDK, etc.) to bump the cluster version.
- Verify the new control plane version:
```bash
aws eks describe-cluster --name my-cluster --query cluster.version
```
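If you drive the upgrade from the CLI rather than the console or IaC, the whole step is two calls. A minimal sketch with a hypothetical cluster name and target version; keep in mind the control plane only moves one minor version per upgrade:

```bash
# Request the control plane upgrade to the next minor version.
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.30

# Block until the control plane is active again before touching the data plane.
aws eks wait cluster-active --name my-cluster
```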
Step 2: Upgrade node groups
- For managed node groups: trigger an update through the console or CLI (a CLI sketch follows this step). AWS handles draining and replacement.
- For self-managed nodes: do a rolling update. Drain each node gracefully:
```bash
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```
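For the managed node group path, a minimal sketch assuming a hypothetical node group named my-nodes and the same target version as the control plane step above:

```bash
# Roll the managed node group to the AMI for the new Kubernetes version.
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes \
  --kubernetes-version 1.30

# Wait for the rolling replacement to complete.
aws eks wait nodegroup-active --cluster-name my-cluster --nodegroup-name my-nodes
```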
Step 3: Handle Karpenter provisioners
- Update the Karpenter controller and its manifests (NodePools, or Provisioners on older releases) to a version that supports the target Kubernetes release.
- Configure PodDisruptionBudgets (PDBs) so workloads are not evicted all at once (see the sketch after this list).
- Use topology spread constraints so pods distribute evenly across nodes and AZs.
- Pre-scale with Karpenter: set desired replicas higher before draining, so capacity is available when nodes go offline.
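A minimal sketch of the PDB and topology spread pieces, assuming a hypothetical Deployment named web labeled app: web; adapt the selector and replica floor to your own services:

```bash
# Keep at least 2 web pods available while Karpenter or kubectl drains nodes.
cat <<'EOF' | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
EOF

# Spread web pods across availability zones so draining one zone can't take
# out every replica at once.
kubectl patch deployment web --type='json' -p='[
  {"op": "add", "path": "/spec/template/spec/topologySpreadConstraints", "value": [
    {"maxSkew": 1, "topologyKey": "topology.kubernetes.io/zone",
     "whenUnsatisfiable": "ScheduleAnyway",
     "labelSelector": {"matchLabels": {"app": "web"}}}
  ]}
]'
```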
Step 4: Upgrade networking and addons
- Update the VPC CNI, CoreDNS, and kube-proxy using eksctl or the AWS console (see the CLI sketch after this step).
- Confirm pod networking works by deploying a simple busybox pod and testing DNS resolution:
```bash
kubectl run test-dns --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default
```
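If these run as EKS managed add-ons, the updates can also be scripted. A sketch with a placeholder add-on version; use describe-addon-versions to find the builds that match your target Kubernetes release, and repeat for vpc-cni and kube-proxy:

```bash
# List CoreDNS builds compatible with the new cluster version.
aws eks describe-addon-versions --kubernetes-version 1.30 --addon-name coredns

# Update the managed add-on (the version below is a placeholder).
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name coredns \
  --addon-version v1.11.1-eksbuild.4 \
  --resolve-conflicts OVERWRITE
```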
Tips for near-zero downtime
- Always enforce PodDisruptionBudgets.
- Leverage multiple availability zones so draining nodes in one AZ doesn’t knock out capacity.
Once your nodes are upgraded and workloads stable, the next step is validation.
Post-upgrade validation and testing
Don’t assume the cluster is healthy just because the upgrade finished.
Cluster health checks
- Confirm the control plane is healthy. Note that kubectl get componentstatuses has been deprecated since Kubernetes 1.19, so query the API server health endpoint instead:
```bash
kubectl get --raw '/readyz?verbose'
```
- Ensure all nodes are Ready and autoscaling responds correctly.
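A quick check that every node both reports Ready and runs the new kubelet version; an older version in the output usually means a node group that was missed:

```bash
# Node status and kubelet version; sort by age to spot stragglers.
kubectl get nodes -o wide --sort-by=.metadata.creationTimestamp
```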
Application smoke tests
- Run automated checks for your most critical services.
- Test ingress endpoints, service discovery, and DNS lookups.
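Smoke tests don't have to be elaborate; even a scripted probe of your public endpoints catches most regressions. A minimal sketch, assuming a hypothetical hostname and health path:

```bash
# Fail fast if the ingress, TLS, or backing service is broken after the upgrade.
curl -fsS --max-time 10 https://app.example.com/healthz || echo "smoke test FAILED"
```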
Karpenter validation
- Scale workloads up and down and confirm Karpenter provisions and deprovisions nodes as expected.
- Check logs for provisioning errors:
```bash
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter
```
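To exercise the scale-up and scale-down check above, a common trick is a throwaway deployment of pause containers with explicit CPU requests so it cannot fit on existing nodes. A sketch with a hypothetical deployment name:

```bash
# Create an oversized dummy workload to force Karpenter to provision nodes.
kubectl create deployment inflate --image=registry.k8s.io/pause:3.9 --replicas=0
kubectl set resources deployment inflate --requests=cpu=1
kubectl scale deployment inflate --replicas=20

# Watch nodes appear, then scale to zero and confirm Karpenter removes them.
kubectl get nodes -w
kubectl scale deployment inflate --replicas=0
```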
Observability
- Validate monitoring dashboards and alerting rules.
- Verify log pipelines are collecting from new nodes.
This is the stage where small misconfigurations show up, so give it real attention.
Best practices and lessons learned to build a sustainable upgrade flow
After a few cycles, you’ll notice patterns. These practices will save you time and headaches:
- Keep Karpenter, CNI, CoreDNS, and kube-proxy aligned with cluster upgrades.
- Maintain a documented runbook so upgrades don’t depend on tribal knowledge.
- Schedule upgrades proactively, not under pressure when AWS deprecates a version.
Every upgrade is an opportunity to refine your process. Share lessons with your team so everyone benefits.
Upgrading without turning into a war story
Upgrading EKS clusters is a reality of operating Kubernetes in production. With careful planning, staged testing, and thorough validation, you can perform upgrades with minimal disruption.
Stay ahead of AWS deprecations, treat Karpenter as a first-class citizen in the upgrade, and keep your playbooks current. Done right, upgrades become routine rather than dreaded fire drills.
