Over the past 10 years, I’ve deployed enough workloads on Kubernetes to know that “it runs” doesn’t mean “it’s ready.” A production-ready setup is something you build carefully. It’s what separates a weekend side project from a system that powers customer-facing services at scale.

This guide walks through the key configurations you should apply to every production workload. These are lessons pulled directly from real-world outages, scaling issues, and rollout mishaps I’ve seen or fixed.

We’ll cover:

  • Pod Disruption Budgets (PDBs)
  • Readiness and startup probes
  • Horizontal Pod Autoscaler (HPA)
  • Topology spread constraints
  • Priority classes
  • Graceful termination

Let’s walk through each one with examples and practical guidance you can apply right away.

Ensure App Availability with Pod Disruption Budgets (PDBs)

When nodes are upgraded or drained, Kubernetes may evict pods to move them elsewhere. Without limits, it might evict too many at once—leaving your service partially or fully offline.

Example

Imagine a deployment with 5 replicas spread across two nodes. A rolling node upgrade begins, draining the first node. Without a PDB, Kubernetes may evict all pods from that node simultaneously, potentially taking out 3 of your 5 replicas.

How to configure a PDB


  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: api-pdb
  spec:
    minAvailable: 3
    selector:
      matchLabels:
        app: my-api

This config ensures that at least 3 pods matching app: my-api stay available during voluntary disruptions such as node drains.
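
If it's more natural to cap how many pods can be down at once, the same budget can be written with maxUnavailable instead of minAvailable (a PDB accepts one or the other, not both). A minimal sketch, equivalent to the config above for a 5-replica deployment:

  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: api-pdb
  spec:
    maxUnavailable: 2   # with 5 replicas, this is equivalent to minAvailable: 3
    selector:
      matchLabels:
        app: my-api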

📖 Kubernetes PDB documentation

Use Startup and Readiness Probes to Protect Rollouts

Two common sources of trouble in production:

  1. Your app takes time to boot up, but the platform gives up too early.
  2. A pod starts but isn’t ready to serve traffic, and yet Kubernetes thinks it is.

Here’s how probes help:

  • Startup probe: Gives slow-starting applications time to boot. While it is running, Kubernetes holds off on the liveness and readiness checks, so the container isn't restarted or marked ready prematurely. Once the startup probe succeeds, the readiness and liveness probes take over.
  • Readiness probe: Runs continuously after startup and controls whether the pod receives traffic. If the container becomes temporarily unhealthy, the probe removes it from service endpoints until it recovers.
  • Probe timing matters too. If your startup or readiness probe uses periodSeconds: 10, Kubernetes only checks the pod every 10 seconds, which delays the moment it begins serving traffic. That's a problem for services that need rapid scale-outs.

Example configuration


  readinessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 2

  startupProbe:
    httpGet:
      path: /startup
      port: 8080
    failureThreshold: 30
    periodSeconds: 2

This setup gives the container up to 60 seconds to start (failureThreshold 30 × periodSeconds 2), and prevents it from receiving traffic until the readiness check passes.

📖 Readiness/startup probe guide

Let Kubernetes Handle Load Spikes with HPA

Manually scaling your deployment doesn’t work in a fast-moving environment. The Horizontal Pod Autoscaler (HPA) reacts to CPU, memory, or custom metrics and adjusts replica counts automatically.

Common use case

A team running a public API often sees CPU spikes in the morning. Instead of manually scaling up, they configure HPA to keep usage under control and ensure consistent performance.

Sample HPA configuration


  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: api-hpa
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: my-api
    minReplicas: 3
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70

Make sure your pods set CPU resource requests explicitly; the HPA calculates utilization as a percentage of the request, so without one it has no baseline to work from.
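
As a reference, here's a hedged sketch of what that looks like in the Deployment's container spec (the image name and the request/limit values are placeholders, not recommendations):

  # In the Deployment targeted by the HPA (spec.template.spec.containers)
  containers:
    - name: my-api
      image: my-api:latest        # illustrative image name
      resources:
        requests:
          cpu: 250m               # baseline the 70% utilization target is measured against
          memory: 256Mi
        limits:
          cpu: "1"
          memory: 512Mi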

📖 HPA v2 documentation

Prevent Single-Zone Failures with Topology Spread Constraints

Sometimes, all your pods end up scheduled on the same node or AZ, even if there are others available. If that node or zone goes down, your app disappears with it.

What you can do

Topology spread constraints let you define how evenly pods should be distributed across zones, racks, or nodes. This avoids putting all your eggs in one basket.

Example configuration


  spec:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: my-api

Use a topologyKey that matches labels your nodes actually carry; the well-known topology.kubernetes.io/zone label is set automatically by the major cloud providers. Note that whenUnsatisfiable: ScheduleAnyway treats the constraint as a soft preference; switch to DoNotSchedule if you want the scheduler to enforce the spread strictly. Either way, you get a more balanced and resilient placement.
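
If you also want to avoid stacking replicas on a single node, you can add a second constraint on the well-known hostname label alongside the zone spread. A sketch, assuming the same app: my-api label:

  spec:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: my-api
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname   # spread across individual nodes too
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: my-api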

📖 Topology spread guide

Prioritize Critical Services with PriorityClasses

Kubernetes doesn’t always have enough room to schedule new pods—especially in busy clusters. By default, all pods are treated equally. This means that even if you’re trying to scale a high-priority workload, it might get blocked because background jobs or batch processing pods are using up all the available space.

To solve this, Kubernetes offers PriorityClasses—a mechanism to mark some workloads as more important than others. If there isn’t enough space to schedule a high-priority pod, Kubernetes will evict lower-priority pods to make room.

This is essential for reducing application scaling latency, especially during sudden surges in traffic where quick pod scheduling is key to maintaining responsiveness.

Assigning proper priority levels

The example below uses a high value (100000), which tells the scheduler that pods referencing it are critical: when the cluster is full, lower-priority pods are preempted to make room for them. Keep in mind that priority only gets the pod scheduled quickly; if your readiness or startup probes run on long intervals, the newly scheduled pod still won't serve traffic for a while, so the two settings need to be tuned together.

To minimize delays:

  • Keep periodSeconds for readiness and startup probes in the 2–5 second range for fast-checking workloads.
  • Adjust initialDelaySeconds and failureThreshold based on your app’s boot time and error tolerance.

Use PriorityClasses to define eviction and scheduling policies, but don’t forget to tune your probes to support responsive scaling.

Example PriorityClass YAML


  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: high-priority
  value: 100000
  preemptionPolicy: PreemptLowerPriority
  globalDefault: false

Assign this priority class to important workloads in their pod spec:


  spec:
    priorityClassName: high-priority
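
Putting the two recommendations together, here's a hedged sketch of a Deployment pod template (the names, image, and ports are illustrative) that pairs the priority class with fast-checking probes:

  # Pod template of a critical Deployment (spec.template.spec), illustrative names
  spec:
    priorityClassName: high-priority
    containers:
      - name: my-api
        image: my-api:latest
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 2          # fast checks so newly scheduled pods start serving quickly
        startupProbe:
          httpGet:
            path: /startup
            port: 8080
          failureThreshold: 30
          periodSeconds: 2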

📖 PriorityClasses documentation

Handle Shutdowns Cleanly with Graceful Termination

If Kubernetes shuts down a pod (due to scaling down, a restart, a node drain, etc.), it sends a SIGTERM signal and waits for the termination grace period before force-killing the container with SIGKILL.

If your app doesn’t handle SIGTERM, it could terminate mid-request or while writing data.

Steps to implement

  • Add a terminationGracePeriodSeconds to your pod spec; if the field is not configured, the default of 30 seconds is used.
  • Catch SIGTERM in your app code and begin shutdown.
  • Ensure your readiness probe returns failure before shutdown starts, so no new traffic is routed to it.

Example YAML


  spec:
    terminationGracePeriodSeconds: 30
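
Endpoint removal and SIGTERM delivery happen in parallel, so some teams also add a short preStop sleep to give load balancers time to stop sending new requests before the app begins shutting down. A sketch of that pattern (the container name and the 5-second sleep are assumptions to tune for your environment):

  spec:
    terminationGracePeriodSeconds: 30
    containers:
      - name: my-api                             # illustrative name
        image: my-api:latest
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 5"]   # brief pause before SIGTERM reaches the app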

App-side example (Node.js)


  // `server` is assumed to be your running HTTP server instance (e.g. returned by app.listen())
  process.on('SIGTERM', () => {
    server.close(() => {
      console.log('Server closed gracefully.');
      process.exit(0);
    });
  });

📖 Pod termination behavior

Final Checklist Before You Go Live

Here’s a quick list of what I recommend before deploying a workload in production:

  • Pod Disruption Budget (PDB) is defined to protect availability during voluntary disruptions (e.g., node upgrades).
  • Readiness probe is configured to block traffic until the pod is actually ready.
  • Startup probe is set for containers with slow boot times to avoid premature restarts.
  • Horizontal Pod Autoscaler (HPA) is configured with reasonable min/max replica counts and resource targets.
  • Topology spread constraints are applied to avoid placing all replicas in the same failure zone.
  • PriorityClass is used to guarantee critical workloads aren’t evicted before lower-priority services.
  • Graceful termination logic is implemented in app code, and terminationGracePeriodSeconds is set in pod specs.

These aren’t extras; they’re what I consider table stakes for serious workloads.

Further resources

  • Pod lifecycle docs
  • Production best practices from Learnk8s
  • Autoscaling configuration tips

If you’re already doing most of this—great. If not, start small and incrementally bring your workloads up to standard. You’ll thank yourself later when things go wrong—and they eventually will.