Over the past 10 years, I’ve deployed enough workloads on Kubernetes to know that “it runs” doesn’t mean “it’s ready.” A production-ready setup is something you build carefully. It’s what separates a weekend side project from a system that powers customer-facing services at scale.
This guide walks through the key configurations you should apply to every production workload. These are lessons pulled directly from real-world outages, scaling issues, and rollout mishaps I’ve seen or fixed.
We’ll cover:
- Pod Disruption Budgets (PDBs)
- Readiness and startup probes
- Horizontal Pod Autoscaler (HPA)
- Topology spread constraints
- Priority classes
- Graceful termination
Let’s walk through each one with examples and practical guidance you can apply right away.
Ensure App Availability with Pod Disruption Budgets (PDBs)
When nodes are upgraded or drained, Kubernetes may evict pods to move them elsewhere. Without limits, it might evict too many at once—leaving your service partially or fully offline.
Example
Imagine a deployment with 5 replicas spread across two nodes. A rolling node upgrade begins, draining the first node. Without a PDB, Kubernetes may evict all pods from that node simultaneously, potentially taking out 3 of your 5 replicas.
How to configure a PDB
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: my-api
This config ensures that at least 3 `my-api` pods stay available during voluntary disruptions such as node drains or upgrades.
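If you would rather cap how many replicas can be taken down at once than pin an absolute floor, the PDB API also accepts `maxUnavailable`. A minimal sketch using the same `my-api` selector as above:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  maxUnavailable: 1   # at most one my-api pod may be voluntarily disrupted at a time
  selector:
    matchLabels:
      app: my-api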
📖 Kubernetes PDB documentation
Use Startup and Readiness Probes to Protect Rollouts
Two common sources of trouble in production:
- Your app takes time to boot up, but the platform gives up too early.
- A pod starts but isn’t ready to serve traffic, and yet Kubernetes thinks it is.
Here’s how probes help:
- Startup probe: Gives slow-starting containers time to finish booting before the other probes take over, so the kubelet doesn’t restart them prematurely. Once the startup probe succeeds, Kubernetes begins running the readiness and liveness probes.
- Readiness probe: Runs continuously after startup. If the container becomes temporarily unhealthy, this probe ensures it is removed from service until it recovers.
For example, if your startup or readiness probe has `periodSeconds: 10`, Kubernetes will only check the pod’s health every 10 seconds. That can delay the point at which the pod begins serving traffic, which is a problem for services that need rapid scale-outs.
Example configuration
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 2
startupProbe:
  httpGet:
    path: /startup
    port: 8080
  failureThreshold: 30
  periodSeconds: 2
This setup gives the container up to 60 seconds to start (failureThreshold 30 × periodSeconds 2s) and prevents it from receiving traffic until the readiness check passes.
📖 Readiness/startup probe guide
Let Kubernetes Handle Load Spikes with HPA
Manually scaling your deployment doesn’t work in a fast-moving environment. The Horizontal Pod Autoscaler (HPA) reacts to CPU, memory, or custom metrics and adjusts replica counts automatically.
Common use case
A team running a public API often sees CPU spikes in the morning. Instead of manually scaling up, they configure HPA to keep usage under control and ensure consistent performance.
Sample HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Make sure your pods set resource requests explicitly: the HPA computes utilization as a percentage of the requested amount, so without requests it has no baseline to work from.
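As a rough sketch of what that looks like (the container name, image, and numbers are placeholders, not values from this article), requests sit under the Deployment’s pod template:

spec:
  template:
    spec:
      containers:
      - name: my-api             # placeholder container name
        image: my-api:1.2.3      # placeholder image tag
        resources:
          requests:
            cpu: 250m            # the HPA's 70% target is measured against this request
            memory: 256Mi
          limits:
            memory: 512Mi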
Prevent Single-Zone Failures with Topology Spread Constraints
Sometimes, all your pods end up scheduled on the same node or AZ, even if there are others available. If that node or zone goes down, your app disappears with it.
What you can do
Topology spread constraints let you define how evenly pods should be distributed across zones, racks, or nodes. This avoids putting all eggs in one basket.
Example configuration
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-api
Use a topologyKey that matches the labels your cloud provider actually sets on its nodes (most managed clusters label nodes with topology.kubernetes.io/zone). This ensures a balanced and resilient spread.
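If you also want replicas spread across individual nodes within each zone, constraints can be stacked. A sketch that adds the well-known `kubernetes.io/hostname` label alongside the zone constraint above, reusing the same `my-api` selector:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # spread across zones
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-api
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname        # and across individual nodes
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-api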
📖 Topology spread guide
Prioritize Critical Services with PriorityClasses
Kubernetes doesn’t always have enough room to schedule new pods—especially in busy clusters. By default, all pods are treated equally. This means that even if you’re trying to scale a high-priority workload, it might get blocked because background jobs or batch processing pods are using up all the available space.
To solve this, Kubernetes offers PriorityClasses—a mechanism to mark some workloads as more important than others. If there isn’t enough space to schedule a high-priority pod, Kubernetes will evict lower-priority pods to make room.
This is essential for reducing application scaling latency, especially during sudden surges in traffic where quick pod scheduling is key to maintaining responsiveness.
Assigning proper priority levels
The example below uses a very high value (100000), which marks the pod as extremely critical. A high priority helps the pod get scheduled quickly, preempting lower-priority pods if needed, but that head start is wasted if long probe intervals keep the pod from becoming Ready.
To minimize delays:
- Keep `periodSeconds` for readiness and startup probes in the 2–5 second range for fast-checking workloads.
- Adjust `initialDelaySeconds` and `failureThreshold` based on your app’s boot time and error tolerance.
Use PriorityClasses to define eviction and scheduling policies, but don’t forget to tune your probes to support responsive scaling.
Example PriorityClass YAML
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100000
preemptionPolicy: PreemptLowerPriority
globalDefault: false
Assign this priority class to important workloads in their pod spec:
spec:
  priorityClassName: high-priority
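In a Deployment, the field belongs under the pod template. A minimal sketch (replica count, container name, and image are placeholders) based on the `my-api` examples above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      priorityClassName: high-priority   # scheduler preempts lower-priority pods if needed
      containers:
      - name: my-api          # placeholder container name
        image: my-api:1.2.3   # placeholder image tag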
Handle Shutdowns Cleanly with Graceful Termination
If Kubernetes shuts down a pod (due to scaling down, restart, etc.), it sends a `SIGTERM` signal and waits a fixed period before force-killing the container.
If your app doesn’t handle SIGTERM, it could terminate mid-request or while writing data.
Steps to implement
- Add `terminationGracePeriodSeconds` to your pod spec. If this field is not configured, the default of 30 seconds is used.
- Catch SIGTERM in your app code and begin shutdown.
- Ensure your readiness probe returns failure before shutdown starts, so no new traffic is routed to the pod (a sketch follows the Node.js example below).
Example YAML
spec:
  terminationGracePeriodSeconds: 30
App-side example (Node.js)
process.on('SIGTERM', () => {
  server.close(() => {
    console.log('Server closed gracefully.');
    process.exit(0);
  });
});
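For the last step in the list above, here is a sketch of one way to fail readiness during shutdown, assuming an Express-based service whose readiness probe hits the `/healthz` path from the earlier probe example; the five-second drain delay is an assumption you should tune to your environment:

const express = require('express');   // assumes an Express-based service
const app = express();

let shuttingDown = false;

// Readiness endpoint: report 503 once shutdown has begun so Kubernetes
// stops routing new traffic to this pod.
app.get('/healthz', (req, res) => {
  if (shuttingDown) {
    res.status(503).send('shutting down');
  } else {
    res.status(200).send('ok');
  }
});

const server = app.listen(8080);

process.on('SIGTERM', () => {
  shuttingDown = true;   // readiness probe starts failing from here on
  // Give endpoint removal a moment to propagate, then close the server.
  setTimeout(() => {
    server.close(() => {
      console.log('Server closed gracefully.');
      process.exit(0);
    });
  }, 5000);              // drain delay is an assumption; tune it for your traffic
});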
Final Checklist Before You Go Live
Here’s a quick list of what I recommend before deploying a workload in production:
- Pod Disruption Budget (PDB) is defined to protect availability during voluntary disruptions (e.g., node upgrades).
- Readiness probe is configured to block traffic until the pod is actually ready.
- Startup probe is set for containers with slow boot times to avoid premature restarts.
- Horizontal Pod Autoscaler (HPA) is configured with reasonable min/max replica counts and resource targets.
- Topology spread constraints are applied to avoid placing all replicas in the same failure zone.
- PriorityClass is used to guarantee critical workloads aren’t evicted before lower-priority services.
- Graceful termination logic is implemented in app code, and `terminationGracePeriodSeconds` is set in pod specs.

These aren’t extras; they’re what I consider table stakes for serious workloads.
Further resources
- Pod lifecycle docs
- Production best practices from Learnk8s
- Autoscaling configuration tips
If you’re already doing most of this—great. If not, start small and incrementally bring your workloads up to standard. You’ll thank yourself later when things go wrong—and they eventually will.