HPA has become one of the most relied-upon tools in Kubernetes operations, largely because it covers so many common scaling scenarios. However, there are workloads even HPA can't help with.

A DaemonSet that claims resources on every node with no ability to scale.
A Postgres primary in a StatefulSet, which does not support multiple primary replicas.
A batch Job that spikes unpredictably and that HPA has no control over.

These cases expose the limits of replica-based scaling, reminding us that Kubernetes has more workload patterns than HPA was ever meant to address. 

This article breaks down why HPA falls short in these scenarios and shows how to size and manage these workloads correctly for your cluster’s health and your own mental stability. 


Back to basics: what HPA is built to do

HPA scales replica count up or down based on resource utilization or custom metrics. It targets scalable workload controllers such as Deployments and StatefulSets.
It does not target things that cannot be horizontally scaled, like DaemonSets.
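As a quick illustration, a minimal HPA can be created imperatively; the Deployment name and thresholds below are placeholders:

```shell
# Create an HPA targeting a hypothetical Deployment named "web":
# scale between 2 and 10 replicas to hold average CPU near 70% of requests.
kubectl autoscale deployment web --cpu-percent=70 --min=2 --max=10

# Inspect the resulting HPA object and its current status.
kubectl get hpa web
```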

Prerequisites for any HPA work

Before concluding that HPA is failing, confirm the core dependencies:

  1. Metrics Server (or equivalent) is installed and healthy.
    HPA needs live CPU and memory metrics. Metrics Server is the default source for those.
  2. CPU and memory requests exist for all containers.
    HPA utilization is computed as a percentage of requested resources. If requests are missing or nonsense, scaling will be too.
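To make the dependency on requests concrete, the standard HPA scaling rule is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), where utilization is usage divided by requests. A sketch with illustrative numbers:

```python
import math

def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float) -> int:
    """Core HPA formula: scale replicas by the ratio of observed to target utilization."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 10 pods averaging 90% of requested CPU against a 60% target -> scale to 15.
print(desired_replicas(10, 90, 60))
```

If requests are wrong, the utilization percentage is wrong, and every number that flows through this formula is wrong with it.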

Quick checks

kubectl get apiservices | grep metrics

kubectl top pods -A | head

Expected checkpoint

  • You see v1beta1.metrics.k8s.io available.
  • kubectl top returns numbers, not errors.

If these fail, fix them first. Otherwise you’re debugging a tool that’s flying blind.

Next: the happy path so we have a clear contrast.


Where HPA hits the nail on the head every single time

Consider a stateless web service like Booking.com. Baseline traffic sits comfortably on roughly 100 pods. When there is a surge, like a holiday spike or a promotion, CPU utilization jumps past a target, and HPA adds replicas quickly. When load drops, it scales back down. This is exactly the scenario HPA was designed for.

Why it works here

  1. Pods are interchangeable. Any replica can serve any user.
  2. Adding replicas increases real capacity. More pods equals more throughput.
  3. State does not need coordination. No leader election, no data sharding, no write ordering headaches.

If your service looks like this, HPA should be your first lever.
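For a service like this, a declarative HPA might look like the sketch below; the names and replica bounds are placeholders to adapt:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # your stateless Deployment
  minReplicas: 100         # baseline capacity
  maxReplicas: 300         # ceiling for surge events
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```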

Next: a workload type where HPA is not even an option.


DaemonSets: replica scaling is unsupported by design

DaemonSets run one pod per node, always. Replica count is tied to node count, so there is no horizontal knob to turn. HPA does not scale DaemonSets.

What to do instead: vertical rightsizing

For DaemonSets, your lever is requests per pod.

Prerequisites

  • Metrics available for DaemonSet pods.
  • You know the namespace and DaemonSet labels.

Step-by-step

  1. Measure real usage.

kubectl top pods -n <ns> -l app=<daemonset-label>

    Checkpoint: you see actual CPU and memory per pod.

  2. Fetch current requests.

kubectl get ds <name> -n <ns> \
  -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'

    Checkpoint: you get the current request values.

  3. Rightsize based on observed steady state plus headroom.
    • If pods consistently use far less than requests, lower requests.
    • If they hit limits or show throttling, raise requests.

VPA exists to automate this exact process: adjusting CPU and memory requests from historical usage so you avoid waste and instability.
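If you want VPA to take this over, a minimal VerticalPodAutoscaler targeting a DaemonSet looks roughly like this (assumes the VPA components are installed in the cluster; names are placeholders):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: node-agent-vpa       # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: DaemonSet
    name: node-agent         # your DaemonSet
  updatePolicy:
    updateMode: "Auto"       # VPA evicts and recreates pods with updated requests
```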

Common pitfalls

  • Setting “just in case” requests. That locks a fixed cost into every node and creates the overprovisioning waste pattern.
  • Under-requesting critical agents. Many DaemonSets are node lifelines. Starve them and they get evicted first.

Next: StatefulSets. HPA can target them, but that does not mean you should.


StatefulSets and databases: supported by HPA, but often the wrong lever

HPA can scale StatefulSets. Kubernetes explicitly lists them as supported targets.
But the real issue is not support, it's suitability.

Take a Postgres-backed StatefulSet. During a CPU spike on the primary:

  • Adding a replica may not reduce the bottleneck.
  • Replica changes alter topology and can add data and coordination risk.
  • Many replicas are read-only and do not relieve write pressure.

What to do instead: decide if replicas actually relieve the bottleneck

Prerequisites

  • Clear understanding of your replication model.
  • Peak hour load profile.

Step-by-step

  1. Identify what is actually bottlenecked.
    • If the primary is hot and replicas cannot take the load type, scaling replicas won’t help.
  2. Confirm horizontal scaling is architecturally safe.
    Ask: “If I add a pod, do I get more throughput, or more coordination overhead?”
  3. Rightsize vertically when replication is not a capacity lever.
    This is the VPA-class solution again: set CPU and memory requests to what the database actually needs.
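When vertical is the right lever, requests can be adjusted in place; note that this triggers a rolling restart of the StatefulSet, one ordinal at a time. Container name and values below are illustrative:

```shell
# Set new CPU/memory requests on the database container (placeholder names/values).
kubectl set resources statefulset/<name> -n <ns> \
  --containers=postgres \
  --requests=cpu=4,memory=16Gi

# Watch the rolling restart proceed.
kubectl rollout status statefulset/<name> -n <ns>
```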

Common pitfalls

  • Treating “StatefulSet is supported” as “StatefulSet should be scaled.”
  • Autoscaling read replicas while the primary is the one suffering.

Next: some services cannot run replicas at all.


Singleton workloads: replica count is fixed at 1

Some services are logically single-instance. They cannot coordinate safely with a second copy. In that situation, HPA is irrelevant because replica count above 1 breaks correctness.

The common operator response is to guess a safe high request, like 5 or 10 CPUs. That keeps you stable at peak, but wastes money the rest of the day. Real demand is time-varying: some hours need 10 CPUs, some need 1.

What to do instead: adapt pod size to time-varying demand

Prerequisites

  • Historical CPU and memory for the singleton pod.
  • Visibility across multiple days.

Step-by-step

  1. Collect usage over time.

kubectl top pod <pod> -n <ns> --containers

    Then pull a week of history from Prometheus or your APM.

  2. Find steady state and peaks.
    Use percentiles, not averages. Averages hide spikes.
  3. Set requests to match reality.
    • Requests near steady state with headroom.
    • If peaks are frequent, raise requests.
    • If peaks are rare, consider scheduled vertical resizing.

VPA automates this kind of ongoing rightsizing based on history.
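The "percentiles, not averages" point is easy to demonstrate. A sketch with made-up hourly samples, using a simple nearest-rank percentile:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value at or above p percent of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hourly CPU usage (cores) over a few days -- mostly quiet, with real peaks.
cpu_cores = [1.1, 0.9, 1.2, 1.0, 8.5, 9.2, 1.3, 1.0, 0.8, 7.9, 1.1, 1.2]

avg = sum(cpu_cores) / len(cpu_cores)   # roughly 2.9 cores: hides the spikes
p95 = percentile(cpu_cores, 95)         # captures the real peak behavior

# Sizing from the average would throttle every peak; p95 plus headroom will not.
request = p95 * 1.1                     # 10% headroom is an assumption, tune it
```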

Common pitfalls

  • Leaving a “temporary high request” for months.
  • Treating singleton as a scheduling issue rather than a sizing issue.

Next: Jobs. Their lifecycle breaks HPA’s assumptions completely.


Jobs: bursty, short-lived, and not scalable by replicas

A Job wakes up, runs, and disappears. You don’t control exactly when it starts, how heavy it will be, or how long it runs. That bursty shape is why HPA is a mismatch. HPA assumes a continuously running workload where changing replica count changes throughput.

What to do instead: rightsize per job run using history

Prerequisites

  • Job pods retained long enough to inspect.
  • Access to historical metrics.

Step-by-step

  1. List recent job pods.

kubectl get pods -n <ns> \
  --selector=job-name=<job> \
  --sort-by=.status.startTime

  2. Inspect usage for several runs.

kubectl top pod <job-pod> -n <ns>

  3. Model spikes, not averages.
    Jobs often fail due to short peaks. Requests must protect the burst.
  4. Tune requests and re-check after changes.
    Any code shift or data volume shift warrants a re-measure.

Common pitfalls

  • Copying requests from always-on services.
  • Basing sizing on one run.

Next: compress this into a decision model you can use quickly.


A simple decision lens for HPA fit

Use this checklist whenever someone says, “Let’s just put HPA on it.”

HPA fits when

  1. Replicas are safe. Adding pods won’t break correctness.
  2. Replicas relieve the bottleneck. More pods equals more capacity.
  3. Workload runs continuously. Replica changes matter over time.

HPA does not fit when

  1. Replica count is fixed or meaningless. DaemonSets, singleton services.
  2. Replication is unsafe or pointless. Many database primaries, some StatefulSet apps.
  3. Workload is bursty and short-lived. Jobs.

Put another way: if the horizontal lever does not exist or does not help, go vertical.
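The checklist compresses to three yes/no questions, all of which must hold; as a sketch:

```python
def hpa_fits(replicas_safe: bool, replicas_add_capacity: bool,
             runs_continuously: bool) -> bool:
    """HPA is the right lever only when all three conditions hold."""
    return replicas_safe and replicas_add_capacity and runs_continuously

# Stateless web Deployment: all three hold.
assert hpa_fits(True, True, True)
# DaemonSet: replica count is not a meaningful knob.
assert not hpa_fits(False, True, True)
# Batch Job: bursty and short-lived.
assert not hpa_fits(True, True, False)
```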

Next: we close the main guide, then look at how Zesty ties this together.


Autoscaling works best when you don’t force it

When HPA is applied outside its comfort zone, you get real production fallout:

  • Cost waste from over-requesting.
  • Throttling and evictions from under-requesting.
  • Bigger node pools because the scheduler thinks the cluster is full.

A clean autoscaling strategy is straightforward:

  • Use HPA for stateless Deployments.
  • Use vertical rightsizing for DaemonSets, singleton services, and most Jobs.
  • Use HPA on StatefulSets only when replicas truly add capacity.

One more landmine to watch for. If you combine HPA and VPA naïvely, they can fight. VPA changes requests, and HPA computes utilization as a percentage of requests, so VPA can unintentionally force HPA to add replicas, creating a feedback loop.
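The loop is easy to see numerically. Holding per-pod usage and the HPA target fixed, cutting requests raises the utilization percentage, which raises the replica count HPA wants; all numbers below are illustrative:

```python
import math

def hpa_desired(replicas: int, usage_per_pod: float, request_per_pod: float,
                target_pct: float) -> int:
    """Replicas HPA wants, given utilization measured as usage / request."""
    utilization_pct = usage_per_pod / request_per_pod * 100
    return math.ceil(replicas * utilization_pct / target_pct)

# Before VPA: 10 pods, each using 0.5 CPU against a 1.0 CPU request, 70% target.
before = hpa_desired(10, 0.5, 1.0, 70)   # 50% utilization -> HPA wants fewer pods

# A naive VPA sees the slack and cuts requests to 0.5 CPU.
# The same usage now reads as 100% utilization, so HPA adds replicas instead.
after = hpa_desired(10, 0.5, 0.5, 70)
```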

If that sounds familiar and you want to see how Zesty coordinates HPA and VPA without conflict, and how we rightsize workloads HPA can’t scale, keep reading.


Read on: how Zesty makes HPA and VPA work together, and covers the end cases

Zesty’s Pod Rightsizing: rightsizing for savings and stability

Vertical rightsizing matters for two reasons:

  • Cost reduction: Fix overprovisioning. When pods request far more CPU than they use, the scheduler packs poorly and you end up paying for nodes you did not need.
  • Stability: Fix underprovisioning. When pods request too little, they get throttled or evicted under pressure.

Zesty’s Pod Rightsizing focuses on continuously aligning requests to real-time usage so workloads get what they need, no more and no less.

Solving VPA–HPA conflict in production terms

The conflict is classic:

  1. HPA scales replicas quickly during a surge.
  2. A naïve VPA sees slack and reduces per-pod requests.
  3. Utilization percentage rises, so HPA scales even more.
  4. Replica counts creep or flap.

The ecosystem advice is clear: don’t run both without guardrails.

Zesty’s approach follows the control-loop reality:

  1. Let HPA respond first. HPA reacts fast.
  2. Wait for HPA to settle. Zesty rightsizes after stability is observed.
  3. Rightsize within HPA policy boundaries. That avoids triggering another horizontal wave.
  4. Align thresholds when needed. Zesty can tune HPA thresholds so horizontal decisions remain consistent with updated requests.

Checkpoint: After enabling Zesty alongside HPA, you should see fewer replica oscillations and a tighter gap between requested and used resources.

Handling workloads HPA cannot help

For the end cases in this article, vertical rightsizing is the primary lever:

  • DaemonSets: One per node means the only safe optimization is per-pod sizing.
  • Stateful database workloads: Rightsize pods without forcing unsafe replication.
  • Singleton services: Adapt requests to real, hour-by-hour needs.
  • Jobs: Size sporadic runs based on history, including complex, irregular job patterns.

The outcome is the same across all four: cost reduction plus stability, even where HPA is irrelevant or harmful.


Next steps and deeper resources

If you want to go further, these are worth bookmarking: