1. Why resource management matters more than most engineers realize
Kubernetes treats CPU and memory as first-class scheduling resources. That means everything from which node a pod lands on to how it behaves under pressure depends on one thing: the resource values you define. Set them too low and you introduce throttling or OOMKills. Set them too high and you waste capacity or block the scheduler from placing workloads efficiently.
The official documentation is very explicit: Kubernetes schedules pods based on requests, not limits.
This alone has major operational implications, and it is the foundation of everything you will configure next.
What you will learn in this section
- How resource settings influence scheduling
- What happens when usage exceeds limits
- How Kubernetes chooses Quality of Service classes
Once you understand these mechanics, the following steps will feel significantly clearer.
2. Prerequisites before you start
To follow this guide, you need:
- kubectl with access to your cluster
- metrics-server installed and running
- A workload you can inspect and modify
- Enough permissions to edit Deployments or Pod templates
Verification step
kubectl get --raw /apis/metrics.k8s.io/
You should see a JSON response describing the metrics API group. This confirms that metrics-server is registered and responding, as described in the official documentation on resource monitoring.
Next, we move into the mechanics that determine pod placement.
3. Understand what requests and limits actually do
Requests and limits are part of the PodSpec’s resources field, defined by the Kubernetes API.
How requests work
The official docs state:
- Requests represent the minimum amount of CPU or memory Kubernetes guarantees to the container.
- The scheduler uses requests to decide which node can host the pod.
- A pod cannot be scheduled onto a node unless the node has enough allocatable resources to satisfy all requests.
Put simply:
Requests determine where your pod can run.
How limits work
Limits define the maximum amount of CPU or memory a container is allowed to use.
- If a container tries to use more CPU than its limit, the Linux kernel throttles it.
- If it tries to use more memory than its limit, the container is terminated with an OOMKill.
This behavior is described in the official “Resource Management” documentation and aligns with cgroup enforcement.
Put simply:
Limits determine how your pod behaves under load.
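To make both halves concrete, here is a minimal Pod sketch showing where requests and limits live in the spec (the name, image, and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo      # illustrative name
spec:
  containers:
  - name: app
    image: nginx           # any image works; nginx is just an example
    resources:
      requests:
        cpu: 250m          # scheduler reserves a quarter of a core on the node
        memory: 256Mi      # scheduler reserves 256 MiB
      limits:
        cpu: 500m          # kernel throttles the container above half a core
        memory: 512Mi      # container is OOM-killed if it exceeds 512 MiB
```

The requests block drives placement; the limits block drives runtime enforcement.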
Checkpoint
You should now know that requests influence scheduling and limits influence runtime behavior. Next, we explore how Kubernetes evaluates pod quality.
4. Learn how Kubernetes assigns QoS (Quality of Service)
Kubernetes uses requests and limits to determine each pod’s QoS class. According to the official “Pod QoS Classes” documentation, there are three classes:
Guaranteed
A pod is Guaranteed if:
- Every container has memory requests equal to limits
- Every container has CPU requests equal to limits
Guaranteed pods receive the strongest protection from eviction.
Burstable
A pod is Burstable if:
- It does not meet the criteria for Guaranteed
- At least one container has a CPU or memory request or limit set
These pods are less protected but still favored over BestEffort.
BestEffort
A pod is BestEffort if:
- No container has any requests or limits set
These are the first to be evicted under memory pressure.
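The three classes can be illustrated with container resources stanzas (values are illustrative):

```yaml
# Guaranteed: CPU and memory requests equal limits in every container
resources:
  requests: {cpu: 500m, memory: 256Mi}
  limits:   {cpu: 500m, memory: 256Mi}

# Burstable: some requests or limits set, but not matching across the board
resources:
  requests: {cpu: 100m, memory: 128Mi}
  limits:   {cpu: 500m, memory: 512Mi}

# BestEffort: no requests or limits at all
resources: {}
```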
Why this matters operationally
If you do not deliberately assign requests and limits, Kubernetes will treat your workload as BestEffort. The official documentation states that BestEffort pods are the most likely to be evicted. This is a major cause of unexpected production instability.
Next, we move into the practical workflow for choosing resource values.
5. A step-by-step workflow to determine correct resource settings
This workflow is designed to align with the official guidance to measure actual usage, not guess.
Step 1: Observe real usage
Use metrics-server or another metrics backend.
kubectl top pod <pod-name> --containers
You should see CPU in millicores (for example, 250m) and memory in binary units such as Mi.
The official docs emphasize that CPU usage is averaged over a brief window and should not be treated as instantaneous.
Step 2: Compare usage to your current requests
Use:
kubectl describe pod <pod-name>
Look for the “Requests” and “Limits” entries under each container.
Checkpoint
If real usage consistently exceeds requests, the node may become overcommitted and your pod is at greater risk of throttling or eviction. If usage sits well below requests, you are reserving capacity you never use.
Step 3: Choose an appropriate request value
The documentation states that requests should reflect the minimum resources needed for reliable performance.
In practice:
- Set CPU requests near typical runtime usage
- Set memory requests slightly above the typical high-water mark, because memory is not compressible: unlike CPU, it cannot be throttled, only reclaimed by killing processes
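For example, if kubectl top shows a container steadily using around 180m of CPU with memory peaking near 300Mi, a reasonable starting point (these numbers are illustrative, not official guidance) might be:

```yaml
resources:
  requests:
    cpu: 200m      # close to typical observed usage
    memory: 350Mi  # slightly above the observed high-water mark
```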
Step 4: Decide whether limits are necessary
The official docs explain that limits prevent runaway usage but may cause throttling or OOM kills.
A common safe pattern is:
- CPU: use limits only when needed
- Memory: always set limits, since out-of-memory conditions are disruptive cluster-wide
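That pattern looks like this in a container spec (a sketch of one common convention, not the only valid one):

```yaml
resources:
  requests:
    cpu: 200m
    memory: 350Mi
  limits:
    memory: 512Mi  # memory limit set; CPU limit deliberately omitted
```

Omitting the CPU limit lets the container burst into idle CPU without throttling, while the memory limit still bounds the damage a leak can do.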
Step 5: Apply updated requests and limits and observe behavior
After updating the Deployment:
kubectl rollout status deployment/<name>
Then observe:
kubectl top pod
Repeat until usage aligns with expectations.
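If you prefer the command line to editing YAML, kubectl set resources can apply the same change (the deployment name and values here are illustrative):

kubectl set resources deployment/<name> --requests=cpu=200m,memory=350Mi --limits=memory=512Mi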
Next, we look at the main operational pitfalls engineers encounter.
6. Common misconfigurations and how to avoid them
Every issue described here is recognized in the official Kubernetes documentation.
Pitfall 1: Setting requests far higher than real usage
This causes poor scheduling because the scheduler reserves unnecessary capacity.
Pitfall 2: Setting memory limits too low
Memory cannot be throttled. A container that hits its memory limit will be killed.
This behavior is explicitly described in the cgroup enforcement section.
Pitfall 3: Setting CPU limits too low
The kernel will throttle CPU-intensive containers, impacting latency and throughput.
Pitfall 4: Leaving requests unset
This places the pod in BestEffort, making it the first to be evicted under pressure.
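One way to guard against this at the namespace level is a LimitRange, which supplies defaults when a pod omits its own values (a sketch; the name and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources   # illustrative name
spec:
  limits:
  - type: Container
    defaultRequest:         # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    default:                # applied when a container sets no limits
      cpu: 500m
      memory: 256Mi
```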
Pitfall 5: Forgetting that the scheduler only looks at requests, not limits
Limits play no role in placement, so tuning limits will not fix scheduling problems. Conversely, over-sized requests reserve capacity that is never used and leave nodes underutilized.
You now know how to avoid these issues. Next, we verify success.
7. Validate that resource settings are working correctly
Use three checks, all based on official tooling guidance.
Check 1: Observe resource usage
kubectl top pod
Check 2: Inspect container restarts and OOMKills
kubectl describe pod
Check 3: Confirm QoS class
Look under “QoS Class” inside kubectl describe pod.
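You can also read the class directly from the pod's status field:

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'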
A stable workload should:
- Not be repeatedly throttled
- Not show OOM events
- Have predictable scheduling and performance
Next, we look at how this ties into broader engineering practice.
8. What to do next
Once you have consistent resource values, you can explore topics the official documentation recommends pairing with requests and limits:
- Horizontal Pod Autoscaler (HPA)
- Vertical Pod Autoscaler (VPA)
- Pod disruption budgets
- Node allocatable resource planning
- Resource quotas and limits per namespace
These topics build directly on correct resource sizing.
