Types of pod autoscaling
Autoscaling can occur in two dimensions:
- Horizontal Pod Autoscaling (HPA): Scales the number of pod replicas in response to workload demands.
- Vertical Pod Autoscaling (VPA): Adjusts the CPU and memory resources allocated to individual pods to match their needs.
Horizontal Pod Autoscaling (HPA)
HPA is a fundamental feature in Kubernetes that automatically scales the number of pod replicas based on observed metrics such as CPU utilization, memory usage, or even custom metrics. It is particularly useful in applications with varying levels of traffic or computational demand, allowing for dynamic scaling in and out.
How HPA Works
HPA operates by continuously monitoring the metrics of running pods and adjusting the number of replicas to match the desired utilization target. The HPA controller in Kubernetes compares the current usage against the target metrics defined by the user and calculates the required number of replicas.
Here’s a detailed step-by-step process of how HPA functions:
- Metrics Collection: Kubernetes gathers resource usage data from the pods via the Metrics Server or a custom metrics pipeline (such as Prometheus).
- Target Setting: You define a target utilization (e.g., 70% CPU usage) in your HPA configuration. The HPA controller uses this target to determine whether to scale the number of pods up or down.
- Calculation of Desired Replicas: The HPA controller uses the formula

  $$\text{Desired Replicas} = \left\lceil \text{Current Replicas} \times \frac{\text{Current Utilization}}{\text{Target Utilization}} \right\rceil$$

  For example, if the target CPU utilization is 70%, the observed average utilization is 140%, and 5 replicas are running, the desired number of replicas is $\lceil 5 \times 140/70 \rceil = 10$.
- Scaling Decision: Based on this calculation, the HPA controller adjusts the number of replicas in the deployment. Kubernetes then either creates or deletes pods to achieve the desired state.
- Stabilization (Cool-Down) Period: After scaling, the HPA applies a stabilization window to prevent rapid scaling in response to temporary spikes in demand. This avoids unnecessary churn ("flapping") and keeps the replica count stable; the window is configurable, as sketched below.
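The stabilization behavior is tunable per scaling direction through the `behavior` field of the `autoscaling/v2` API. A minimal sketch (the 300-second window and 50% rate here are illustrative values, not recommendations):

```yaml
# Fragment of an HPA spec: damp scale-downs to avoid flapping.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # act on the highest recommendation from the last 5 minutes
    policies:
      - type: Percent
        value: 50          # remove at most 50% of current replicas...
        periodSeconds: 60  # ...per minute
```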
HPA Configuration Example
To configure HPA, you can use the `kubectl autoscale` command or define a HorizontalPodAutoscaler resource in a YAML manifest.
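The imperative route is a single command; this sketch assumes a Deployment named `my-app`, matching the manifest below:

```shell
# Create an HPA that keeps average CPU utilization near 70%,
# scaling the my-app Deployment between 3 and 10 replicas.
kubectl autoscale deployment my-app --cpu-percent=70 --min=3 --max=10
```

For anything beyond simple CPU-based scaling, the declarative form is preferable. Here's an example of configuring HPA for a deployment: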
```yaml
apiVersion: autoscaling/v2  # v2 is the stable API; v2beta2 is deprecated and removed
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
In this example, the HPA scales the number of pods in the `my-app` deployment between 3 and 10, depending on whether the average CPU utilization exceeds or falls below 70%.
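Once the HPA is applied, you can watch its decisions directly; these commands assume the manifest above was applied to the `default` namespace:

```shell
# Show current vs. target utilization and the current replica count.
kubectl get hpa my-app-hpa

# Show recent scaling events and the conditions the controller evaluated.
kubectl describe hpa my-app-hpa
```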
Advanced HPA with Custom Metrics
While HPA traditionally uses CPU and memory metrics, Kubernetes allows for more advanced scaling scenarios using custom metrics. This enables scaling based on application-specific metrics such as request latency, queue length, or even external metrics from monitoring systems like Prometheus.
To use custom metrics with HPA, you need to expose them through the custom metrics API and configure your HPA to consume them. Here's a brief overview of how it works:
- Deploy Metrics Adapter: Install a metrics adapter like Prometheus Adapter to expose custom metrics to the Kubernetes API.
- Define Custom Metrics: Use your application’s instrumentation (e.g., Prometheus) to collect custom metrics that are relevant for scaling.
- Configure HPA for Custom Metrics: Modify your HPA configuration to use the custom metrics, specifying the metric name and target value.
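Once the adapter is deployed, you can verify that custom metrics are actually reaching the API server before wiring them into an HPA:

```shell
# List the metrics currently exposed through the custom metrics API
# (pipe through `jq` if you want pretty-printed output).
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
```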
Here's an example HPA that scales on a per-pod custom metric:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-custom-app
  minReplicas: 2
  maxReplicas: 15
  metrics:
    - type: Pods
      pods:
        metric:
          name: request_duration_seconds
        target:
          type: AverageValue
          averageValue: 300m  # quantity syntax for 0.3 seconds; "300ms" is not a valid quantity
```
This configuration scales the `my-custom-app` deployment based on the `request_duration_seconds` custom metric, adding replicas whenever the average request duration per pod rises above 0.3 seconds, so that response times remain within acceptable limits.
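For this to work, the adapter must map a Prometheus series onto pods. The fragment below is only a rough sketch of a Prometheus Adapter rule, assuming the application exports a `request_duration_seconds` histogram or summary with `namespace` and `pod` labels; the exact series names and query depend entirely on your instrumentation:

```yaml
# Hypothetical Prometheus Adapter rule (fragment of the adapter's ConfigMap).
rules:
  - seriesQuery: 'request_duration_seconds_sum{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_sum$"
      as: "${1}"  # expose the metric as request_duration_seconds
    # Average duration = rate of summed durations / rate of request count.
    metricsQuery: >-
      sum(rate(request_duration_seconds_sum{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
      / sum(rate(request_duration_seconds_count{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
```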
Best Practices for HPA
- Start with Conservative Limits: Begin with conservative replica bounds and utilization targets to avoid rapid scale-up and scale-down cycles that could destabilize your application.
- Monitor Scaling Events: Regularly monitor HPA activity to understand scaling behavior and adjust targets or thresholds as needed.
- Use Custom Metrics Wisely: Only use custom metrics that are critical to your application’s performance, and ensure that the metrics are stable and reliable.
Vertical Pod Autoscaling (VPA)
Vertical Pod Autoscaling (VPA) is designed to automatically adjust the CPU and memory resources allocated to individual pods, optimizing resource utilization for workloads with varying resource demands. Instead of changing the number of pods, VPA modifies the resources within the existing pods, making it ideal for applications where the workload remains constant but resource requirements fluctuate.
How VPA Works
VPA continuously monitors the resource utilization of pods and compares it with the resource requests and limits defined in the pod specifications. Based on this comparison, VPA recommends or automatically adjusts the resources for each pod.
Here’s how VPA operates:
- Monitoring Resource Usage: VPA collects real-time data on CPU and memory usage from the pods, using the Kubernetes Metrics Server or custom metrics pipelines.
- Recommendation Engine: The VPA recommendation engine analyzes this data and suggests adjustments to the pod’s resource requests. These recommendations aim to balance resource usage with performance, avoiding over-provisioning or under-provisioning.
- Resource Adjustment: Depending on the VPA update mode, it can either surface recommendations for administrators to review and apply manually, or act on them automatically.
- Resource Update: In automatic mode, the VPA updater evicts pods whose requests drift too far from the recommendation; when the pods are recreated, the new resource requests are applied. In other words, applying a recommendation typically requires a pod restart.
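The update mode selects which of these behaviors is active. A minimal sketch of the options (mode names from the `autoscaling.k8s.io/v1` API):

```yaml
# Fragment of a VPA spec: choose exactly one update mode.
updatePolicy:
  updateMode: "Off"       # only publish recommendations; never touch running pods
  # updateMode: "Initial" # set requests only when pods are first (re)created
  # updateMode: "Auto"    # evict and recreate pods to apply recommendations
```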
VPA Configuration Example
Configuring VPA involves creating a VerticalPodAutoscaler resource that specifies how resources should be adjusted for a given Deployment or StatefulSet. Here's an example of a basic VPA configuration:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```
In this example, the VPA automatically adjusts the CPU and memory requests for pods in the `my-app` deployment.
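Once the recommender has gathered some usage history, you can inspect its suggestions; this assumes the VPA components are installed, since VPA ships as a cluster add-on rather than as part of core Kubernetes:

```shell
# Show the recommendation (target plus lower/upper bounds) per container.
kubectl describe vpa my-app-vpa
```

If you need to bound what VPA may set, the spec also supports a `resourcePolicy` section with per-container `minAllowed` and `maxAllowed` values.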
Best Practices for VPA
- Gradual Changes: Apply changes gradually to avoid drastic shifts in resource allocation that could disrupt the application’s performance.
- Monitor Recommendations: Regularly review VPA recommendations, especially in the initial stages, to ensure that they align with expected performance.
- Combine with HPA: For applications with both fluctuating traffic and varying resource demands, consider combining HPA and VPA to achieve both horizontal and vertical scaling.
Use cases for HPA and VPA
When deciding between HPA and VPA, consider the nature of your application’s workload:
- Use HPA: If your application experiences variable traffic patterns that require scaling in the number of instances, HPA is the appropriate choice. It’s ideal for front-end services, APIs, and other stateless applications that need to scale out to handle increased traffic.
- Use VPA: If your application has stable traffic but varying resource requirements (e.g., batch processing jobs, machine learning workloads), VPA is more suitable. It allows each pod to efficiently use its allocated resources, minimizing waste.
- Combine HPA and VPA: In some cases, the best approach is to use both together, scaling out when traffic increases while also right-sizing each pod. Be careful, though, not to let them act on the same signal: the usual guidance is to avoid pairing VPA in automatic mode with an HPA that scales on CPU or memory, and to drive the HPA from custom or external metrics instead.
Further resources
Kubernetes – Horizontal Pod Autoscaling
GitHub – Kubernetes Autoscaling