Short overview

Out of the box, Kubernetes autoscaling relies on the Horizontal Pod Autoscaler (HPA) to adjust the number of pod replicas based on CPU and memory utilization. However, these standard resource metrics aren't always a good proxy for application load. Custom metrics let users drive scaling from other signals, such as response time, request rate, or even external systems like databases and message queues. By defining custom metrics, users can align scaling behavior more closely with business logic and application performance requirements.

How Custom Metrics Work:

Custom metrics are gathered by integrating a Kubernetes cluster with a monitoring system (e.g., Prometheus) capable of collecting them. Once the metrics are available, a metrics adapter (such as the Prometheus Adapter) exposes them to Kubernetes through the custom metrics API (or the external metrics API, for metrics that originate outside the cluster), where autoscalers like the HPA can consume them. Here’s a brief flow:

  1. Define the Custom Metric: Identify a relevant metric for scaling, such as requests per second or queue length.
  2. Collect the Metric: Use monitoring systems to collect and expose the custom metric.
  3. Configure the HPA: Extend the Horizontal Pod Autoscaler configuration to include the custom metric and a target value that should trigger scaling (a minimal manifest is sketched below).
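
To make step 3 concrete, here is a minimal sketch of an autoscaling/v2 HorizontalPodAutoscaler that scales on a custom per-pod metric. The metric name http_requests_per_second and the Deployment name web-app are illustrative assumptions; the real names must match whatever your metrics adapter actually exposes through the custom metrics API.

```yaml
# Minimal sketch: an autoscaling/v2 HPA driven by a custom per-pod metric.
# The metric name "http_requests_per_second" and the target Deployment
# "web-app" are illustrative assumptions, not fixed names.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"
```

With a Pods metric, the HPA compares the average value reported across all pods against the target and adjusts the replica count accordingly.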

Examples and Use Cases:

Here are some common custom metrics used in autoscaling, along with real-world scenarios where they can enhance application performance:

  1. HTTP Request Rate
    • Metric: Number of HTTP requests per second (RPS)
    • Use Case: A web app can scale based on the number of incoming requests. When RPS crosses a defined threshold (e.g., 500 RPS), Kubernetes scales up the number of web server pods to handle the load.
  2. Queue Length in a Message Broker
    • Metric: Number of pending jobs in a queue (e.g., RabbitMQ, Kafka)
    • Use Case: In background job processing, if the queue length exceeds 100 jobs, Kubernetes can add more worker pods to process the pending jobs quickly, preventing bottlenecks (a manifest for this pattern is sketched after this list).
  3. Response Time (Latency)
    • Metric: Average response time (ms)
    • Use Case: For latency-sensitive apps, like payment gateways, Kubernetes can scale up pods when response times exceed 200ms, ensuring a smooth user experience.
  4. Cache Hit Ratio
    • Metric: Percentage of requests served from the cache
    • Use Case: Autoscale a Redis cluster based on cache performance. For example, if the hit ratio drops below 80%, Kubernetes can add more cache pods to improve response time.
  5. Database Connections
    • Metric: Number of active database connections
    • Use Case: In a database-heavy app, scaling may be constrained by database connection limits. If connection usage nears 90% of the database's capacity, the autoscaler can scale the app down (or stop adding replicas) to avoid exhausting the connection pool.
  6. Disk I/O
    • Metric: Disk read/write operations per second (IOPS)
    • Use Case: High disk I/O apps (e.g., data analytics) may need to scale based on I/O load. If IOPS exceeds 5000, Kubernetes can add more pods to keep up with demand.
  7. Error Rate
    • Metric: Rate of error responses (HTTP 500/503) per minute
    • Use Case: For API services, if the error rate spikes above 50 errors per minute, Kubernetes can add pods to absorb the load and reduce service failures.
  8. User Sessions or Active Users
    • Metric: Number of active user sessions
    • Use Case: In real-time apps (like gaming or chat apps), autoscale based on user load. If active user sessions exceed 1000, Kubernetes can spin up more pods to manage the traffic.
  9. Data Throughput (Network Traffic)
    • Metric: Network data throughput (e.g., MBps)
    • Use Case: For network-intensive apps, autoscale based on network usage. For instance, if network traffic exceeds 100 MBps, Kubernetes can increase pod replicas to keep the app responsive.
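
Most of the use cases above follow the same pattern: pick a metric, expose it through the custom or external metrics API, and point an HPA at it with a target value. As one more hedged sketch, this manifest scales background workers (use case 2) on queue depth via an External metric. The metric name rabbitmq_queue_messages_ready, its queue label, and the Deployment name job-worker are assumptions for illustration; the names available depend on your monitoring stack and adapter configuration.

```yaml
# Minimal sketch: scale workers on queue depth through the External
# metrics API. Metric and label names are illustrative assumptions and
# must match what the metrics adapter actually serves.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: job-worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages_ready
          selector:
            matchLabels:
              queue: jobs
        target:
          type: AverageValue
          averageValue: "100"
```

With an AverageValue target, the HPA adds replicas until the backlog per worker pod falls to roughly 100; the latency, error-rate, and throughput cases differ only in the metric name and target value.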

Challenges:

  • Monitoring Integration: Custom metrics require integration with external monitoring systems like Prometheus, typically via a metrics adapter, adding complexity to setup and maintenance (an example adapter rule is sketched after this list).
  • Threshold Setting: Determining appropriate thresholds for scaling can be difficult, requiring experimentation and ongoing tuning.
  • Performance Overhead: Collecting and processing custom metrics introduces additional system overhead, particularly if the metrics are granular or collected at high frequency.
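
As an illustration of that integration work, here is a hedged sketch of a Prometheus Adapter discovery rule that turns a Prometheus counter into a per-pod rate metric the HPA can consume. The series name http_requests_total is an assumption; adjust it to match your own instrumentation.

```yaml
# Minimal sketch of a Prometheus Adapter rule: it discovers the assumed
# counter http_requests_total, maps its namespace/pod labels onto
# Kubernetes resources, renames it to http_requests_per_second, and
# converts the counter into a rate.
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
```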

Value Proposition:

  • Application-Specific Autoscaling: Custom metrics allow for autoscaling to be more tightly aligned with application behavior, resulting in better optimization and more efficient use of resources.
  • Improved Efficiency: By scaling based on business-critical metrics (such as request rate or user sessions), users can achieve more fine-grained and cost-effective scaling than by using CPU and memory alone.

Key Features:

  • Metric Flexibility: Custom metrics can reflect any aspect of an application’s performance, from external dependencies like databases to user-driven events.
  • Fine-tuned Control: Custom metrics offer more specific control over scaling behavior, allowing for better alignment with application KPIs.
  • Monitoring Integration: Kubernetes integrates with common monitoring systems (e.g., Prometheus) through metrics adapters to collect and evaluate custom metrics in real time.
