Why Monitoring in Kubernetes is Essential

Unlike traditional applications running on fixed servers, Kubernetes workloads are highly dynamic. Pods are constantly created and destroyed, scaling up and down as needed. This makes monitoring more complex but also more critical.

Without proper monitoring, you risk:

  • Unexpected outages – Pods crash, nodes fail, and services can break without warning.
  • Overspending on cloud resources – Without visibility into CPU, memory, and storage usage, costs can spiral out of control.
  • Difficult troubleshooting – Debugging issues in a distributed system without logs and metrics is like searching for a needle in a haystack.

A strong monitoring setup ensures you catch problems before they escalate while keeping costs optimized.

Key Monitoring Data

Monitoring in Kubernetes is typically divided into three main types of data:

1. Metrics (Performance Monitoring)

These provide real-time insights into the health of your cluster.

Key Metrics to Track:

  • CPU and memory usage (per pod, node, and cluster level)
  • Network traffic (incoming and outgoing data)
  • Pod restarts and failures
  • Cluster resource allocation (how efficiently resources are used)

Example:
You can use kubectl to check CPU and memory usage (this relies on the Metrics Server being installed in the cluster):

kubectl top pod --all-namespaces
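If you would rather flag heavy consumers than scan the table by eye, the output can be filtered in the shell. The sketch below is illustrative only: pods_over_memory is a hypothetical helper, not part of kubectl, and it assumes memory is reported with the Mi suffix. Sample rows stand in for a live cluster so the filter can be tried anywhere.

```shell
# Hypothetical helper: print pods whose memory usage exceeds a threshold in Mi.
# Expects rows from `kubectl top pod --all-namespaces --no-headers` on stdin:
#   NAMESPACE  NAME  CPU(cores)  MEMORY(bytes)
pods_over_memory() {
  awk -v limit="$1" '{
    mem = $4
    sub(/Mi$/, "", mem)            # assumption: values look like "300Mi"
    if (mem + 0 > limit + 0) print $1 "/" $2, $4
  }'
}

# Against a live cluster:
#   kubectl top pod --all-namespaces --no-headers | pods_over_memory 256

# With sample rows, so the filter can be tried without a cluster:
printf '%s\n' \
  'default      web-1  12m  45Mi' \
  'kube-system  dns-1  3m   300Mi' | pods_over_memory 256
```

For a quick ranking without any scripting, kubectl top pod also accepts --sort-by=cpu or --sort-by=memory.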

2. Logs (Application and System Logs)

Logs help you understand what happened when things go wrong.

Key Logs to Monitor:

  • Pod logs (container output, errors, stack traces)
  • Node logs (Kubelet, system messages)
  • Control plane logs (API server, scheduler, controller manager)

Example:
Check logs for a specific pod:

kubectl logs my-pod -n my-namespace

3. Events (Cluster-Wide Activity)

Events show what Kubernetes is doing behind the scenes.

Key Events to Monitor:

  • Pod scheduling failures (if a pod can’t find a node to run on)
  • OOMKills (Out-of-Memory Errors)
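  • Image pull failures and failed liveness or readiness probes (network policy changes do not normally produce events, so track those through the API server audit log instead)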

Example:
List events across all namespaces, sorted oldest first so the most recent appear at the bottom:

kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
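On a busy cluster the raw event list gets long, so it can help to summarize it. The following is a rough sketch, assuming you only care about Warning events grouped by reason; warning_reasons is a hypothetical helper and the sample rows are made up for illustration.

```shell
# Hypothetical helper: count Warning events per reason, most frequent first.
# Expects rows of "TYPE REASON OBJECT" on stdin.
warning_reasons() {
  awk '$1 == "Warning" { count[$2]++ }
       END { for (r in count) print count[r], r }' | sort -rn
}

# Against a live cluster:
#   kubectl get events --all-namespaces --no-headers \
#     -o custom-columns=TYPE:.type,REASON:.reason,OBJECT:.involvedObject.name \
#     | warning_reasons

# With sample rows:
printf '%s\n' \
  'Warning FailedScheduling web-1' \
  'Normal  Scheduled        web-2' \
  'Warning FailedScheduling web-3' \
  'Warning BackOff          api-1' | warning_reasons
```

Keep in mind that the API server only retains events for a limited time (one hour by default), so anything you need for later analysis has to be shipped elsewhere.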

Open-Source Tools for Monitoring

There are many open-source tools to monitor Kubernetes efficiently. These tools provide visibility into costs, workloads, and performance without vendor lock-in.

1. OpenCost (Cost Monitoring)

What it does:

  • Provides real-time cost visibility into Kubernetes workloads.
  • Helps teams track and allocate cloud costs per namespace, pod, and container.
  • Identifies overprovisioned resources to reduce waste.

Installation guide

Step 1: Install OpenCost in Your Cluster

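You can deploy OpenCost using Helm. OpenCost reads its usage data from Prometheus, so install Prometheus first (see the Prometheus section below) and check that the chart's Prometheus settings point at your instance: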

helm repo add opencost https://opencost.github.io/opencost-helm-chart/
helm repo update
helm install opencost opencost/opencost --namespace opencost --create-namespace

Step 2: Verify the Installation

Check that the OpenCost pod is running:

kubectl get pods -n opencost

Step 3: Access the OpenCost UI

Expose OpenCost with port forwarding:

kubectl port-forward -n opencost svc/opencost 9090:9090

Now, visit http://localhost:9090 in your browser.

Step 4: View Costs for Workloads

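OpenCost does not add a cost resource to kubectl, so query its API instead. Forward the API port (9003) in a second terminal, then request the last 7 days of costs aggregated by namespace. Endpoint paths can differ between releases, so check the OpenCost API documentation for your version:

kubectl port-forward -n opencost svc/opencost 9003:9003
curl -G http://localhost:9003/allocation/compute --data-urlencode "window=7d" --data-urlencode "aggregate=namespace"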

Example Use Cases

Find the most expensive namespace

  • Helps you identify cost-heavy workloads so you can optimize them.

Break down costs by CPU, memory, and storage

  • If a pod is overprovisioned, you can adjust resource requests to reduce waste.
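As a rough illustration of right-sizing, the sketch below derives a CPU request from an observed peak plus some headroom. The helper name suggest_cpu_request, the 20% headroom, and the deployment name my-app are all assumptions made for this example; pick values that match your own workload and risk tolerance.

```shell
# Hypothetical helper: observed peak (millicores) plus a headroom percentage.
suggest_cpu_request() {
  peak_m=$1
  headroom_pct=$2
  echo "$(( peak_m * (100 + headroom_pct) / 100 ))m"
}

suggest_cpu_request 150 20   # prints 180m

# Applying the result to a workload (deployment name is a placeholder):
#   kubectl set resources deployment/my-app \
#     --requests=cpu="$(suggest_cpu_request 150 20)"
```

Changing requests triggers a rollout of the deployment, so it is usually safer to try the new value in a non-production environment first.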

Track costs over time

  • Integrate OpenCost with Grafana to visualize trends.

2. Prometheus (Metrics Collection & Monitoring)

What it does:

  • Collects real-time CPU, memory, and network usage.
  • Stores data in a time-series database for historical analysis.
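  • Discovers Kubernetes targets automatically and scrapes exporters such as kube-state-metrics (object state) and the kubelet's cAdvisor endpoint (container resource usage).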

How to Install and Use Prometheus

Step 1: Install Prometheus with Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace

Step 2: Verify the Installation

Check running pods:

kubectl get pods -n monitoring

Step 3: Access the Prometheus UI

Expose Prometheus (stop the OpenCost port-forward first or pick a different local port, because both use 9090):

kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090

Now visit http://localhost:9090.

Step 4: Query Metrics in Prometheus

Run this PromQL query to check CPU usage:

rate(container_cpu_usage_seconds_total[5m])
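To make the result easier to read: container_cpu_usage_seconds_total is a counter of CPU-seconds consumed, and rate() turns it into an average number of cores used over the window. The snippet below is a deliberately simplified model of that arithmetic with made-up sample values; the real rate() function also handles counter resets and extrapolates to the window boundaries.

```shell
# Simplified model of rate(): (v2 - v1) / (t2 - t1), expressed in CPU cores.
# Arguments: first sample value, its timestamp, last sample value, its timestamp.
cpu_rate() {
  awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" \
    'BEGIN { printf "%.2f\n", (v2 - v1) / (t2 - t1) }'
}

# Counter went from 100 to 250 CPU-seconds over 300 seconds (5 minutes):
cpu_rate 100 0 250 300   # prints 0.50
```

A value of 0.50 means the container used half a core on average during those five minutes.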

Example Use Cases

Monitor pod resource usage

  • Prevent pod crashes by setting resource limits based on actual usage.

Alert on high CPU/memory consumption

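  • Example: Get an alert when a pod uses more than 80% of its CPU limit. The rate alone is measured in cores, so divide it by the limit reported by kube-state-metrics to get a ratio:

groups:
  - name: high-cpu
    rules:
      - alert: HighCPUUsage
        expr: |
          sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
            / sum by (namespace, pod) (kube_pod_container_resource_limits{resource="cpu"})
            > 0.8
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.namespace }}/{{ $labels.pod }}"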
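With the kube-prometheus-stack chart, alert rules are normally loaded through a PrometheusRule resource rather than a plain rules file. The sketch below writes such a manifest to disk. It makes several assumptions: the Helm release is named prometheus (the chart's default rule selector matches on that release label), pods have CPU limits set, and the expression compares usage to those limits via kube-state-metrics. The helper name and file name are placeholders.

```shell
# Hypothetical helper: write a PrometheusRule manifest to the given path.
# The `release: prometheus` label assumes the Helm release is named "prometheus".
write_high_cpu_rule() {
  cat > "$1" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: high-cpu
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
    - name: high-cpu
      rules:
        - alert: HighCPUUsage
          expr: |
            sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
              / sum by (namespace, pod) (kube_pod_container_resource_limits{resource="cpu"})
              > 0.8
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "High CPU usage on {{ $labels.namespace }}/{{ $labels.pod }}"
EOF
}

write_high_cpu_rule high-cpu-rule.yaml

# Then, against a live cluster:
#   kubectl apply -f high-cpu-rule.yaml
```

After applying it, the rule should show up under Status > Rules in the Prometheus UI if the label selector matches.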

Track performance trends over time

  • Helps with capacity planning by analyzing past performance.

3. Grafana (Kubernetes Dashboard & Visualization)

What it does:

  • Provides beautiful dashboards for Kubernetes monitoring.
  • Connects with Prometheus, OpenCost, and Loki.
  • Supports alerting and notifications.

How to Install and Use Grafana

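Step 1: Install Grafana with Helm

The kube-prometheus-stack chart installed in the Prometheus section already bundles a Grafana instance, so this step is only needed if you want a separate one:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana --namespace monitoring --create-namespace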

Step 2: Get the Admin Password

kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode

Step 3: Access the Grafana Dashboard

Expose Grafana (the chart's service listens on port 80, which is mapped here to local port 3000):

kubectl port-forward svc/grafana 3000:80 -n monitoring

Now visit http://localhost:3000 and log in as admin with the retrieved password.

Step 4: Add Prometheus as a Data Source

  1. Go to Configuration > Data Sources (Connections > Data sources in newer Grafana versions).
  2. Click Add data source.
  3. Select Prometheus and enter:
    • URL: http://prometheus-kube-prometheus-prometheus.monitoring:9090
  4. Click Save & Test.

Step 5: Import a Kubernetes Monitoring Dashboard

  1. Go to Dashboards > Import.
  2. Use the dashboard ID 3119 (Kubernetes Cluster Monitoring).
  3. Click Import and enjoy your real-time cluster dashboard!

Use Cases

Monitor CPU and memory usage visually

  • Track trends over time to prevent bottlenecks.

Create alerts based on real-time data

  • Example: Send Slack alerts if a pod restarts more than 5 times in 10 minutes.
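One possible query behind such an alert is sketched below. It assumes kube-state-metrics is running (it exports kube_pod_container_status_restarts_total) and that Prometheus is reachable on the port-forward from earlier; restart_alert_query is a hypothetical helper that only builds the expression. Delivery to Slack is configured separately, through a contact point in Grafana or a receiver in Alertmanager.

```shell
# Hypothetical helper: build a PromQL expression for
# "more than COUNT container restarts within WINDOW".
restart_alert_query() {
  printf 'increase(kube_pod_container_status_restarts_total[%s]) > %s\n' "$1" "$2"
}

restart_alert_query 10m 5

# Trying the expression against Prometheus (assumes the port-forward on 9090):
#   curl -sG http://localhost:9090/api/v1/query \
#     --data-urlencode "query=$(restart_alert_query 10m 5)"
```

The same expression can be pasted into a Grafana alert rule that uses Prometheus as its data source.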

Combine logs, metrics, and costs in one view

  • Helps with troubleshooting and optimization.

4. Loki (Log Aggregation & Analysis)

What it does:

  • Collects container logs without massive storage overhead.
  • Integrates with Grafana for centralized log visualization.
  • Supports log-based alerts.

How to Install and Use Loki

Step 1: Install Loki with Helm

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack --namespace logging --create-namespace

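Step 2: Forward Logs to Loki

The loki-stack chart deploys Promtail by default, which already ships container logs to Loki. If you run Fluent Bit instead, add a Loki output to its ConfigMap. The built-in loki plugin takes the host and port separately and pushes to /loki/api/v1/push by default:

[OUTPUT]
    Name        loki
    Match       *
    Host        loki.logging.svc
    Port        3100
    Labels      job=fluent-bit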

Step 3: Query Logs in Grafana

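  1. In Grafana, add Loki as a data source with the URL http://loki.logging:3100.
  2. Go to Explore and select the Loki data source.
  3. Run a log query:

{namespace="default"} |= "error"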

Example Use Cases

Find errors in pod logs

  • Search for all logs containing “Out of Memory”.

Correlate logs with metrics

  • See when an error occurred and how it affected performance.

Set up log-based alerts

  • Example: Notify teams if “connection refused” appears more than 5 times in 5 minutes.
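Before wiring up an alert, it can be useful to check how often the message really occurs. The snippet below is a small local sketch of that count; count_matches is a hypothetical helper, the pod and namespace names are placeholders, and the sample lines are invented.

```shell
# Hypothetical helper: count lines on stdin that contain a fixed string.
count_matches() {
  grep -cF -- "$1" || true   # grep exits non-zero when the count is 0
}

# Against a live pod (names are placeholders):
#   kubectl logs my-pod -n my-namespace --since=5m | count_matches "connection refused"

# With sample log lines:
printf '%s\n' \
  'dial tcp 10.0.0.5:5432: connect: connection refused' \
  'request served in 12ms' \
  'dial tcp 10.0.0.5:5432: connect: connection refused' | count_matches "connection refused"
```

In Loki itself, a comparable alert condition could be written in LogQL as sum(count_over_time({namespace="default"} |= "connection refused" [5m])) > 5.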

Best Practices for Kubernetes Monitoring

1. Use OpenCost to Track Cloud Costs

  • Identify overprovisioned pods and cut unnecessary spending.
  • Allocate costs by namespace, team, or project.

2. Set Up Alerts for Resource Overuse

  • Use Prometheus Alertmanager to notify teams of spikes.
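  • Example alert that fires when a pod uses more than 80% of its CPU limit:

groups:
  - name: high-cpu
    rules:
      - alert: HighCPUUsage
        expr: |
          sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
            / sum by (namespace, pod) (kube_pod_container_resource_limits{resource="cpu"})
            > 0.8
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.namespace }}/{{ $labels.pod }}"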

3. Correlate Logs and Metrics with Loki & Prometheus

  • If a pod crashes, check logs and metrics together for root cause analysis.

4. Use Grafana for Easy Visualization

  • Set up dashboards for CPU, memory, cost, and pod health.

Final Thoughts: Do You Need Kubernetes Monitoring?

If you’re running Kubernetes, monitoring is a must. Whether you’re a small team or a large enterprise, using open-source tools like OpenCost, Prometheus, Grafana, and Loki will help you prevent downtime, optimize costs, and troubleshoot faster.