Kubernetes Monitoring Explained: Best Tools & Setup Guide

Why Monitoring in Kubernetes is Essential

Unlike traditional applications running on fixed servers, Kubernetes workloads are highly dynamic. Pods are constantly created and destroyed as deployments scale up and down, roll out new versions, or get rescheduled onto other nodes. This makes monitoring more complex but also more critical.

Without proper monitoring, you risk:

Unexpected outages – Pods crash, nodes fail, and services can break without warning.
Overspending on cloud resources – Without visibility into CPU, memory, and storage usage, costs can spiral out of control.
Difficult troubleshooting – Debugging issues in a distributed system without logs and metrics is like searching for a needle in a haystack.

A strong monitoring setup ensures you catch problems before they escalate while keeping costs optimized.

Key Metrics

Monitoring in Kubernetes is typically divided into three main types of data:

1. Metrics (Performance Monitoring)

These provide real-time insights into the health of your cluster.

Key Metrics to Track:

CPU and memory usage (per pod, node, and cluster level)
Network traffic (incoming and outgoing data)
Pod restarts and failures
Cluster resource allocation (how efficiently resources are used)

Example:
You can use kubectl to check current CPU and memory usage (this requires metrics-server to be running in the cluster):

kubectl top pod --all-namespaces
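kubectl top shows a snapshot, and on a busy cluster it helps to filter it. Below is a minimal sketch of a hypothetical helper, pods_over_memory, that reads kubectl top pod --all-namespaces output and prints pods above a memory threshold in MiB. It assumes the default four-column layout (NAMESPACE, NAME, CPU, MEMORY) and only handles Ki, Mi, and Gi units.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods using more than a given amount of memory (MiB).
# Input: `kubectl top pod --all-namespaces` output on stdin.
pods_over_memory() {
  local threshold_mi="$1"
  awk -v limit="$threshold_mi" '
    $1 == "NAMESPACE" { next }            # skip the header row
    {
      mem = $4
      val = mem + 0                       # awk keeps the numeric prefix, e.g. "512Mi" -> 512
      if (mem ~ /Gi$/) val *= 1024        # convert GiB to MiB
      else if (mem ~ /Ki$/) val /= 1024   # convert KiB to MiB
      if (val > limit + 0) printf "%s/%s %s\n", $1, $2, mem
    }'
}

# Example (needs metrics-server in the cluster):
#   kubectl top pod --all-namespaces | pods_over_memory 500
```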

2. Logs (Application and System Logs)

Logs help you understand what happened when things go wrong.

Key Logs to Monitor:

Pod logs (container output, errors, stack traces)
Node logs (kubelet, system messages)
Control plane logs (API server, scheduler, controller manager)

Example:
Check logs for a specific pod:

kubectl logs my-pod -n my-namespace
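kubectl logs also accepts --since (for example --since=1h) to limit the time window and --previous to read the logs of a container's previous, crashed instance. To get a quick sense of how noisy a pod is, you can pipe that output into a small tally. The helper below, log_level_summary, is a hypothetical sketch that assumes severity keywords such as ERROR or WARN appear as plain text in each line; structured (JSON) logs may need a different match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines by severity keyword (case-insensitive).
# A line mentioning ERROR is counted once as an error, even if it also mentions WARN.
log_level_summary() {
  awk '
    { line = toupper($0) }
    line ~ /ERROR/ { errors++; next }
    line ~ /WARN/  { warnings++; next }
                   { other++ }
    END { printf "error=%d warn=%d other=%d\n", errors, warnings, other }'
}

# Example:
#   kubectl logs my-pod -n my-namespace --since=1h | log_level_summary
```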

3. Events (Cluster-Wide Activity)

Events show what Kubernetes is doing behind the scenes.

Key Events to Monitor:

Pod scheduling failures (if a pod can’t find a node to run on)
OOM kills (containers terminated for exceeding their memory limit)
Network policies being applied or changed

Example:
See the most recent events in your cluster:

kubectl get events --sort-by=.metadata.creationTimestamp
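On a busy cluster the full event list scrolls by quickly. Adding --field-selector type=Warning to kubectl get events limits the output to warnings on the server side; to see which warning reasons dominate, a hypothetical helper like warning_reasons can count them. This sketch assumes the default single-namespace column layout of kubectl get events (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); with --all-namespaces a NAMESPACE column comes first and the field numbers shift by one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Warning events per reason.
# Input: `kubectl get events` output (single namespace) on stdin.
warning_reasons() {
  # Column 2 is TYPE and column 3 is REASON. The header row has "SEEN" in
  # column 2, so it never matches "Warning".
  awk '$2 == "Warning" { count[$3]++ }
       END { for (reason in count) printf "%s %d\n", reason, count[reason] }' |
    sort -k2,2nr -k1,1      # most frequent first, then alphabetical
}

# Example:
#   kubectl get events -n my-namespace | warning_reasons
```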

Open-Source Tools for Monitoring

There are many open-source tools to monitor Kubernetes efficiently. These tools provide visibility into costs, workloads, and performance without vendor lock-in.

1. OpenCost (Cost Monitoring)

What it does:

Provides real-time cost visibility into Kubernetes workloads.
Helps teams track and allocate cloud costs per namespace, pod, and container.
Identifies overprovisioned resources to reduce waste.

Installation guide

Step 1: Install OpenCost in Your Cluster

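OpenCost reads resource usage from Prometheus, so it needs a running Prometheus instance (see the Prometheus section below). The chart expects Prometheus at a default address; if yours lives elsewhere, point OpenCost at it through the chart's values. Then deploy OpenCost using Helm: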

helm repo add opencost https://opencost.github.io/opencost-helm-chart/
helm repo update
helm install opencost opencost/opencost --namespace opencost --create-namespace

Step 2: Verify the Installation

Check that the OpenCost pod is running:

kubectl get pods -n opencost

Step 3: Access the OpenCost UI

Expose OpenCost with port forwarding:

kubectl port-forward -n opencost svc/opencost 9090:9090

Now, visit http://localhost:9090 in your browser.

Step 4: View Costs for Workloads

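Kubernetes has no built-in cost resource, so kubectl get cost does not work. The cost breakdown comes from the OpenCost API, which listens on port 9003. Forward that port (in a separate terminal) and query costs grouped by namespace for the last 7 days:

kubectl port-forward -n opencost svc/opencost 9003:9003
curl -G http://localhost:9003/allocation/compute -d window=7d -d aggregate=namespace

The same breakdown is also available in the OpenCost UI.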

Example Use Cases

Find the most expensive namespace

Helps you identify cost-heavy workloads so you can optimize them.

Break down costs by CPU, memory, and storage

If a pod is overprovisioned, you can adjust resource requests to reduce waste.

Track costs over time

Integrate OpenCost with Grafana to visualize trends.
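Behind "overprovisioned" is a simple number: the share of a pod's CPU request it does not actually use. The sketch below computes it with two hypothetical helpers, to_millicores and unused_request_pct. It assumes CPU quantities are written either in millicores (250m) or in cores (1, 0.5), which covers the common forms but not every Kubernetes quantity format. For a single-container pod, the request can be read with kubectl get pod my-pod -o jsonpath='{.spec.containers[*].resources.requests.cpu}' and the usage from kubectl top pod.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a CPU quantity ("250m", "1", "0.5") to millicores.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;                                       # already millicores
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;  # cores -> millicores
  esac
}

# Hypothetical helper: percentage of a CPU request that is not being used.
# Usage: unused_request_pct <requested> <used>
unused_request_pct() {
  local req used
  req=$(to_millicores "$1")
  used=$(to_millicores "$2")
  awk -v r="$req" -v u="$used" 'BEGIN { printf "%.0f\n", (r - u) * 100 / r }'
}

# Example: a pod requesting 1 CPU but using 250m leaves 75% of its request idle.
#   unused_request_pct 1 250m
```

A high percentage that stays high over time is a candidate for lowering the request, which is exactly the waste OpenCost prices out per namespace.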

2. Prometheus (Metrics Collection & Monitoring)

What it does:

Collects real-time CPU, memory, and network usage.
Stores data in a time-series database for historical analysis.
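Works natively with Kubernetes: it discovers scrape targets through the Kubernetes API and reads container metrics from the kubelet (cAdvisor); the kube-prometheus-stack chart also deploys kube-state-metrics for object state.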

How to Install and Use Prometheus

Step 1: Install Prometheus with Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace

Step 2: Verify the Installation

Check running pods:

kubectl get pods -n monitoring

Step 3: Access the Prometheus UI

Expose Prometheus:

kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090

Now visit http://localhost:9090.

Step 4: Query Metrics in Prometheus

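Run this PromQL query to see CPU usage per pod, in cores, averaged over the last 5 minutes:

sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

The container!="" filter drops the pod-level aggregate series that cAdvisor also exports, so usage is not double-counted.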
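container_cpu_usage_seconds_total is a counter of CPU seconds consumed, so its per-second rate is the number of cores in use. The core arithmetic behind rate() can be sketched with two samples: (later value - earlier value) / elapsed seconds. This is a simplification for illustration; Prometheus's real rate() also handles counter resets and extrapolates to the edges of the window. The helper name cpu_cores_from_samples is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-second increase of a CPU-seconds counter between two samples.
# Usage: cpu_cores_from_samples <value1> <time1> <value2> <time2>   (times in Unix seconds)
# Simplified: unlike Prometheus rate(), it ignores counter resets and extrapolation.
cpu_cores_from_samples() {
  awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" \
    'BEGIN { printf "%.3f\n", (v2 - v1) / (t2 - t1) }'
}

# Example: the counter grew from 120 to 150 CPU-seconds over 60 seconds,
# so the container averaged half a core.
#   cpu_cores_from_samples 120 1700000000 150 1700000060   # prints 0.500
```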

Example Use Cases

Monitor pod resource usage

Right-size requests and limits from observed usage, so pods are neither OOM-killed for too little memory nor throttled for too little CPU.

Alert on high CPU/memory consumption

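Example: Get an alert when a pod uses more than 80% of its CPU limit. The expression divides each pod's CPU usage by its CPU limit (from kube-state-metrics), so pods without a CPU limit are not evaluated:

groups:
  - name: high-cpu
    rules:
      - alert: HighCPUUsage
        expr: |
          sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
            / sum by (namespace, pod) (kube_pod_container_resource_limits{resource="cpu"})
            > 0.8
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.namespace }}/{{ $labels.pod }}"

With kube-prometheus-stack, load rules like this through a PrometheusRule resource: put the groups under spec and label the resource release: prometheus so the operator picks it up.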

Track performance trends over time

Helps with capacity planning by analyzing past performance.

3. Grafana (Kubernetes Dashboard & Visualization)

What it does:

Provides beautiful dashboards for Kubernetes monitoring.
Connects with Prometheus, OpenCost, and Loki.
Supports alerting and notifications.

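How to Install and Use Grafana

If you installed kube-prometheus-stack in the previous section, it already includes Grafana (service prometheus-grafana in the monitoring namespace) with Prometheus preconfigured as a data source. The steps below install a separate, standalone Grafana.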

Step 1: Install Grafana with Helm

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana --namespace monitoring --create-namespace

Step 2: Get the Admin Password

kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode

Step 3: Access the Grafana Dashboard

Expose Grafana (the chart's service listens on port 80 and forwards to Grafana's port 3000):

kubectl port-forward svc/grafana 3000:80 -n monitoring

Now visit http://localhost:3000 and log in as admin with the password you retrieved.

Step 4: Add Prometheus as a Data Source

Go to Connections > Data sources (Configuration > Data Sources in older Grafana versions).
Click Add data source.
Select Prometheus and enter:
- URL: http://prometheus-kube-prometheus-prometheus.monitoring:9090
Click Save & Test.

Step 5: Import a Kubernetes Monitoring Dashboard

Go to Dashboards > Import.
Use the dashboard ID 3119 (Kubernetes Cluster Monitoring).
Select your Prometheus data source and click Import to load the real-time cluster dashboard.

Use Cases

Monitor CPU and memory usage visually

Track trends over time to prevent bottlenecks.

Create alerts based on real-time data

Example: Send Slack alerts if a pod restarts more than 5 times in 10 minutes.

Combine logs, metrics, and costs in one view

Helps with troubleshooting and optimization.
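A restart alert like the one above is typically built in Grafana on the kube-state-metrics counter kube_pod_container_status_restarts_total, for example increase(kube_pod_container_status_restarts_total[10m]) > 5. For a quick check from the command line, a hypothetical helper restarting_pods can flag pods whose RESTARTS column exceeds a threshold. Note that this column is a lifetime total, not a count over the last 10 minutes, so it is a rougher signal than the alert. The sketch assumes the default kubectl get pods --all-namespaces layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); newer kubectl versions append the time since the last restart, such as "5 (2m ago)", which still leaves the count in the fifth field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods whose total restart count exceeds a threshold.
# Input: `kubectl get pods --all-namespaces` output on stdin.
restarting_pods() {
  local threshold="$1"
  awk -v max="$threshold" '
    $1 == "NAMESPACE" { next }     # skip the header row
    ($5 + 0) > (max + 0) { printf "%s/%s restarts=%d status=%s\n", $1, $2, $5, $4 }'
}

# Example:
#   kubectl get pods --all-namespaces | restarting_pods 5
```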

4. Loki (Log Aggregation & Analysis)

What it does:

Stores container logs cheaply by indexing only their labels (such as namespace and pod), not the full log text.
Integrates with Grafana for centralized log visualization.
Supports log-based alerts.

How to Install and Use Loki

Step 1: Install Loki with Helm

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack --namespace logging --create-namespace

Step 2: Forward Logs to Loki

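The loki-stack chart deploys Promtail by default, which already ships pod logs to Loki, so no extra step is needed in that setup. If you run Fluent Bit instead, add a Loki output to its configuration (Fluent Bit's built-in loki output plugin takes a host and port rather than a URL):

[OUTPUT]
    Name        loki
    Match       *
    Host        loki.logging.svc
    Port        3100
    Labels      job=fluent-bit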

Step 3: Query Logs in Grafana

Add Loki as a data source in Grafana (URL: http://loki.logging:3100), then open Explore and select the Loki data source.
Run a log query:

{namespace="default"} |= "error"

Example Use Cases

Find errors in pod logs

Search for all logs containing “Out of Memory”.

Correlate logs with metrics

See when an error occurred and how it affected performance.

Set up log-based alerts

Example: Notify teams if “connection refused” appears more than 5 times in 5 minutes.
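In Loki, an alert like this is usually expressed as a LogQL rule evaluated by Loki's ruler or by Grafana alerting, for example sum(count_over_time({namespace="default"} |= "connection refused" [5m])) > 5. To sanity-check a threshold by hand first, the hypothetical helper check_log_threshold counts matching lines in whatever you pipe into it; combined with kubectl logs --since=5m, that approximates the same five-minute window for a single workload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn when a fixed string appears more than THRESHOLD times on stdin.
# Returns 1 when the threshold is exceeded, 0 otherwise.
check_log_threshold() {
  local pattern="$1" threshold="$2" count
  count=$(grep -c -F -- "$pattern" || true)   # grep -c prints 0 (and exits 1) when nothing matches
  if [ "$count" -gt "$threshold" ]; then
    echo "ALERT: '$pattern' seen $count times (threshold $threshold)"
    return 1
  fi
  echo "OK: '$pattern' seen $count times"
}

# Example: check the last five minutes of one deployment's logs.
#   kubectl logs deploy/my-app -n my-namespace --since=5m | check_log_threshold "connection refused" 5
```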

Best Practices for Kubernetes Monitoring

1. Use OpenCost to Track Cloud Costs

Identify overprovisioned pods and cut unnecessary spending.
Allocate costs by namespace, team, or project.

2. Set Up Alerts for Resource Overuse

Use Prometheus Alertmanager to notify teams of spikes.
For example, reuse the HighCPUUsage rule from the Prometheus section above, and add similar rules for memory usage and pod restarts.

3. Correlate Logs and Metrics with Loki & Prometheus

If a pod crashes, check logs and metrics together for root cause analysis.
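A practical first step is to find which pods are unhealthy, then read their previous container logs (kubectl logs with --previous) and line them up against the same pods' CPU and memory graphs for the same time window. The hypothetical helper unhealthy_pods below lists pods whose STATUS is not Running, skipping finished pods (Completed or Succeeded), and also flags Running pods with unready containers. It assumes the default kubectl get pods --all-namespaces column layout; the pod and namespace names in the example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods that are not Running, or Running with unready containers.
# Input: `kubectl get pods --all-namespaces` output on stdin.
unhealthy_pods() {
  awk '
    $1 == "NAMESPACE" { next }                         # skip the header row
    $4 == "Completed" || $4 == "Succeeded" { next }    # finished pods (e.g. Jobs) are fine
    {
      split($3, ready, "/")                            # READY is "<ready>/<total>"
      if ($4 != "Running" || ready[1] != ready[2])
        printf "%s/%s status=%s ready=%s\n", $1, $2, $4, $3
    }'
}

# Example: find unhealthy pods, then inspect a crashed container's previous logs.
#   kubectl get pods --all-namespaces | unhealthy_pods
#   kubectl logs api-7 -n shop --previous
```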

4. Use Grafana for Easy Visualization

Set up dashboards for CPU, memory, cost, and pod health.

Final Thoughts: Do You Need Kubernetes Monitoring?

If you’re running Kubernetes, monitoring is a must. Whether you’re a small team or a large enterprise, using open-source tools like OpenCost, Prometheus, and Grafana will help you prevent downtime, optimize costs, and troubleshoot faster.
