Monitoring a cluster using Prometheus and Grafana
Managed Service for Kubernetes enables you to upload cluster object metrics to monitoring systems.
In this article, you will learn how to set up the Prometheus and Grafana monitoring system for a Managed Service for Kubernetes cluster.

To set up the Managed Service for Kubernetes cluster monitoring system:

- Install Prometheus.
- Install the Trickster caching proxy.
- Install Grafana.
- Set up and check Grafana.
Getting started
- Create security groups for the Managed Service for Kubernetes cluster and its node groups.

  Warning

  The configuration of security groups determines the performance and availability of the cluster and the services and applications running in it.

- Create a Managed Service for Kubernetes cluster and a node group in any suitable configuration with internet access and the security groups prepared earlier.

- Install kubectl and configure it to work with the created cluster.

- Install Helm v3.8.0 or higher.
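Before continuing, you can confirm that the installed Helm meets the v3.8.0 requirement. A minimal sketch, assuming plain `vX.Y.Z` version strings and GNU `sort` (the `helm version` call in the comment must be run against your own installation):

```bash
# Print the installed Helm version (run against your installation):
#   helm version --template '{{.Version}}'   # e.g. v3.12.0
# Minimal semver comparison of two vX.Y.Z strings using version sort:
version_ge() {
  # true when $1 >= $2 in version order
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}
version_ge v3.12.0 v3.8.0 && echo "Helm version OK"   # prints: Helm version OK
```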
Install Prometheus
The Prometheus monitoring system scans Managed Service for Kubernetes cluster objects and collects their metrics into its own database. The collected metrics are available within the Managed Service for Kubernetes cluster over HTTP.
- Add a repository containing the Prometheus distribution:

  ```bash
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && \
  helm repo update
  ```
- Install Prometheus:

  ```bash
  helm install my-prom prometheus-community/prometheus
  ```
- Make sure that all the pods changed their status to `Running`:

  ```bash
  kubectl get pods -l "app.kubernetes.io/instance=my-prom"
  ```

  Result:

  ```text
  NAME                                              READY   STATUS    RESTARTS   AGE
  my-prom-prometheus-alertmanager-7b********-xt6ws  2/2     Running   0          81s
  my-prom-prometheus-node-exporter-*****            1/1     Running   0          81s
  my-prom-prometheus-pushgateway-69********-swrfb   1/1     Running   0          81s
  my-prom-prometheus-server-7b********-m4v78        2/2     Running   0          81s
  ```
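Once the pods are running, Prometheus serves its HTTP API inside the cluster. A minimal sketch of checking a query response; the service name `my-prom-prometheus-server` follows from the `my-prom` release installed above, and the `kubectl run` command in the comment is illustrative and must be run against your cluster:

```bash
# Sketch: querying the Prometheus HTTP API from inside the cluster, e.g.:
#   kubectl run -it --rm curl --image=curlimages/curl --restart=Never -- \
#     curl -s 'http://my-prom-prometheus-server:80/api/v1/query?query=up'
# A healthy server replies with JSON like the sample below; extract "status":
RESPONSE='{"status":"success","data":{"resultType":"vector","result":[]}}'
STATUS=$(printf '%s' "$RESPONSE" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
echo "$STATUS"   # prints: success
```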
Install the Trickster caching proxy
The Trickster caching proxy speeds up reading from the Prometheus database: it caches query results and requests only the missing data from Prometheus, which makes Grafana dashboards respond faster.
- Add a repository containing the Trickster distribution:

  ```bash
  helm repo add tricksterproxy https://helm.tricksterproxy.io && \
  helm repo update
  ```
- Create a `trickster.yaml` configuration file with the following Trickster settings:

  ```yaml
  frontend:
    listenAddress: ""
    tlsListenAddress: ""
    tlsListenPort: ""
    connectionsLimit: "0"
  origins:
    - name: default
      originType: prometheus
      originURL: http://my-prom-prometheus-server:80
  profiler:
    enabled: false
    port: 6060
  prometheusScrape: false
  prometheus:
    serviceMonitor:
      enabled: false
      interval: 30s
      labels: {}
  replicaCount: 1
  image:
    repository: tricksterproxy/trickster
    tag: "1.1"
    pullPolicy: IfNotPresent
  service:
    annotations: {}
    labels: {}
    clusterIP: ""
    externalIPs: []
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    metricsPort: 8481
    servicePort: 8480
    type: ClusterIP
  ingress:
    enabled: false
    annotations: {}
    extraLabels: {}
    hosts: []
    tls: []
  volumes:
    persistent:
      type: "persistentVolume"
      enabled: false
      mountPath: "/tmp/trickster"
      accessModes:
        - ReadWriteOnce
      annotations: {}
      existingClaim: ""
      size: 15Gi
    generic:
      type: "generic"
      enabled: true
      mountPath: "/tmp/trickster"
  podAnnotations: {}
  resources: {}
  securityContext: {}
  ```

  You can change the size of the storage allocated to the caching proxy. Specify the desired storage size in the `volumes.persistent.size` parameter.

- Install Trickster:

  ```bash
  helm install trickster tricksterproxy/trickster --namespace default -f trickster.yaml
  ```
- Make sure the Trickster pod status changed to `Running`:

  ```bash
  kubectl get pods -l "app=trickster"
  ```
The caching proxy is available in the Managed Service for Kubernetes cluster at `http://trickster:8480`. Grafana will use this URL to collect metrics.
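Because Trickster proxies the Prometheus HTTP API, the same query paths work through the cache. A minimal sketch, assuming the `trickster` service and port 8480 created by the chart above; the `kubectl run` command in the comment is illustrative:

```bash
# Sketch: the same Prometheus API query, sent through the Trickster cache
# instead of directly to the Prometheus server.
PROM_QUERY='up'
TRICKSTER_URL="http://trickster:8480/api/v1/query?query=${PROM_QUERY}"
echo "$TRICKSTER_URL"   # prints: http://trickster:8480/api/v1/query?query=up
# From inside the cluster, e.g.:
#   kubectl run -it --rm curl --image=curlimages/curl --restart=Never -- \
#     curl -s "$TRICKSTER_URL"
```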
Install Grafana
When deploying the application, the following will be created:
- The Grafana application `Deployment`.
- A `PersistentVolumeClaim` to reserve internal storage.
- A `LoadBalancer` `Service` to enable network access to the Grafana management console.
To install Grafana:
- Create a `grafana.yaml` configuration file:

  ```yaml
  ---
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: grafana-pvc
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: grafana
    name: grafana
  spec:
    selector:
      matchLabels:
        app: grafana
    template:
      metadata:
        labels:
          app: grafana
      spec:
        securityContext:
          fsGroup: 472
          supplementalGroups:
            - 0
        containers:
          - name: grafana
            image: grafana/grafana:latest
            imagePullPolicy: IfNotPresent
            ports:
              - containerPort: 3000
                name: http-grafana
                protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /robots.txt
                port: 3000
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 2
            livenessProbe:
              failureThreshold: 3
              initialDelaySeconds: 30
              periodSeconds: 10
              successThreshold: 1
              tcpSocket:
                port: 3000
              timeoutSeconds: 1
            resources:
              requests:
                cpu: 250m
                memory: 750Mi
            volumeMounts:
              - mountPath: /var/lib/grafana
                name: grafana-pv
        volumes:
          - name: grafana-pv
            persistentVolumeClaim:
              claimName: grafana-pvc
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: grafana
  spec:
    ports:
      - port: 3000
        protocol: TCP
        targetPort: http-grafana
    selector:
      app: grafana
    sessionAffinity: None
    type: LoadBalancer
  ```
  If required, change:

  - The size of the storage allocated for Grafana in the `spec.resources.requests.storage` parameter for `kind: PersistentVolumeClaim`.
  - The computing resources allocated to the Grafana pod in the `spec.containers.resources` parameters for `kind: Deployment`.
- Install Grafana:

  ```bash
  kubectl apply -f grafana.yaml
  ```
- Make sure that the Grafana pod status changed to `Running`:

  ```bash
  kubectl get pods -l "app=grafana"
  ```
Set up and check Grafana
- Find the address where Grafana is available and go to it:

  ```bash
  export GRAFANA_IP=$(kubectl get service/grafana -o jsonpath='{.status.loadBalancer.ingress[0].ip}') && \
  export GRAFANA_PORT=$(kubectl get service/grafana -o jsonpath='{.spec.ports[0].port}') && \
  echo http://$GRAFANA_IP:$GRAFANA_PORT
  ```
- In the browser window that opens, enter the `admin/admin` username and password, then set a new password for the `admin` user.

- Add a data source with the `Prometheus` type and the following settings:

  - Name: `Prometheus`.
  - URL: `http://trickster:8480`.
- Click Save & test and make sure that the data source was successfully connected (`Data source is working`).

- Import the Kubernetes Deployment Statefulset Daemonset metrics dashboard, which contains basic Kubernetes metrics. Specify the dashboard ID (`8588`) when importing.

  Tip

  To check the scenario, you can use any suitable dashboard from the Grafana catalog.

- Open the dashboard and make sure that Grafana receives metrics from the Managed Service for Kubernetes cluster.
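Grafana can also be checked over its HTTP API. A sketch, assuming the `GRAFANA_IP` and `GRAFANA_PORT` variables exported in the earlier step; Grafana's standard `/api/health` endpoint reports whether its internal database is reachable:

```bash
# Sketch: verifying Grafana health over HTTP (run after the export step above):
#   curl -s "http://$GRAFANA_IP:$GRAFANA_PORT/api/health"
# A healthy instance replies with JSON such as the sample below; extract the
# "database" field to confirm the internal storage is reachable:
SAMPLE='{"commit":"unknown","database":"ok","version":"10.0.0"}'
DB_STATE=$(printf '%s' "$SAMPLE" | sed -n 's/.*"database":"\([a-z]*\)".*/\1/p')
echo "$DB_STATE"   # prints: ok
```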
Delete the resources you created
Delete the resources you no longer need to avoid paying for them:
- Delete the Managed Service for Kubernetes cluster.
- If you reserved a public static IP address for your Managed Service for Kubernetes cluster, delete it.