© 2026 Direct Cursus Technology L.L.C.

ClickHouse® dashboard metrics

Written by Yandex Cloud
Updated on April 8, 2026
  • General info
  • Dashboard variables
  • Panels and metrics
    • 1. Uptime (logarithmic)
    • 2. Failed Pods
    • 3. Version
    • 4. Tables / Databases
    • 5. ReadOnly replicas
    • 6. DNS and Distributed Connection Errors
    • 7. Replication and ZooKeeper Exceptions
    • 8. Delayed/Rejected/Pending Inserts
    • 9. Queries (running)
    • 10. Select Queries (started per sec)
    • 11. Memory for Queries
    • 12. Insert Queries (running)
    • 13. Insert Queries (started per sec)
    • 14. Rows Inserted
    • 15. Replication Queue Jobs
    • 16. Max Replica Delay
    • 17. Zookeeper Transactions
    • 18. Merges
    • 19. Merged Rows
    • 20. Merged Uncompressed Bytes
    • 21. Active Parts
    • 22. Detached parts
    • 23. Max Part count for Partition
    • 24. clickhouse-server Process Memory
    • 25. Primary Keys Memory
    • 26. Dictionary Memory
    • 27. Disk Space Free
    • 28. Table Stats
    • 29. Clickhouse Data size on Disk
    • 30. Background Tasks
    • 31. Mutations
    • 32. Marks Cache Hit Rate
    • 33. CPU Time per second
    • 34. Network / Disk CPU Time per second
    • 35. Load Average 1m
    • 36. CPU Time total
    • 37. Connections
  • Monitoring best practices
    • Critical metrics
    • Performance metrics
    • Replication metrics
    • Storage metrics
  • Extra resources

The ClickHouse® dashboard in Grafana provides comprehensive monitoring of a ClickHouse® DBMS cluster. It displays performance metrics, replication status, resource usage, and other critical cluster parameters.

To open a cluster dashboard:

  1. If you have not opened a project yet, select one.
  2. In the left-hand menu, select ClickHouse® Clusters.
  3. Select a cluster.
  4. Click Cluster monitoring.

This will open the cluster dashboard.

General info

Dashboard title: ClickHouse®.
UID: clickhouse-operator.
Refresh interval: 10 seconds.
Data source: Prometheus.

Dashboard variables

The dashboard uses the following variables for data filtering:

  • Cluster (chi): Selecting a ClickHouse® cluster.
  • Server (hostname): Selecting a specific server.
  • Namespace (namespace): Stackland project where ClickHouse® Operator is deployed.
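
As an illustration of how these variables narrow a query, here is a minimal Python sketch that builds a PromQL selector for a dashboard metric. It assumes the variable names in parentheses (chi, hostname, namespace) are also the label names on the Prometheus series, which this article does not state explicitly. The function name build_selector and the sample values are hypothetical.

```python
def _escape(value):
    """Escape backslashes and double quotes for a PromQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_selector(metric, chi=None, hostname=None, namespace=None):
    """Return a PromQL selector for `metric`, filtered by the given variables.

    Assumption: the label names match the dashboard variable names.
    """
    labels = {"chi": chi, "hostname": hostname, "namespace": namespace}
    # Keep only the filters that were actually provided.
    matchers = [
        '{}="{}"'.format(name, _escape(value))
        for name, value in labels.items()
        if value is not None
    ]
    if not matchers:
        return metric
    return metric + "{" + ",".join(matchers) + "}"


# Hypothetical cluster and project names, used only for illustration.
print(build_selector("chi_clickhouse_metric_Uptime", chi="demo", namespace="my-project"))
```

Filters that are not set are simply omitted, so the same helper covers both cluster-wide and per-server queries.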

Panels and metrics

1. Uptime (logarithmic)

Description: ClickHouse® server uptime since last restart. The chart uses a logarithmic scale to make large values easier to read.

Metric: chi_clickhouse_metric_Uptime.

Unit of measurement: Seconds.

2. Failed Pods

Description: Number of pods where metrics-exporter fails to retrieve metrics from clickhouse-server. Any non-zero value indicates issues with server availability.

Metric: chi_clickhouse_metric_fetch_errors.

Unit of measurement: Count.

Recommendations: If errors occur, check the pod status by running kubectl get pods --all-namespaces | grep clickhouse.

Links:

  • metric_fetch_errors on GitHub

3. Version

Description: ClickHouse® version deployed on the servers. The system shows the version in numeric format; e.g., 11.22.33 appears as 11022033.

Metric: chi_clickhouse_metric_VersionInteger.

Unit of measurement: Numeric version format.
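
The numeric format can be converted back into a dotted version string. The sketch below assumes, based on the 11.22.33 → 11022033 example above, that the minor and patch components each occupy three decimal digits. The function name decode_version is hypothetical.

```python
def decode_version(version_integer):
    """Convert a numeric version such as 11022033 into '11.22.33'.

    Assumption: the value is major * 1_000_000 + minor * 1_000 + patch,
    as suggested by the example in the panel description.
    """
    major, rest = divmod(version_integer, 1_000_000)
    minor, patch = divmod(rest, 1_000)
    return f"{major}.{minor}.{patch}"


print(decode_version(11022033))
```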

4. Tables / Databases

Description: Total number of tables and databases in the cluster.

Metrics:

  • chi_clickhouse_metric_NumberOfTables: Number of tables.
  • chi_clickhouse_metric_NumberOfDatabases: Number of databases.

Unit of measurement: Count.

5. ReadOnly replicas

Description: Number of replicas in read-only mode. Any non-zero value indicates replication issues.

Metric: chi_clickhouse_metric_ReadonlyReplica.

Unit of measurement: Count.

Recommendations: Check the ZooKeeper connection, available disk space, and network connectivity between replicas.

Links:

  • Recovery after failures
  • Recovery after complete data loss

6. DNS and Distributed Connection Errors

Description: DNS errors and connectivity failures between servers in distributed tables.

Metrics:

  • chi_clickhouse_event_NetworkErrors: Network errors.
  • chi_clickhouse_event_DistributedConnectionFailAtAll: Complete failures of distributed connections.
  • chi_clickhouse_event_DistributedConnectionFailTry: Failed connection attempts.
  • chi_clickhouse_event_DNSError: DNS errors.

Unit of measurement: Events per minute.

Links:

  • Distributed table management
  • DNSError on GitHub

7. Replication and ZooKeeper Exceptions

Description: Replication metrics and exceptions when working with ZooKeeper.

Metrics:

  • chi_clickhouse_metric_ReadonlyReplica: Read-only replicas.
  • chi_clickhouse_event_ReplicaPartialShutdown: Partial replica shutdown.
  • chi_clickhouse_event_ZooKeeperUserExceptions: ZooKeeper exceptions related to data (e.g., a missing node or a version mismatch).
  • chi_clickhouse_event_ZooKeeperInit: ZooKeeper initialization.
  • chi_clickhouse_metric_ZooKeeperSession: ZooKeeper sessions.
  • chi_clickhouse_event_ZooKeeperHardwareExceptions: ZooKeeper exceptions related to the network (e.g., connection loss).

Unit of measurement: Events per minute.

Links:

  • Recommended ZooKeeper settings
  • system.zookeeper

8. Delayed/Rejected/Pending Inserts

Description: Metrics of delayed, rejected, and pending data inserts.

Metrics:

  • chi_clickhouse_metric_DelayedInserts: Current number of delayed INSERT queries.
  • chi_clickhouse_event_DelayedInserts: Total counter of delayed blocks.
  • chi_clickhouse_event_RejectedInserts: Number of rejected blocks.
  • chi_clickhouse_metric_DistributedFilesToInsert: Files pending insertion into distributed tables.
  • chi_clickhouse_metric_BrokenDistributedFilesToInsert: Corrupted files in distributed tables.

Unit of measurement: Count.

Metric description:

  • delayed query: Number of INSERT queries delayed due to a large number of active data parts.
  • delayed blocks: Number of blocks with delayed insertion.
  • rejected blocks: Number of blocks whose insertion was rejected with a Too many parts error.

Recommendations: Check the parts_to_delay_insert and parts_to_throw_insert settings in the system.merge_tree_settings table.

Links:

  • system.part_log
  • system.merge_tree_settings
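
To make the relationship between the two settings concrete, here is a minimal Python sketch of how an insert might be classified from the number of active parts in a partition. The function name insert_outcome and the threshold values in the example are illustrative only and are not the defaults of any particular ClickHouse® version. Treating the limits as strict ("exceeds") is also an assumption; check the actual values and behavior in system.merge_tree_settings.

```python
def insert_outcome(active_parts, parts_to_delay_insert, parts_to_throw_insert):
    """Classify an INSERT by the active part count of the target partition.

    Assumption: an insert is rejected when the count exceeds
    parts_to_throw_insert and delayed when it exceeds parts_to_delay_insert.
    """
    if active_parts > parts_to_throw_insert:
        return "rejected"  # surfaces as a "Too many parts" error
    if active_parts > parts_to_delay_insert:
        return "delayed"   # the insert is artificially slowed down
    return "accepted"


# Illustrative thresholds, not real defaults.
for parts in (100, 200, 400):
    print(parts, insert_outcome(parts, parts_to_delay_insert=150, parts_to_throw_insert=300))
```

If the panel shows growing delayed or rejected counts, comparing the Max Part count for Partition panel against these two settings shows how close the cluster is to each limit.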

9. Queries (running)

Description: Number of running queries per server and cluster-wide.

Metric: chi_clickhouse_metric_Query.

Unit of measurement: Count.

Links:

  • max_concurrent_queries
  • max_execution_time

10. Select Queries (started per sec)

Description: Number of SELECT queries per second.

Metric: chi_clickhouse_event_SelectQuery.

Unit of measurement: Queries per second.
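
Per-second panels like this one are derived from a cumulative counter. The Python sketch below shows the idea, assuming chi_clickhouse_event_SelectQuery is a monotonically increasing counter that restarts from zero when the server restarts. It is a simplification and not the exact algorithm of the Prometheus rate() function, which also extrapolates to the window boundaries. The function name per_second_rate is hypothetical.

```python
def per_second_rate(samples):
    """Estimate the per-second rate from (timestamp_seconds, counter_value) pairs.

    `samples` must be sorted by timestamp. Returns None when a rate
    cannot be computed (fewer than two samples or zero elapsed time).
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset, so the new value is the increase.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


# Three scrapes taken 10 seconds apart.
print(per_second_rate([(0, 100), (10, 150), (20, 210)]))
```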

11. Memory for Queries

Description: Total memory allocated for running queries. Certain memory allocations may not be considered.

Metric: chi_clickhouse_metric_MemoryTracking.

Unit of measurement: Bytes.

Links:

  • max_memory_usage

12. Insert Queries (running)

Description: Number of running INSERT queries. It does not include queries that failed parsing or were rejected due to limits, but does include internal ClickHouse®-initiated queries.

Metric: chi_clickhouse_event_InsertQuery.

Unit of measurement: Queries per minute.

13. Insert Queries (started per sec)

Description: Number of INSERT queries per second.

Metric: chi_clickhouse_event_InsertQuery.

Unit of measurement: Queries per second.

14. Rows Inserted

Description: Number of rows inserted into tables.

Metric: chi_clickhouse_event_InsertedRows.

Unit of measurement: Rows per minute.

15. Replication Queue Jobs

Description: Rate of data part exchange between replicas.

Metrics:

  • chi_clickhouse_event_ReplicatedDataLoss: Data loss during replication.
  • chi_clickhouse_event_ReplicatedPartChecks: Counter of data part checks.
  • chi_clickhouse_event_ReplicatedPartChecksFailed: Counter of failed data part checks.
  • chi_clickhouse_event_ReplicatedPartFetches: Network replication activity.
  • chi_clickhouse_event_ReplicatedPartFailedFetches: Counter of failed attempts to fetch data parts.
  • chi_clickhouse_event_ReplicatedPartFetchesOfMerged: Fetching merged data parts.
  • chi_clickhouse_event_ReplicatedPartMerges: Merging replicated data parts.
  • chi_clickhouse_metric_ReplicasSumInsertsInQueue: Replication lag. It shows the number of pending queries in the queue.
  • chi_clickhouse_metric_ReplicasSumMergesInQueue: Data merge lag. It shows the number of merges not yet completed by replicas.

Unit of measurement: Events per minute.

Links:

  • How replication works

16. Max Replica Delay

Description: Replica lag relative to the current time for direct inserts into *ReplicatedMergeTree tables.

Metrics:

  • chi_clickhouse_metric_ReplicasMaxAbsoluteDelay: Absolute lag, in seconds.
  • chi_clickhouse_metric_ReplicasMaxRelativeDelay: Relative lag, in seconds.

Unit of measurement: Seconds.

Links:

  • Replication architecture
  • ReplicatedMergeTree
  • max_replica_delay_for_distributed_queries

17. Zookeeper Transactions

Description: Number of ZooKeeper transactions per second.

Metric: chi_clickhouse_event_ZooKeeperTransactions.

Unit of measurement: Transactions per second.

Links:

  • Replication architecture

18. Merges

Description: Rate of background merges for data parts.

Metric: chi_clickhouse_event_Merge.

Unit of measurement: Merges per minute.

Links:

  • START/STOP Merges
  • MergeTree Engine

19. Merged Rows

Description: Number of rows processed in merging.

Metric: chi_clickhouse_event_MergedRows.

Unit of measurement: Rows per minute.

20. Merged Uncompressed Bytes

Description: Size of uncompressed data processed in merging.

Metric: chi_clickhouse_event_MergedUncompressedBytes.

Unit of measurement: Bytes per minute.

21. Active Parts

Description: Number of active data parts in tables.

Metric: chi_clickhouse_table_parts (filtered by active="1").

Unit of measurement: Count.

Links:

  • system.parts
  • parts_to_delay_insert

22. Detached parts

Description: Number of detached data parts, along with the reason for detachment.

Metrics:

  • chi_clickhouse_metric_DetachedParts: Number of detached data parts.
  • chi_clickhouse_table_parts (filtered by active="0"): Inactive parts.

Unit of measurement: Count.

Reasons for detachment:

  • detached_by_user: Detached by the user.
  • broken: Corrupted parts.
  • clone: Cloned parts.
  • ignored: Ignored parts.

Links:

  • system.detached_parts

23. Max Part count for Partition

Description: Maximum number of physical data parts per logical partition.

Metric: chi_clickhouse_metric_MaxPartCountForPartition.

Unit of measurement: Count.

Links:

  • Custom Partitioning Key
  • system.parts
  • system.part_log

24. clickhouse-server Process Memory

Description: Memory usage by the clickhouse-server process (available in ClickHouse® 20.4 and later).

Metrics:

  • chi_clickhouse_metric_MemoryCode: Executable code (CODE).
  • chi_clickhouse_metric_MemoryResident: Resident set size (RSS).
  • chi_clickhouse_metric_MemoryShared: Shared memory (SHR).
  • chi_clickhouse_metric_MemoryDataAndStack: Data and stack (DATA).
  • chi_clickhouse_metric_MemoryVirtual: Virtual memory (VIRT).

Unit of measurement: Bytes.

Memory type description:

  • VIRT: Total virtual memory (VIRT = SWAP + RSS).
  • SWAP: Amount of memory swapped out.
  • RSS: Physical memory not swapped out (RSS = CODE + DATA).
  • CODE: Memory for executable code (text resident set).
  • DATA: Memory for non-executable data (data resident set).
  • SHR: Shared memory available to other processes.

Links:

  • Description of Linux memory types

25. Primary Keys Memory

Description: Memory allocated for primary key storage.

Metric: chi_clickhouse_metric_MemoryPrimaryKeyBytesAllocated.

Unit of measurement: Bytes.

Links:

  • Selecting a primary key

26. Dictionary Memory

Description: Memory allocated for dictionaries.

Metric: chi_clickhouse_metric_MemoryDictionaryBytesAllocated.

Unit of measurement: Bytes.

Links:

  • system.dictionaries
  • CREATE DICTIONARY

27. Disk Space Free

Description: Free disk space ratio. Make sure to consider configurations with multiple volumes, Kubernetes volume claims, and Object Storage as the storage backend.

Metric: chi_clickhouse_metric_DiskFreeBytes / chi_clickhouse_metric_DiskTotalBytes.

Unit of measurement: Fraction (0–1).

Links:

  • system.disks
  • Multiple Disk Volumes

28. Table Stats

Description: Table statistics, such as data size, row count, number of parts, and average row size.

Metrics:

  • chi_clickhouse_table_parts_bytes: Data size, in bytes.
  • chi_clickhouse_table_parts_rows: Number of rows.
  • chi_clickhouse_table_parts: Number of parts.

Unit of measurement:

  • Bytes
  • Rows
  • Parts
  • BytePerRow (calculated field)

29. Clickhouse Data size on Disk

Description: Total disk space used by *MergeTree tables.

Metric: chi_clickhouse_metric_DiskDataBytes.

Unit of measurement: Bytes.

Links:

  • system.parts

30. Background Tasks

Description: Number of active background tasks.

Metrics:

  • chi_clickhouse_metric_BackgroundPoolTask: Merge, mutation, data fetch, and replication queue management tasks.
  • chi_clickhouse_metric_BackgroundSchedulePoolTask: Periodic ReplicatedMergeTree tasks, such as cleanup of old parts, part mutations, and replica reinitialization.
  • chi_clickhouse_metric_BackgroundMovePoolTask: Data movement tasks.

Unit of measurement: Count.

Links:

  • FETCH PARTITION
  • Mutations
  • Data TTL
  • MOVE PARTITION

31. Mutations

Description: Number of active mutations (ALTER DELETE/ALTER UPDATE) and data parts pending mutation.

Metrics:

  • chi_clickhouse_table_mutations: Number of mutations.
  • chi_clickhouse_table_mutations_parts_to_do: Number of parts pending mutation.

Unit of measurement: Count.

Links:

  • Mutations
  • system.mutations
  • KILL MUTATION

32. Marks Cache Hit Rate

Description: Cache hit rate for mark files (.mrk) read from memory rather than disk.

Metric: chi_clickhouse_event_MarkCacheHits / (chi_clickhouse_event_MarkCacheHits + chi_clickhouse_event_MarkCacheMisses).

Unit of measurement: Fraction (0–1).

Links:

  • mark_cache_size
  • MergeTree architecture
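
The hit rate formula above can be sketched in Python as follows. The inputs are assumed to be the increases of the two event counters over the same time window. The function name mark_cache_hit_rate is hypothetical, and returning None when there were no lookups is a design choice of this sketch, made to avoid division by zero.

```python
def mark_cache_hit_rate(hits, misses):
    """Return hits / (hits + misses), or None if there were no lookups."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# 900 hits and 100 misses in the window give a hit rate of 0.9.
print(mark_cache_hit_rate(900, 100))
```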

33. CPU Time per second

Description: CPU time spent on different types of activity.

Metrics:

  • chi_clickhouse_event_RealTimeMicroseconds: Real execution time.
  • chi_clickhouse_event_UserTimeMicroseconds: User CPU time.
  • chi_clickhouse_event_SystemTimeMicroseconds: System CPU time.
  • chi_clickhouse_event_OSIOWaitMicroseconds: I/O wait time.
  • chi_clickhouse_event_OSCPUWaitMicroseconds: CPU wait time.
  • chi_clickhouse_event_OSCPUVirtualTimeMicroseconds: Virtual CPU time.

Unit of measurement: Microseconds per second.

34. Network / Disk CPU Time per second

Description: CPU time spent on network and disk operations.

Metrics:

  • chi_clickhouse_event_DiskReadElapsedMicroseconds: Disk read time.
  • chi_clickhouse_event_DiskWriteElapsedMicroseconds: Disk write time.
  • chi_clickhouse_event_NetworkReceiveElapsedMicroseconds: Network receive time.
  • chi_clickhouse_event_NetworkSendElapsedMicroseconds: Network send time.

Unit of measurement: Microseconds per second.

35. Load Average 1m

Description: Average system load over one minute (Unix load average). Load is considered high if it approaches the number of available CPUs or the CPU limits allocated to the ClickHouse® pod.

Metric: chi_clickhouse_metric_LoadAverage1.

Unit of measurement: Dimensionless quantity.

36. CPU Time total

Description: Total CPU time spent on various activities over the selected period.

Metrics:

  • chi_clickhouse_event_DiskReadElapsedMicroseconds: Disk read time.
  • chi_clickhouse_event_DiskWriteElapsedMicroseconds: Disk write time.
  • chi_clickhouse_event_NetworkReceiveElapsedMicroseconds: Network receive time.
  • chi_clickhouse_event_NetworkSendElapsedMicroseconds: Network send time.
  • chi_clickhouse_event_RealTimeMicroseconds: Real query execution time.
  • chi_clickhouse_event_UserTimeMicroseconds: User CPU time.
  • chi_clickhouse_event_SystemTimeMicroseconds: System CPU time.
  • chi_clickhouse_event_OSIOWaitMicroseconds: I/O wait time.
  • chi_clickhouse_event_OSCPUWaitMicroseconds: CPU wait time.
  • chi_clickhouse_event_OSCPUVirtualTimeMicroseconds: Virtual CPU time (CPU time as seen by the OS).
  • chi_clickhouse_event_ThrottlerSleepMicroseconds: Throttler wait time.
  • chi_clickhouse_event_DelayedInsertsMilliseconds: Time spent on delayed inserts.
  • chi_clickhouse_event_ZooKeeperWaitMicroseconds: ZooKeeper wait time.
  • chi_clickhouse_event_CompileExpressionsMicroseconds: Expression compilation time.
  • chi_clickhouse_event_MergesTimeMilliseconds: Merge time.
  • chi_clickhouse_event_RWLockReadersWaitMilliseconds: Read lock wait time.
  • chi_clickhouse_event_RWLockWritersWaitMilliseconds: Write lock wait time.
  • chi_clickhouse_event_SelectQueryTimeMicroseconds: Time spent running SELECT queries.
  • chi_clickhouse_event_InsertQueryTimeMicroseconds: Time spent running INSERT queries.
  • chi_clickhouse_event_S3ReadMicroseconds: Object Storage read time.
  • chi_clickhouse_event_S3WriteMicroseconds: Object Storage write time.

Unit of measurement: Microseconds.

Interval: 1 minute.

37. Connections

Description: Different connection types per server.

Metrics:

  • chi_clickhouse_metric_TCPConnection: TCP connections (native protocol).
  • chi_clickhouse_metric_HTTPConnection: HTTP connections.
  • chi_clickhouse_metric_InterserverConnection: Inter-server connections.
  • chi_clickhouse_metric_MySQLConnection: MySQL connections.

Unit of measurement: Count.

Links:

  • max_connections
  • max_distributed_connections
  • MySQL Protocol
  • HTTP Protocol
  • Native Protocol

Monitoring best practices

Critical metrics

The following metrics require immediate attention when they deviate from normal values:

  1. Failed Pods: Must be 0. Any non-zero value indicates that a server is unavailable.
  2. ReadOnly replicas: Must be 0. Any non-zero value indicates replication issues.
  3. DNS and Distributed Connection Errors: Should be as low as possible. High values indicate network issues.
  4. Delayed/Rejected Inserts: High values indicate write performance issues.
  5. Disk Space Free: Monitor the free space ratio. The critical threshold is below 10%.
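
The checks with explicit thresholds in this list can be expressed as a small Python sketch. It covers only the three metrics that have a concrete limit above (Failed Pods, ReadOnly replicas, and Disk Space Free). The function name critical_findings and the returned labels are hypothetical, and the inputs are assumed to be the current values of the corresponding panels.

```python
def critical_findings(failed_pods, readonly_replicas, disk_free_fraction,
                      min_disk_free=0.10):
    """Return the names of critical checks that currently fail.

    `disk_free_fraction` is the Disk Space Free ratio in the 0-1 range.
    `min_disk_free` defaults to the 10% threshold mentioned above.
    """
    findings = []
    if failed_pods > 0:
        findings.append("failed_pods")
    if readonly_replicas > 0:
        findings.append("readonly_replicas")
    if disk_free_fraction < min_disk_free:
        findings.append("disk_space_free")
    return findings


# One unreachable pod and 5% free disk space trigger two findings.
print(critical_findings(failed_pods=1, readonly_replicas=0, disk_free_fraction=0.05))
```

Error and insert metrics are left out on purpose because the article gives no numeric threshold for them. Acceptable levels for those depend on the workload.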

Performance metrics

To evaluate your cluster performance, pay attention to:

  1. Queries (running): Number of concurrent queries.
  2. Memory for Queries: Query memory usage.
  3. CPU Time per second: CPU load.
  4. Marks Cache Hit Rate: Cache efficiency. It should stay above 90%.

Replication metrics

For replication status monitoring:

  1. Replication Queue Jobs: Replication queue size.
  2. Max Replica Delay: Replica lag.
  3. Zookeeper Transactions: ZooKeeper transaction rate.

Storage metrics

For disk space usage monitoring:

  1. Active Parts: Number of active data parts.
  2. Detached parts: Number of detached parts. It should be kept to a minimum.
  3. Max Part count for Partition: Number of parts per partition.
  4. Clickhouse Data size on Disk: Total data size.

Extra resources

  • Official ClickHouse® documentation
  • ClickHouse® Operator on GitHub
  • System Tables Reference
  • Server Configuration Parameters
