ClickHouse® dashboard metrics

Updated on April 8, 2026

General info
Dashboard variables
Panels and metrics
Monitoring best practices
Extra resources

The ClickHouse® dashboard in Grafana lets you monitor a ClickHouse® DBMS cluster. It displays performance metrics, replication status, resource usage, and other critical cluster parameters.

To open a cluster dashboard:

If you have not opened a project yet, select one.
In the left-hand menu, select ClickHouse® Clusters.
Select a cluster.
Click Cluster monitoring.

This will open the cluster dashboard.

General info

Dashboard title: ClickHouse®.
UID: clickhouse-operator.
Refresh interval: 10 seconds.
Data source: Prometheus.

Dashboard variables

The dashboard uses the following variables for data filtering:

Cluster (chi): ClickHouse® cluster to display.
Server (hostname): Specific server to display.
Namespace (namespace): Stackland project where ClickHouse® Operator is deployed.
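
As a rough illustration of how these variables narrow a query, the sketch below builds a Prometheus selector from them. It assumes that the names in parentheses (chi, hostname, namespace) are also the label names on the exported series. The function name and the sample values are hypothetical, and label values are not escaped.

```python
def build_selector(metric, chi=None, hostname=None, namespace=None):
    """Return a PromQL selector for `metric`, filtered by the labels that are set."""
    labels = {"chi": chi, "hostname": hostname, "namespace": namespace}
    # Keep only the filters that were actually provided.
    matchers = [f'{name}="{value}"' for name, value in labels.items() if value is not None]
    if not matchers:
        return metric
    return metric + "{" + ",".join(matchers) + "}"


# Hypothetical cluster and namespace names.
print(build_selector("chi_clickhouse_metric_Uptime", chi="demo", namespace="analytics"))
# chi_clickhouse_metric_Uptime{chi="demo",namespace="analytics"}
```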

Panels and metrics

1. Uptime (logarithmic)

Description: ClickHouse® server uptime since last restart. The chart uses a logarithmic scale to make large values easier to read.

Metric: chi_clickhouse_metric_Uptime.

Unit of measurement: Seconds.

2. Failed Pods

Description: Number of pods where metrics-exporter fails to retrieve metrics from clickhouse-server. Any non-zero value indicates issues with server availability.

Metric: chi_clickhouse_metric_fetch_errors.

Unit of measurement: Count.

Recommendations: If errors occur, check the pod status by running kubectl get pods --all-namespaces | grep clickhouse.

Links:

metric_fetch_errors on GitHub

3. Version

Description: ClickHouse® version deployed on the servers. The system shows the version in numeric format; e.g., 11.22.33 appears as 11022033.

Metric: chi_clickhouse_metric_VersionInteger.

Unit of measurement: Numeric version format.
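
The numeric format can be decoded back into a dotted version. This is a minimal sketch that assumes three decimal digits each for the minor and patch components, as the 11.22.33 example suggests; the function name is hypothetical.

```python
def decode_version(version_integer):
    """Split a packed version such as 11022033 into 'major.minor.patch'."""
    major, rest = divmod(version_integer, 1_000_000)
    minor, patch = divmod(rest, 1_000)
    return f"{major}.{minor}.{patch}"


print(decode_version(11022033))  # 11.22.33
```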

4. Tables / Databases

Description: Total number of tables and databases in the cluster.

Metrics:

chi_clickhouse_metric_NumberOfTables: Number of tables.
chi_clickhouse_metric_NumberOfDatabases: Number of databases.

Unit of measurement: Count.

5. ReadOnly replicas

Description: Number of replicas in read-only mode. Any non-zero value indicates replication issues.

Metric: chi_clickhouse_metric_ReadonlyReplica.

Unit of measurement: Count.

Recommendations: Check the ZooKeeper connection, available disk space, and network connectivity between replicas.

6. DNS and Distributed Connection Errors

Description: DNS errors and connectivity failures between servers in distributed tables.

Metrics:

chi_clickhouse_event_NetworkErrors: Network errors.
chi_clickhouse_event_DistributedConnectionFailAtAll: Complete failures of distributed connections.
chi_clickhouse_event_DistributedConnectionFailTry: Failed connection attempts.
chi_clickhouse_event_DNSError: DNS errors.

Unit of measurement: Events per minute.
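
Event metrics are ever-growing counters, so a per-minute value has to be derived from two samples. The sketch below shows that arithmetic under simple assumptions: two samples with Unix timestamps, and a counter that restarts from zero when the server restarts. It is a simplification and not the dashboard's actual query; the function name is hypothetical.

```python
def events_per_minute(first_value, first_ts, last_value, last_ts):
    """Average number of events per minute between two counter samples."""
    elapsed = last_ts - first_ts
    if elapsed <= 0:
        raise ValueError("samples must be ordered in time")
    increase = last_value - first_value
    if increase < 0:
        # The counter went backwards: assume a restart and count from zero.
        increase = last_value
    return increase * 60.0 / elapsed


# 30 new errors over 120 seconds.
print(events_per_minute(100, 0, 130, 120))  # 15.0
```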

7. Replication and ZooKeeper Exceptions

Description: Replication metrics and exceptions when working with ZooKeeper.

Metrics:

chi_clickhouse_metric_ReadonlyReplica: Read-only replicas.
chi_clickhouse_event_ReplicaPartialShutdown: Partial replica shutdown.
chi_clickhouse_event_ZooKeeperUserExceptions: ZooKeeper exceptions related to the data, such as a missing node or a version mismatch.
chi_clickhouse_event_ZooKeeperInit: ZooKeeper initialization.
chi_clickhouse_metric_ZooKeeperSession: ZooKeeper sessions.
chi_clickhouse_event_ZooKeeperHardwareExceptions: ZooKeeper exceptions related to the network, such as a lost connection.

Unit of measurement: Events per minute.

8. Delayed/Rejected/Pending Inserts

Description: Metrics of delayed, rejected, and pending data inserts.

Metrics:

chi_clickhouse_metric_DelayedInserts: Current number of delayed INSERT queries.
chi_clickhouse_event_DelayedInserts: Total counter of delayed blocks.
chi_clickhouse_event_RejectedInserts: Number of rejected blocks.
chi_clickhouse_metric_DistributedFilesToInsert: Files pending insertion into distributed tables.
chi_clickhouse_metric_BrokenDistributedFilesToInsert: Corrupted files in distributed tables.

Unit of measurement: Count.

Metric description:

delayed query: Number of INSERT queries delayed due to a large number of active data parts.
delayed blocks: Number of blocks with delayed insertion.
rejected blocks: Number of blocks whose insertion was rejected with a Too many parts error.

Recommendations: Check the parts_to_delay_insert and parts_to_throw_insert settings in the system.merge_tree_settings table.
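
The effect of these two settings can be pictured with a small sketch. It assumes, as a simplification, that an insert is delayed once the number of active parts in a partition reaches parts_to_delay_insert and rejected once it reaches parts_to_throw_insert. The function name and the sample thresholds are hypothetical; the actual defaults depend on the ClickHouse® version.

```python
def classify_insert(active_parts, parts_to_delay_insert, parts_to_throw_insert):
    """Say what happens to an INSERT for a given active part count in a partition."""
    if active_parts >= parts_to_throw_insert:
        return "rejected"  # surfaces as a "Too many parts" error
    if active_parts >= parts_to_delay_insert:
        return "delayed"   # the insert is artificially slowed down
    return "accepted"


# Hypothetical thresholds, for illustration only.
for parts in (50, 200, 400):
    print(parts, classify_insert(parts, parts_to_delay_insert=150, parts_to_throw_insert=300))
```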

9. Queries (running)

Description: Number of running queries per server and cluster-wide.

Metric: chi_clickhouse_metric_Query.

Unit of measurement: Count.

10. Select Queries (started per sec)

Description: Number of SELECT queries per second.

Metric: chi_clickhouse_event_SelectQuery.

Unit of measurement: Queries per second.

11. Memory for Queries

Description: Total memory allocated for running queries. Certain memory allocations may not be considered.

Metric: chi_clickhouse_metric_MemoryTracking.

Unit of measurement: Bytes.

Links:

max_memory_usage

12. Insert Queries (running)

Description: Number of running INSERT queries. It does not include queries that failed parsing or were rejected due to limits, but does include internal ClickHouse®-initiated queries.

Metric: chi_clickhouse_event_InsertQuery.

Unit of measurement: Queries per minute.

13. Insert Queries (started per sec)

Description: Number of INSERT queries per second.

Metric: chi_clickhouse_event_InsertQuery.

Unit of measurement: Queries per second.

14. Rows Inserted

Description: Number of rows inserted into tables.

Metric: chi_clickhouse_event_InsertedRows.

Unit of measurement: Rows per minute.

15. Replication Queue Jobs

Description: Rate of data part exchange between replicas.

Metrics:

chi_clickhouse_event_ReplicatedDataLoss: Data loss during replication.
chi_clickhouse_event_ReplicatedPartChecks: Counter of data part checks.
chi_clickhouse_event_ReplicatedPartChecksFailed: Counter of failed data part checks.
chi_clickhouse_event_ReplicatedPartFetches: Network replication activity.
chi_clickhouse_event_ReplicatedPartFailedFetches: Counter of failed attempts to fetch data parts.
chi_clickhouse_event_ReplicatedPartFetchesOfMerged: Fetching merged data parts.
chi_clickhouse_event_ReplicatedPartMerges: Merging replicated data parts.
chi_clickhouse_metric_ReplicasSumInsertsInQueue: Replication lag. It shows the number of pending queries in the queue.
chi_clickhouse_metric_ReplicasSumMergesInQueue: Data merge lag. It shows the number of merges not yet completed by replicas.

Unit of measurement: Events per minute.

Links:

How replication works

16. Max Replica Delay

Description: Replica lag relative to the current time for direct inserts into *ReplicatedMergeTree tables.

Metrics:

chi_clickhouse_metric_ReplicasMaxAbsoluteDelay: Absolute lag, in seconds.
chi_clickhouse_metric_ReplicasMaxRelativeDelay: Relative lag, in seconds.

Unit of measurement: Seconds.

17. Zookeeper Transactions

Description: Number of ZooKeeper transactions per second.

Metric: chi_clickhouse_event_ZooKeeperTransactions.

Unit of measurement: Transactions per second.

Links:

Replication architecture

18. Merges

Description: Rate of background merges for data parts.

Metric: chi_clickhouse_event_Merge.

Unit of measurement: Merges per minute.

19. Merged Rows

Description: Number of rows processed in merging.

Metric: chi_clickhouse_event_MergedRows.

Unit of measurement: Rows per minute.

20. Merged Uncompressed Bytes

Description: Size of uncompressed data processed in merging.

Metric: chi_clickhouse_event_MergedUncompressedBytes.

Unit of measurement: Bytes per minute.

21. Active Parts

Description: Number of active data parts in tables.

Metric: chi_clickhouse_table_parts (filtered by active="1").

Unit of measurement: Count.

22. Detached parts

Description: Number of detached data parts, along with the reason for detachment.

Metrics:

chi_clickhouse_metric_DetachedParts: Number of detached data parts.
chi_clickhouse_table_parts (filtered by active="0"): Inactive parts.

Unit of measurement: Count.

Reasons for detachment:

detached_by_user: Detached by the user.
broken: Corrupted parts.
clone: Cloned parts.
ignored: Ignored parts.

Links:

system.detached_parts

23. Max Part count for Partition

Description: Maximum number of physical data parts per logical partition.

Metric: chi_clickhouse_metric_MaxPartCountForPartition.

Unit of measurement: Count.

24. clickhouse-server Process Memory

Description: Memory usage by clickhouse-server (available in ClickHouse® 20.4 and later).

Metrics:

chi_clickhouse_metric_MemoryCode: Executable code (CODE).
chi_clickhouse_metric_MemoryResident: Resident set size (RSS).
chi_clickhouse_metric_MemoryShared: Shared memory (SHR).
chi_clickhouse_metric_MemoryDataAndStack: Data and stack (DATA).
chi_clickhouse_metric_MemoryVirtual: Virtual memory (VIRT).

Unit of measurement: Bytes.

Memory type description:

VIRT: Total virtual memory (VIRT = SWAP + RSS).
SWAP: Amount of memory swapped out.
RSS: Physical memory not swapped out (RSS = CODE + DATA).
CODE: Memory for executable code (text resident set).
DATA: Memory for non-executable data (data resident set).
SHR: Shared memory available to other processes.

Links:

Description of Linux memory types

25. Primary Keys Memory

Description: Memory allocated for primary key storage.

Metric: chi_clickhouse_metric_MemoryPrimaryKeyBytesAllocated.

Unit of measurement: Bytes.

Links:

Selecting a primary key

26. Dictionary Memory

Description: Memory allocated for dictionaries.

Metric: chi_clickhouse_metric_MemoryDictionaryBytesAllocated.

Unit of measurement: Bytes.

27. Disk Space Free

Description: Free disk space ratio. Make sure to consider configurations with multiple volumes, Kubernetes volume claims, and Object Storage as the storage backend.

Metric: chi_clickhouse_metric_DiskFreeBytes / chi_clickhouse_metric_DiskTotalBytes.

Unit of measurement: Fraction (0–1).
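
The ratio is a plain division of the two metrics. The sketch below uses made-up byte counts and a hypothetical function name.

```python
def disk_free_ratio(free_bytes, total_bytes):
    """Return the free disk space as a fraction between 0 and 1."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


GIB = 1024 ** 3
ratio = disk_free_ratio(50 * GIB, 400 * GIB)  # sample values, not real data
print(f"{ratio:.1%}")  # 12.5%
```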

28. Table Stats

Description: Table statistics, such as data size, row count, number of parts, and average row size.

Metrics:

chi_clickhouse_table_parts_bytes: Data size, in bytes.
chi_clickhouse_table_parts_rows: Number of rows.
chi_clickhouse_table_parts: Number of parts.

Unit of measurement:

Bytes
Rows
Parts
BytePerRow (calculated field)
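
The BytePerRow field is presumably the data size divided by the row count; the panel's exact formula is not shown here, so treat the sketch below as an assumption. The function name is hypothetical.

```python
def bytes_per_row(parts_bytes, parts_rows):
    """Average row size in bytes; 0.0 for an empty table."""
    if parts_rows == 0:
        return 0.0
    return parts_bytes / parts_rows


# A table holding 1,000,000 bytes in 4,000 rows.
print(bytes_per_row(1_000_000, 4_000))  # 250.0
```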

29. Clickhouse Data size on Disk

Description: Total disk space used by *MergeTree tables.

Metric: chi_clickhouse_metric_DiskDataBytes.

Unit of measurement: Bytes.

Links:

system.parts

30. Background Tasks

Description: Number of active background tasks.

Metrics:

chi_clickhouse_metric_BackgroundPoolTask: Merge, mutation, data fetch, and replication queue management tasks.
chi_clickhouse_metric_BackgroundSchedulePoolTask: Periodic ReplicatedMergeTree tasks, such as cleanup of old parts, part mutations, and replica reinitialization.
chi_clickhouse_metric_BackgroundMovePoolTask: Data movement tasks.

Unit of measurement: Count.

31. Mutations

Description: Number of active mutations (ALTER DELETE/ALTER UPDATE) and data parts pending mutation.

Metrics:

chi_clickhouse_table_mutations: Number of mutations.
chi_clickhouse_table_mutations_parts_to_do: Number of parts pending mutation.

Unit of measurement: Count.

32. Marks Cache Hit Rate

Description: Share of mark file (.mrk) reads served from memory rather than from disk.

Metric: chi_clickhouse_event_MarkCacheHits / (chi_clickhouse_event_MarkCacheHits + chi_clickhouse_event_MarkCacheMisses).

Unit of measurement: Fraction (0–1).
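
The formula above can be written out as a short sketch. Returning None when there were no lookups at all is a design choice of this example, not something the dashboard is known to do; the function name is hypothetical.

```python
def mark_cache_hit_rate(hits, misses):
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


print(mark_cache_hit_rate(950, 50))  # 0.95
```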

33. CPU Time per second

Description: CPU time spent on different types of activity.

Metrics:

chi_clickhouse_event_RealTimeMicroseconds: Real execution time.
chi_clickhouse_event_UserTimeMicroseconds: User CPU time.
chi_clickhouse_event_SystemTimeMicroseconds: System CPU time.
chi_clickhouse_event_OSIOWaitMicroseconds: I/O wait time.
chi_clickhouse_event_OSCPUWaitMicroseconds: CPU wait time.
chi_clickhouse_event_OSCPUVirtualTimeMicroseconds: Virtual CPU time.

Unit of measurement: Microseconds per second.
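
One way to read this unit: 1,000,000 microseconds of CPU time per second of wall-clock time corresponds to one fully busy core. The sketch below does that conversion; the function name is hypothetical, and the interpretation assumes the value is summed across all threads.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def cpu_cores_used(microseconds_per_second):
    """Convert a CPU time rate in microseconds per second into busy cores."""
    return microseconds_per_second / MICROSECONDS_PER_SECOND


# 2,500,000 microseconds of user CPU time per second is about 2.5 cores.
print(cpu_cores_used(2_500_000))  # 2.5
```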

34. Network / Disk CPU Time per second

Description: CPU time spent on network and disk operations.

Metrics:

chi_clickhouse_event_DiskReadElapsedMicroseconds: Disk read time.
chi_clickhouse_event_DiskWriteElapsedMicroseconds: Disk write time.
chi_clickhouse_event_NetworkReceiveElapsedMicroseconds: Network receive time.
chi_clickhouse_event_NetworkSendElapsedMicroseconds: Network send time.

Unit of measurement: Microseconds per second.

35. Load Average 1m

Description: Average system load over one minute (Unix load average). Load is considered high if it approaches the number of available CPUs or the CPU limits allocated to the ClickHouse® pod.

Metric: chi_clickhouse_metric_LoadAverage1.

Unit of measurement: Dimensionless quantity.
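
To judge whether the load is approaching the CPU capacity, the load average can be divided by the number of CPUs or by the pod's CPU limit. This is a minimal sketch with a hypothetical function name and sample values; which ratio counts as high is left to the reader.

```python
def load_per_cpu(load_average_1m, cpu_count):
    """Return the one-minute load average relative to the available CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average_1m / cpu_count


# A load of 6.0 on a pod limited to 8 CPUs.
print(load_per_cpu(6.0, 8))  # 0.75
```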

36. CPU Time total

Description: Total CPU time spent on various activities over the selected period.

Metrics:

chi_clickhouse_event_DiskReadElapsedMicroseconds: Disk read time.
chi_clickhouse_event_DiskWriteElapsedMicroseconds: Disk write time.
chi_clickhouse_event_NetworkReceiveElapsedMicroseconds: Network receive time.
chi_clickhouse_event_NetworkSendElapsedMicroseconds: Network send time.
chi_clickhouse_event_RealTimeMicroseconds: Real query execution time.
chi_clickhouse_event_UserTimeMicroseconds: User CPU time.
chi_clickhouse_event_SystemTimeMicroseconds: System CPU time.
chi_clickhouse_event_OSIOWaitMicroseconds: I/O wait time.
chi_clickhouse_event_OSCPUWaitMicroseconds: CPU wait time.
chi_clickhouse_event_OSCPUVirtualTimeMicroseconds: Virtual CPU time.
chi_clickhouse_event_ThrottlerSleepMicroseconds: Throttler wait time.
chi_clickhouse_event_DelayedInsertsMilliseconds: Time spent on delayed inserts.
chi_clickhouse_event_ZooKeeperWaitMicroseconds: ZooKeeper wait time.
chi_clickhouse_event_CompileExpressionsMicroseconds: Expression compilation time.
chi_clickhouse_event_MergesTimeMilliseconds: Merge time.
chi_clickhouse_event_RWLockReadersWaitMilliseconds: Read lock wait time.
chi_clickhouse_event_RWLockWritersWaitMilliseconds: Write lock wait time.
chi_clickhouse_event_SelectQueryTimeMicroseconds: Time spent running SELECT queries.
chi_clickhouse_event_InsertQueryTimeMicroseconds: Time spent running INSERT queries.
chi_clickhouse_event_S3ReadMicroseconds: Object Storage read time.
chi_clickhouse_event_S3WriteMicroseconds: Object Storage write time.

Unit of measurement: Microseconds.

Interval: 1 minute.

37. Connections

Description: Different connection types per server.

Metrics:

chi_clickhouse_metric_TCPConnection: TCP connections (native protocol).
chi_clickhouse_metric_HTTPConnection: HTTP connections.
chi_clickhouse_metric_InterserverConnection: Inter-server connections.
chi_clickhouse_metric_MySQLConnection: MySQL connections.

Unit of measurement: Count.

Monitoring best practices

Critical metrics

The following metrics require immediate attention when they deviate from normal values:

Failed Pods: Must be 0. Any non-zero value indicates server unavailability.
ReadOnly replicas: Must be 0. Any non-zero value indicates replication issues.
DNS and Distributed Connection Errors: Should be as low as possible. High values indicate network issues.
Delayed/Rejected Inserts: High values indicate write performance issues.
Disk Space Free: Monitor free space; the critical threshold is below 10%.
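
The hard limits in this list can be checked mechanically. The sketch below covers only the three items with explicit thresholds; the function and parameter names are hypothetical, and the input values would come from whatever tooling reads the panels.

```python
def critical_findings(failed_pods, readonly_replicas, disk_free_ratio, min_disk_free=0.10):
    """Return a list of critical conditions that are currently violated."""
    findings = []
    if failed_pods != 0:
        findings.append("Failed Pods is non-zero")
    if readonly_replicas != 0:
        findings.append("ReadOnly replicas is non-zero")
    if disk_free_ratio < min_disk_free:
        findings.append("Disk Space Free is below the threshold")
    return findings


# Sample values: one read-only replica and 8% free disk space.
print(critical_findings(failed_pods=0, readonly_replicas=1, disk_free_ratio=0.08))
```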

Performance metrics

To evaluate your cluster performance, pay attention to:

Queries (running): Number of concurrent queries.
Memory for Queries: Query memory usage.
CPU Time per second: CPU load.
Marks Cache Hit Rate: Cache efficiency; should stay above 90%.

Replication metrics

For replication status monitoring:

Replication Queue Jobs: Replication queue size.
Max Replica Delay: Replica lag.
Zookeeper Transactions: ZooKeeper transaction rate.

Storage metrics

For disk space usage monitoring:

Active Parts: Number of active data parts.
Detached parts: Number of detached parts; should be kept to a minimum.
Max Part count for Partition: Number of parts per partition.
Clickhouse Data size on Disk: Total data size.
