© 2025 Direct Cursus Technology L.L.C.

Monitoring the state of a ClickHouse® cluster and its hosts

Written by
Yandex Cloud
Updated at October 30, 2025

You can view cluster and host state data in the management console, on the Monitoring tab of the cluster management page, or in Yandex Monitoring.

Diagnostic information about the cluster state is presented as charts.

Chart update rate:

  • Standard hosts and hosts with an increased RAM to vCPU ratio (memory-optimized): 15 seconds.
  • Hosts with a guaranteed vCPU share under 100% (burstable): 150 seconds.

Note

Charts automatically use the most appropriate unit multiples (MB, GB, and so on).
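As an illustration of how such automatic scaling can work, the following Python sketch picks the largest unit that keeps the value at or above 1. It assumes binary multiples (1 KB = 1,024 bytes), which is consistent with the byte values in the disk-space example on this page. The function name `format_bytes` is hypothetical, and this is not the Yandex Monitoring implementation.

```python
def format_bytes(value: float) -> str:
    """Scale a byte count to the largest unit that keeps it at or above 1.

    Assumption: binary multiples (1 KB = 1024 bytes). This is an
    illustration only, not how Yandex Monitoring renders its charts.
    """
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    index = 0
    # Divide by 1024 until the value drops below 1024 or we run out of units.
    while abs(value) >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.2f} {units[index]}"


print(format_bytes(85_899_345_920))  # 80.00 GB
```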

You can configure alerts in Yandex Monitoring to receive notifications about cluster failures. Yandex Monitoring supports two alert thresholds, Warning and Alarm. When a threshold is exceeded, you receive a notification via the configured notification channels.

Monitoring the cluster state

To view detailed information on the state of a Managed Service for ClickHouse® cluster:

Management console
  1. In the management console, navigate to the folder page and select Managed Service for ClickHouse.

  2. Click the cluster name and select the Monitoring tab.

    The page that opens will display the performance charts for the cluster and ClickHouse® hosts.

  3. To get started with Yandex Monitoring metrics, dashboards, or alerts, click Open in Monitoring in the top panel.

Available charts

If the cluster consists of ClickHouse® and ZooKeeper hosts, the Monitoring page will have the following tabs:

  • ClickHouse: State information for the whole cluster and for ClickHouse® hosts.
  • ZooKeeper: State information for ZooKeeper hosts.

If the cluster consists of ClickHouse® hosts only, the Monitoring page will have the Clusters tab with the same information as the ClickHouse tab.

Regardless of the cluster configuration, the Monitoring page also has the Hosts tab for detailed host status information.

The ClickHouse tab displays the following charts:

  • Under Summary:

    • Select queries: Number of select queries per second for a cluster.
    • Insert queries: Number of insert queries per second for a cluster.
    • Total queries: Total number of queries per second for a cluster.
    • Inserted data: Data insertion rate for a cluster.
    • Read data: Data read rate for a cluster.
    • Merged data: Data merge rate for a cluster.
    • CPU usage: Number of CPU cores used in a ClickHouse® subcluster.
    • Memory usage: Use of RAM in a ClickHouse® subcluster.
    • Disk space usage: Disk space used in a ClickHouse® subcluster.
  • Under Queries:

    • Select queries per host: Number of select queries per second per cluster host.
    • Insert queries per host: Number of insert queries per second per cluster host.
    • Total queries per host: Total number of queries per second per cluster host.
    • Failed select queries per host: Percentage of failed select queries per ClickHouse® subcluster host.
    • Failed insert queries per host: Percentage of failed insert queries per ClickHouse® subcluster host.
    • Failed queries per host: Percentage of failed queries per ClickHouse® subcluster host.
    • Average select query time per host: Average select query time per ClickHouse® subcluster host.
    • Average insert query time per host: Average insert query time per ClickHouse® subcluster host.
    • Average query time per host: Average query time per ClickHouse® subcluster host.
  • Under Connections and locks:

    • Connections per host: Number of connections per cluster host.
    • Active locks per host: Number of active locks per cluster host.
    • Waiting locks per host: Number of waiting locks per cluster host.
  • Under Data Traffic:

    • Read data per host: Data read rate per cluster host.
    • Inserted data per host: Data insertion rate per cluster host.
    • Merged data per host: Data merge rate per cluster host.
    • Read rows per host: Row read rate per second per cluster host.
    • Inserted rows per host: Row insertion rate per second per cluster host.
    • Merged rows per host: Row merge rate per second per cluster host.
  • Under Storage:

    • Disk space usage per host, bytes: The disk space used per ClickHouse® subcluster host.
    • Disk space usage per host, %: Percentage of the disk space used per ClickHouse® subcluster host.
    • Inode usage, %: Percentage of inodes used per ClickHouse® subcluster host.
    • Databases: Number of databases per cluster host.
    • Tables: Number of tables per cluster host.
    • Rows of MergeTree tables: Number of rows in MergeTree tables per cluster host.
    • Data parts: Number of data parts per cluster host.
    • Detached data parts: Number of detached data parts per cluster host.
  • Under Replication and Background Data Processing:

    • Max replication delay across tables: Maximum table replication delay per cluster host. Values greater than a few seconds may indicate excessive load or replication issues.
    • Replication queue: Replication queue size per cluster host.
    • Max data parts per partition: Maximum number of data parts per partition per cluster host. This value is limited by the DBMS settings. Approaching the limit indicates excessive load or low efficiency of data insertion.
    • Merges and mutations pool tasks: Number of active merge and mutation tasks in the background pool per ClickHouse® subcluster host.
    • Fetches pool tasks: Number of active fetch tasks in the background pool per ClickHouse® subcluster host.
    • Move pool tasks: Number of active move tasks in the background pool per ClickHouse® subcluster host.
  • Under System Resources:

    • CPU usage per host, cores: Number of CPU cores used per ClickHouse® subcluster host.
    • Memory usage per host, bytes: RAM used per ClickHouse® subcluster host.
    • CPU usage per host, %: CPU core usage percentage per ClickHouse® subcluster host.
    • Memory usage per host, %: Percentage of RAM used per ClickHouse® subcluster host.
    • Disk read per host: Disk read rate per ClickHouse® subcluster host.
    • Disk write per host: Disk write rate per ClickHouse® subcluster host.
    • Disk usage per host: Speed of disk operations per ClickHouse® subcluster host.
    • Network data received per host: Network data receive rate per ClickHouse® subcluster host.
    • Network data sent per host: Network data send rate per ClickHouse® subcluster host.
    • Network usage per host: Network data exchange rate per ClickHouse® subcluster host.

The ZooKeeper tab displays the following charts:

  • Transactions: Number of transactions per second.
  • Outstanding requests per ZooKeeper host: Number of requests being processed per ZooKeeper host.
  • Connections per ZooKeeper host: Number of connections per ZooKeeper host.
  • Transactions per ClickHouse® host: Number of transactions per second per ClickHouse® host.
  • Average transaction time per ClickHouse® host: Average transaction time per ClickHouse® host. Shows the time ClickHouse® spends accessing ZooKeeper.
  • Average latency per ZooKeeper host: Average latency per ZooKeeper host.
  • Znodes: Number of znodes.
  • Ephemeral nodes: Number of ephemeral nodes.
  • Watches: Number of watches.

Note

For more information about znodes, ephemeral nodes, and watches, see this ZooKeeper guide.

  • CPU cores usage: Number of CPU cores used in a ZooKeeper subcluster.
  • You can use the following charts to monitor RAM usage:
    • Memory usage for a ZooKeeper subcluster.
    • Memory usage per ZooKeeper subcluster host.
  • You can use the following charts to monitor disk space usage:
    • Disk space usage for a ZooKeeper subcluster.
    • Disk space usage per ZooKeeper subcluster host.
  • CPU cores usage per host: Number of CPU cores used per host.
  • CPU usage per host: CPU core workload per host.
  • Memory usage per host: RAM usage percentage per host.
  • Disk space usage per host: Disk space usage percentage per host.
  • Disk read per host: Disk read rate per host.
  • Disk write per host: Disk write rate per host.
  • Disk usage per host: Speed of disk operations per host.
  • Network data received per host: Network data receive rate per host.
  • Network data sent per host: Network data send rate per host.
  • Network usage per host: Network data exchange rate per host.

Monitoring the state of hosts

To view detailed information on the state of individual Managed Service for ClickHouse® hosts:

Management console
  1. In the management console, navigate to the folder page and select Managed Service for ClickHouse.

  2. Click the cluster name and select the Monitoring tab.

  3. Navigate to the Hosts tab and select the host.

    Host type, CLICKHOUSE or ZOOKEEPER, is specified for each host.

    To get started with Yandex Monitoring metrics, dashboards, or alerts, click Open in Monitoring in the top panel.


The following charts are displayed for ClickHouse® hosts:

  • Availability: Host availability.
  • Queries: Number of queries per second for each type.
  • Connections: Number of HTTP and TCP connections.
  • Failed queries: Percentage of failed queries for each type.
  • Average query time: Average query time for each type.
  • Locks: Number of active and waiting read and write locks.
  • Processed data: Speed of reading, inserting, and merging data.
  • Processed rows: Speed of reading, inserting, and merging rows per second.
  • Background tasks: Number of merge and mutation, fetch, and move tasks in the background pool.
  • Max replication delay across tables: Maximum replication delay across tables. Values greater than a few seconds may indicate excessive load or replication issues.
  • Replication queue: Replication queue size.
  • Max data parts per partition: Maximum number of data parts per partition. This value is limited by the DBMS settings. Approaching the limit indicates excessive load or low efficiency of data insertion.
  • CPU usage, %: CPU core usage percentage.
  • Memory usage, %: RAM usage percentage.
  • Disk space usage, %: Disk space usage percentage.
  • CPU usage, cores: Number of CPU cores used.
  • Memory usage, bytes: RAM usage.
  • Disk space usage, bytes: Disk space usage.
  • Disk throughput: Disk throughput.
  • Disk IOPS: Number of disk read and write operations.
  • Network throughput: Network throughput.

The following charts are displayed for ZooKeeper hosts:

  • Availability: Host availability.
  • Role: Host role, Leader or Follower, in a ZooKeeper subcluster.
  • Objects: Number of Znode, Ephemeral node, and Watch objects.
  • Connections: Number of active client connections to the host.
  • Outstanding requests: Number of outstanding requests to ZooKeeper.
  • Request time: Read and write operation processing time.
  • CPU usage, %: CPU core usage percentage.
  • Memory usage, %: RAM usage percentage.
  • Disk space usage, %: Disk space usage percentage.
  • CPU usage, cores: Number of CPU cores used.
  • Memory usage, bytes: RAM usage.
  • Disk space usage, bytes: Disk space usage.
  • Disk throughput: Disk throughput.
  • Disk IOPS: Number of disk read and write operations.
  • Network throughput: Network throughput.

Alert settings in Yandex Monitoring

Management console
  1. In the management console, select the folder with the cluster for which you want to configure alerts.
  2. In the list of services, select Monitoring.
  3. Under Service dashboards, select:
    • Managed Service for ClickHouse® — Cluster Overview to configure cluster alerts.
    • Managed Service for ClickHouse® — ZooKeeper to configure ZooKeeper host alerts.
    • Managed Service for ClickHouse® — Host Overview to configure host alerts.
  4. In the relevant metrics chart, open the chart menu and select Create alert.
  5. If the chart shows multiple metrics, select a data query to generate a metric and click Continue. For more information about the query language, see this Yandex Monitoring guide.
  6. Set the Alarm and Warning thresholds to trigger the alert.
  7. Click Create alert.

To have other cluster health indicators monitored automatically:

Management console
  1. Create an alert.
  2. Add a status metric.
  3. In the alert parameters, set the alert thresholds.

The recommended thresholds are as follows:

  • Maximum number of data parts per partition (ch_system_async_metrics_MaxPartCountForPartition): Alarm 250, Warning 150.
  • Number of failed queries (ch_system_events_FailedQuery_rate): Alarm 20% of the total number of queries, Warning 10% of the total number of queries.
  • Storage space used (disk.used_bytes): Alarm 95% of the storage size, Warning 80% of the storage size.
  • Number of healthy hosts (is_alive): Alarm <number_of_hosts> - 2, Warning <number_of_hosts> - 1.
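The two thresholds act as escalation levels. The following Python sketch is a simplified model, not Yandex Monitoring's evaluation logic (actual alerts are evaluated over a configurable time window): it classifies a single value against the recommended Warning and Alarm values. For is_alive, lower values are worse, so the comparison is inverted. The sketch treats is_alive as the number of healthy hosts, as the recommendations above do, and assumes that reaching a threshold triggers it. The names `Level` and `evaluate` are hypothetical.

```python
from enum import Enum


class Level(Enum):
    OK = "OK"
    WARNING = "Warning"
    ALARM = "Alarm"


def evaluate(value: float, warning: float, alarm: float,
             higher_is_worse: bool = True) -> Level:
    """Classify one metric value against Warning and Alarm thresholds.

    Simplified model: a single value, no time window or aggregation.
    Assumption: a value equal to a threshold triggers that threshold.
    """
    if higher_is_worse:
        # e.g. MaxPartCountForPartition: Warning 150, Alarm 250
        if value >= alarm:
            return Level.ALARM
        if value >= warning:
            return Level.WARNING
    else:
        # e.g. healthy host count: Warning n - 1, Alarm n - 2
        if value <= alarm:
            return Level.ALARM
        if value <= warning:
            return Level.WARNING
    return Level.OK


hosts = 3  # hypothetical cluster size
print(evaluate(180, warning=150, alarm=250))  # Level.WARNING
print(evaluate(1, warning=hosts - 1, alarm=hosts - 2, higher_is_worse=False))  # Level.ALARM
```

Inverting the comparison instead of negating the metric keeps the thresholds in the same units as the values shown on the charts.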

To calculate the threshold values for the ch_system_events_FailedQuery_rate metric, use the cluster's Total queries chart.
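To turn the percentages into absolute values, read the current rate from the Total queries chart. A minimal Python sketch, assuming the failed-query rate and the Total queries chart use the same unit (queries per second); the function name `failed_query_thresholds` and the 500 queries/s figure are hypothetical:

```python
def failed_query_thresholds(total_queries_per_sec: float) -> tuple:
    """Return (alarm, warning) for ch_system_events_FailedQuery_rate.

    Recommended values: Alarm at 20% and Warning at 10% of the total
    query rate taken from the Total queries chart.
    """
    alarm = 0.20 * total_queries_per_sec
    warning = 0.10 * total_queries_per_sec
    return alarm, warning


# Hypothetical example: the Total queries chart shows about 500 queries per second.
alarm, warning = failed_query_thresholds(500)
print(f"Alarm: {alarm:.1f}/s, Warning: {warning:.1f}/s")
```

Because the total query rate changes over time, recompute the thresholds if the cluster load changes significantly.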

For the disk.used_bytes metric, the Alarm and Warning thresholds can only be set in bytes. For example, the recommended values for a 100 GB disk (107,374,182,400 bytes, where 1 GB = 1,073,741,824 bytes) are as follows:

  • Alarm: 102,005,473,280 bytes (95%)
  • Warning: 85,899,345,920 bytes (80%)
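The byte values above can be reproduced with integer arithmetic. A minimal Python sketch, assuming the disk size is given in binary gigabytes (1 GB = 2^30 bytes), which is what makes the example figures match; `disk_thresholds` is a hypothetical helper name:

```python
# Assumption: "GB" is a binary gigabyte (2**30 bytes); this matches the example values.
GIB = 1024 ** 3


def disk_thresholds(disk_size_gb: int, alarm_pct: int = 95,
                    warning_pct: int = 80) -> tuple:
    """Return (alarm, warning) thresholds in bytes for disk.used_bytes."""
    size_bytes = disk_size_gb * GIB
    # Integer arithmetic avoids floating-point rounding in the byte values.
    alarm = size_bytes * alarm_pct // 100
    warning = size_bytes * warning_pct // 100
    return alarm, warning


alarm, warning = disk_thresholds(100)
print(f"Alarm: {alarm:,} bytes")      # Alarm: 102,005,473,280 bytes
print(f"Warning: {warning:,} bytes")  # Warning: 85,899,345,920 bytes
```

If you change the storage size, recalculate both values, since the thresholds are absolute byte counts rather than percentages.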

You can view the current storage size in the detailed information about the cluster. For a complete list of supported metrics, see this Monitoring guide.

Cluster state and status

The State of a cluster reflects the health of its hosts, while the Status shows whether the cluster is started, stopped, or in an intermediate stage.

To view the state and status of a cluster:

Management console
  1. In the management console, navigate to the folder dashboard and select Managed Service for ClickHouse.
  2. Hover over the indicator in the cluster row of the Availability column.

Cluster states

  • ALIVE: The cluster is operating normally. Suggested actions: No action is required.
  • DEGRADED: The cluster is not running at full capacity: the state of at least one of its hosts is other than ALIVE. Suggested actions: Run the following diagnostics.
    • Go to the Hosts tab and see which hosts are not working.
    • Go to the Operations tab and make sure all operations are completed.
    • Make sure the cluster is not under maintenance.
    If you cannot find the cause yourself, contact support.
  • DEAD: The cluster is down: none of its hosts are running. Suggested actions: Make a support request stating the following:
    • Cluster ID.
    • IDs of the last operations performed on it.
    • Time the cluster entered the DEAD state according to the availability charts.
  • UNKNOWN: The cluster state is unknown. Suggested actions: Make a support request stating the following:
    • Cluster ID.
    • IDs of the last operations performed on it.
    • Time the cluster entered the UNKNOWN state according to the availability charts.
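The relationship between host states and the cluster state described above can be summarized as a small decision function. This Python sketch is a simplified model, not the service's actual health-check logic: it assumes each host reports a state string such as ALIVE, DEAD, or UNKNOWN, and treats an empty or all-UNKNOWN host list as UNKNOWN. It also returns the hosts that are not ALIVE, which is what you look for on the Hosts tab when a cluster is DEGRADED. The function name and host names are hypothetical.

```python
def cluster_state(hosts: dict) -> tuple:
    """Derive (cluster_state, hosts_not_alive) from a host-name -> state mapping.

    Simplified model of the state descriptions: ALIVE if every host is ALIVE,
    DEAD if every host is DEAD, DEGRADED otherwise. Assumption: no hosts, or
    only UNKNOWN hosts, yields UNKNOWN.
    """
    states = list(hosts.values())
    not_alive = sorted(name for name, state in hosts.items() if state != "ALIVE")
    if not states or all(s == "UNKNOWN" for s in states):
        return "UNKNOWN", not_alive
    if not not_alive:
        return "ALIVE", []
    if all(s == "DEAD" for s in states):
        return "DEAD", not_alive
    # At least one host is not ALIVE, but not all of them are down.
    return "DEGRADED", not_alive


# Hypothetical three-host cluster with one failed host.
print(cluster_state({"host-a": "ALIVE", "host-b": "DEAD", "host-c": "ALIVE"}))
# ('DEGRADED', ['host-b'])
```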

Cluster statuses

  • CREATING: Preparing for the first start. Suggested actions: Wait a while and get started. The time it takes to create a cluster depends on the host class.
  • RUNNING: The cluster is operating normally. Suggested actions: No action is required.
  • STOPPING: The cluster is stopping. Suggested actions: No action is required. After a while, the cluster status will switch to STOPPED and the cluster will be disabled.
  • STOPPED: The cluster is stopped. Suggested actions: Start the cluster to get it running again.
  • STARTING: The cluster that was stopped earlier is starting. Suggested actions: Wait a while; the cluster status will switch to RUNNING, and you can get started.
  • UPDATING: The cluster's configuration is being updated. Suggested actions: Once the update is complete, the cluster will return to the status it had before the update: RUNNING or STOPPED.
  • ERROR: An error occurred while performing an operation on the cluster or during a maintenance window. Suggested actions: If the cluster remains in this status for a long time, contact support. You can check whether the cluster is available by its state.
  • STATUS_UNKNOWN: The cluster is unable to determine its status. Suggested actions: If the cluster remains in this status for a long time, contact support.
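The intermediate statuses each resolve to a stable one. The following Python sketch encodes the transitions stated in the status descriptions (STOPPING to STOPPED, STARTING to RUNNING, UPDATING back to the previous status) to show which status to expect once an operation finishes. CREATING resolving to RUNNING is an assumption based on "preparing for the first start"; the function name is hypothetical and this is not a Yandex Cloud API.

```python
from typing import Optional

# Intermediate statuses and the status each one settles into.
# Assumption: CREATING ends in RUNNING once the cluster has started for the first time.
SETTLES_INTO = {
    "CREATING": "RUNNING",
    "STOPPING": "STOPPED",
    "STARTING": "RUNNING",
}

STABLE = {"RUNNING", "STOPPED"}


def expected_final_status(status: str,
                          status_before_update: Optional[str] = None) -> str:
    """Return the status a cluster is expected to reach when the current operation ends."""
    if status == "UPDATING":
        # An update restores the status the cluster had before it started.
        if status_before_update not in STABLE:
            raise ValueError("UPDATING returns to RUNNING or STOPPED; pass the prior status")
        return status_before_update
    if status in SETTLES_INTO:
        return SETTLES_INTO[status]
    # RUNNING and STOPPED are already final. ERROR and STATUS_UNKNOWN are
    # returned unchanged: if they persist, contact support.
    return status


print(expected_final_status("STOPPING"))             # STOPPED
print(expected_final_status("UPDATING", "STOPPED"))  # STOPPED
```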

ClickHouse® is a registered trademark of ClickHouse, Inc.
