High availability of a Yandex Managed Service for ClickHouse® cluster
High availability of a Managed Service for ClickHouse® cluster depends on the number and placement of its hosts, replication and sharding settings, as well as other cluster properties.
Number and placement of cluster hosts
A Managed Service for ClickHouse® cluster consists of one or more shards, where each has one or more hosts.
Single-host cluster
A cluster with a single ClickHouse® host does not provide high availability. If the host VM fails, the cluster will be unavailable until the VM recovers. Single-host clusters are not covered by the Service Level Agreement (SLA).
Cluster with two or more hosts
A cluster consisting of two or more hosts supports replication: ClickHouse® hosts can step in for one another as the cluster’s primary replica. Such clusters come with a dedicated coordination service, ClickHouse® Keeper or ZooKeeper, which manages replication and query distribution across hosts. You can select the coordination service when creating a cluster or add it later. According to the SLA, the coordination service of a high-availability cluster must be deployed on separate hosts. A configuration where ClickHouse® and ClickHouse® Keeper share the same hosts is not highly available.
A cluster may have three to five coordination service hosts. The optimal number of coordination service hosts for a highly available cluster is three. Increasing the number of ZooKeeper or ClickHouse® Keeper hosts affects cluster availability as follows:
- Clusters with four coordination service hosts have lower availability than clusters with three: both configurations tolerate the failure of only one host, and two out of four hosts are more likely to fail than two out of three.
- Five coordination service hosts ensure the cluster remains highly available: even if two coordination service hosts fail at the same time, this will not lead to cluster failure.
Managed Service for ClickHouse® does not support clusters with more than five coordination service hosts.
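The effect of the host count can be illustrated with a short calculation. The sketch below assumes that the coordination service needs a strict majority of its hosts to stay operational and that hosts fail independently with the same probability. The failure probability of 0.01 is a made-up value chosen only for illustration, and the function names are hypothetical.

```python
from math import comb


def tolerated_failures(hosts: int) -> int:
    """Number of hosts that can fail while a strict majority survives."""
    return (hosts - 1) // 2


def quorum_loss_probability(hosts: int, p_fail: float) -> float:
    """Probability that more hosts fail than the majority quorum tolerates.

    Assumes independent failures with the same probability p_fail,
    which is a simplification of real-world behavior.
    """
    limit = tolerated_failures(hosts)
    return sum(
        comb(hosts, k) * p_fail**k * (1 - p_fail) ** (hosts - k)
        for k in range(limit + 1, hosts + 1)
    )


# Compare the three supported coordination service sizes.
for n in (3, 4, 5):
    print(n, tolerated_failures(n), f"{quorum_loss_probability(n, 0.01):.2e}")
```

Under these assumptions, three and four hosts both tolerate one failure, but the four-host setup is more likely to lose its majority, while five hosts tolerate two failures.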
Multi-shard cluster
Sharding improves cluster availability, but a cluster with multiple single-host shards is not highly available. According to the SLA, to ensure high availability of a sharded cluster, it should have:
- At least two ClickHouse® hosts in each shard, located in different availability zones.
- At least three ZooKeeper or ClickHouse® Keeper hosts located in different availability zones.
Learn more about the impact of sharding on cluster availability.
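The two conditions for a sharded cluster can be expressed as a simple check. The sketch below is not part of the service API: the `Host` type, the `sharded_ha_problems` function, and the zone names are hypothetical. It reads "located in different availability zones" as "spread over at least two zones" for a shard and "over at least three zones" for the coordination service, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    zone: str  # availability zone label (hypothetical values below)


def sharded_ha_problems(shards: dict[str, list[Host]],
                        coordination: list[Host]) -> list[str]:
    """Return the list of violated conditions; an empty list means none."""
    problems = []
    for shard_name, hosts in shards.items():
        if len(hosts) < 2:
            problems.append(f"shard {shard_name}: fewer than two ClickHouse hosts")
        elif len({h.zone for h in hosts}) < 2:
            problems.append(f"shard {shard_name}: all hosts are in one zone")
    if len(coordination) < 3:
        problems.append("coordination service: fewer than three hosts")
    elif len({h.zone for h in coordination}) < 3:
        problems.append("coordination service: hosts span fewer than three zones")
    return problems


# Example topology: two shards with two hosts each, three coordination hosts.
shards = {
    "shard1": [Host("ch1", "zone-a"), Host("ch2", "zone-b")],
    "shard2": [Host("ch3", "zone-a"), Host("ch4", "zone-a")],
}
keepers = [Host("k1", "zone-a"), Host("k2", "zone-b"), Host("k3", "zone-c")]
print(sharded_ha_problems(shards, keepers))
```

In this example only `shard2` is reported, because both of its hosts are placed in the same zone.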
Storage settings
If storage runs out of space, INSERT queries, background merges, and mutations are suspended. Set up alerts in Yandex Monitoring to monitor storage usage, or enable automatic storage expansion.
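As an illustration of the alerting logic, the sketch below classifies storage usage against two thresholds. It does not use the Yandex Monitoring API. The function name and the 80% and 90% thresholds are assumptions chosen for the example, so pick values that leave enough time to expand storage before writes are suspended.

```python
def storage_alert_level(used_bytes: int, total_bytes: int,
                        warn_ratio: float = 0.8,
                        crit_ratio: float = 0.9) -> str:
    """Classify storage usage as "ok", "warning", or "critical".

    The default thresholds are illustrative, not service defaults.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = used_bytes / total_bytes
    if ratio >= crit_ratio:
        return "critical"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


# Example: 85 GiB used out of 100 GiB.
GIB = 1024 ** 3
print(storage_alert_level(85 * GIB, 100 * GIB))
```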
Maintenance settings
Hosts may require a reboot during maintenance. A cluster consisting of a single ClickHouse® host will be unavailable during a reboot.
If a cluster consists of multiple hosts or shards, the hosts become unavailable one by one. To keep your applications running without interruption, connect to the cluster or shard through a special FQDN that always points to an available host.
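Even with a special FQDN, a connection that is open when a host reboots may be dropped, so it may help to retry on the client side. The sketch below shows one possible retry loop with exponential backoff. The `call_with_retry` helper, its parameters, and the `connect` function in the usage note are hypothetical and are not part of any ClickHouse® client library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(operation: Callable[[], T],
                    attempts: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], object] = time.sleep) -> T:
    """Run operation, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call might look like `call_with_retry(lambda: connect(special_fqdn))`, where `connect` stands for whatever your client library provides and `special_fqdn` is the special FQDN of the cluster or shard. Each new attempt resolves the name again, so it can reach the host that is currently available.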
Other settings
Cluster availability may also be affected by:
- Backup settings.
- Storage disk type you selected.
- Host classes.
- Quotas and limits.
- Security group setup.
ClickHouse® is a registered trademark of ClickHouse, Inc.