Replication in Managed Service for ClickHouse®
In ClickHouse®, replication is performed if the cluster meets all these conditions:
- There is at least one shard with two or more hosts.
- A host coordination tool is set up.
A Managed Service for ClickHouse® cluster with enabled replication is fault-tolerant. In such a cluster, you can create replicated tables.
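One way to check the first condition from inside the cluster is to count the hosts in each shard. The query below is a minimal sketch: it assumes your user is allowed to read the standard ClickHouse® system.clusters table in the managed service, and it lists only the shards that have two or more hosts.

```sql
-- List shards that have at least two hosts (replicas).
-- system.clusters is a standard ClickHouse system table; whether it is
-- readable for your user in the managed service is an assumption.
SELECT
    cluster,
    shard_num,
    count() AS hosts_in_shard
FROM system.clusters
GROUP BY cluster, shard_num
HAVING hosts_in_shard >= 2
ORDER BY cluster, shard_num;
```

An empty result means that no shard has a second host yet, so the first condition is not met.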
With Managed Service for ClickHouse®, you can use one of the following tools to coordinate hosts and distribute queries among them:
- ClickHouse® Keeper
- ZooKeeper (default)
ClickHouse® Keeper
Note
This feature is at the Preview stage. Access to ClickHouse® Keeper is available on request. Contact technical support.
ClickHouse® Keeper is a service for data replication and running distributed DDL queries; it implements a ZooKeeper-compatible client-server protocol. Unlike ZooKeeper, ClickHouse® Keeper does not require separate hosts and runs on ClickHouse® hosts. You can enable ClickHouse® Keeper support only when creating a cluster.
ClickHouse® Keeper has the following limitations:
- You can only create clusters of three or more hosts.
- ClickHouse® Keeper support cannot be enabled or disabled after creating a cluster.
- You cannot switch clusters using ZooKeeper hosts to ClickHouse® Keeper.
- To migrate a host running ClickHouse® Keeper to a different availability zone, contact support.
You can learn more about ClickHouse® Keeper in the ClickHouse® documentation.
ZooKeeper
ZooKeeper is a coordination tool you can use to distribute queries among ClickHouse® hosts. For successful replication, a Managed Service for ClickHouse® cluster must have three or five ZooKeeper hosts.
If your cluster consists of one ClickHouse® host or several single-host shards and was originally created without ClickHouse® Keeper support, you must enable fault tolerance for the cluster before adding new hosts. To do this, add three or five ZooKeeper hosts to the cluster. If the cluster already has ZooKeeper hosts, you can add ClickHouse® hosts to any shards.
If you are creating a cluster with two or more ClickHouse® hosts per shard, three ZooKeeper hosts will be automatically added to the cluster. At this point, you can only set up their configuration. Mind the following:
- If a cluster in the virtual network has subnets in each availability zone, a ZooKeeper host is automatically added to each subnet if you do not explicitly specify the settings for such hosts. You can explicitly specify three ZooKeeper hosts and their settings when creating a cluster, if required.
- If a cluster in the virtual network has subnets only in certain availability zones, you need to explicitly specify three ZooKeeper hosts and their settings when creating a cluster.
- If you did not specify any subnets for these hosts, Managed Service for ClickHouse® will automatically distribute them among the subnets of the network the ClickHouse® cluster is connected to.
The minimum number of cores per ZooKeeper host depends on the total number of cores on ClickHouse® hosts:
Total number of ClickHouse® host cores | Minimum number of cores per ZooKeeper host |
---|---|
Less than 48 | 2 |
48 or higher | 4 |
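
As a worked example of the table above, take a hypothetical cluster with three shards, two ClickHouse® hosts per shard, and 8 cores per host (these figures are illustrative and not taken from the documentation). The query only does the arithmetic and applies the threshold from the table:

```sql
-- Hypothetical sizing: 3 shards x 2 hosts per shard x 8 cores per host.
SELECT
    3 * 2 * 8 AS total_clickhouse_cores,
    -- Threshold from the table: fewer than 48 cores -> 2, otherwise 4.
    if(total_clickhouse_cores < 48, 2, 4) AS min_cores_per_zookeeper_host;
```

The total is 48 cores, which falls into the "48 or higher" row, so each ZooKeeper host in this example needs at least 4 cores.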
You can change ZooKeeper host class and storage size when updating cluster settings. You cannot change ZooKeeper settings or connect to such hosts.
Warning
ZooKeeper hosts, if any, are counted when calculating resource usage.
Replicated tables
ClickHouse® supports automatic replication only for tables that use the ReplicatedMergeTree engine.
Warning
We recommend creating replicated tables on all cluster hosts. Otherwise, you may lose data when restoring a cluster from a backup or migrating cluster hosts to a different availability zone.
To create a ReplicatedMergeTree table on a specific ClickHouse® host, run the following query:
CREATE TABLE db_01.table_01 (
    log_date Date,
    user_name String
)
ENGINE = ReplicatedMergeTree('/table_01', '{replica}')
PARTITION BY log_date
ORDER BY (log_date, user_name);
Where:
- db_01: Database name.
- table_01: Table name.
- /table_01: Path to the table in ZooKeeper or ClickHouse® Keeper, which must start with a forward slash (/).
- {replica}: Host ID macro substitution.
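
To see which value a macro such as {replica} resolves to on the host you are connected to, you can query the standard ClickHouse® system.macros table. This is a minimal sketch; it assumes the table is readable for your user in the managed service.

```sql
-- Show the macro substitutions defined on the current host.
-- Each row maps a macro name (for example, replica) to its value.
SELECT macro, substitution
FROM system.macros;
```

The row whose macro column is replica shows the value substituted for {replica} when the table is created on this host.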
To create replicated tables on all cluster hosts, send a distributed DDL request:
CREATE TABLE db_01.table_01 ON CLUSTER '{cluster}' (
    log_date Date,
    user_name String
)
ENGINE = ReplicatedMergeTree('/table_01', '{replica}')
PARTITION BY log_date
ORDER BY (log_date, user_name);
The '{cluster}' argument will be automatically resolved to the ClickHouse® cluster ID.
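
After the table is created, you may want to confirm that its replica is registered on a host. The query below is a sketch that reads the standard ClickHouse® system.replicas table, which describes only the replicas stored on the host you are connected to, so run it on each host you want to check. It assumes the db_01.table_01 names from the examples in this section and that the system table is readable for your user.

```sql
-- Replication state of db_01.table_01 on the current host.
SELECT
    database,
    table,
    zookeeper_path,   -- path passed to ReplicatedMergeTree, e.g. /table_01
    replica_name,     -- value substituted for {replica}
    is_readonly,      -- 1 if the replica is in read-only mode
    total_replicas,   -- replicas known for this table
    active_replicas   -- replicas with an active coordination session
FROM system.replicas
WHERE database = 'db_01' AND table = 'table_01';
```

If the query returns no rows on a host, the table was not created there as a replicated table. A value of active_replicas lower than total_replicas, or is_readonly equal to 1, usually points to a host or coordination service that is currently unavailable.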
To learn how to manage the interaction between replicated and distributed tables in a ClickHouse® cluster, see Sharding.
ClickHouse® is a registered trademark of ClickHouse, Inc.