Replication in Managed Service for ClickHouse®
With Managed Service for ClickHouse®, you can use one of the following tools to manage replication and query distribution:
- ClickHouse® Keeper.
- ZooKeeper (default).
In a cluster with replication enabled, you can create replicated tables. Replication is then managed automatically.
ClickHouse® Keeper
Note
This feature is at the Preview stage. Access to ClickHouse® Keeper is available on request: contact technical support.
ClickHouse® Keeper is a service for data replication and running distributed DDL queries; it implements the ZooKeeper-compatible client-server protocol. Unlike ZooKeeper, ClickHouse® Keeper does not require separate hosts for its operation and runs on ClickHouse® hosts. You can enable ClickHouse® Keeper support only when creating a cluster.
ClickHouse® Keeper has the following limitations:
- You can only create clusters of three or more hosts.
- ClickHouse® Keeper support cannot be enabled or disabled after creating a cluster.
- You cannot switch clusters using ZooKeeper hosts to ClickHouse® Keeper.
- To migrate a host with ClickHouse® Keeper to a different availability zone, you have to contact support.
You can learn more about ClickHouse® Keeper in the ClickHouse® documentation.
ZooKeeper
If your cluster consists of one host or several single-host shards and was originally created without ClickHouse® Keeper support, you must enable fault tolerance for the cluster before adding new hosts. In this case, three ZooKeeper hosts will be added to the cluster: this is the minimum number required for replication management and fault tolerance.
You can also configure ZooKeeper hosts right away when creating a multi-host cluster. In this case:
- If the virtual network of the cluster has subnets in each availability zone and you do not explicitly specify the settings for ZooKeeper hosts, a ZooKeeper host is automatically added to each subnet. If required, you can explicitly specify three ZooKeeper hosts and their settings when creating the cluster.
- If the virtual network of the cluster has subnets only in certain availability zones, you need to explicitly specify three ZooKeeper hosts and their settings when creating the cluster.
- If you do not specify any subnets for these hosts, Managed Service for ClickHouse® will automatically distribute them among the subnets of the network the ClickHouse® cluster is connected to.
The minimum number of cores per ZooKeeper host depends on the total number of cores on ClickHouse® hosts:
Total number of ClickHouse® host cores | Minimum number of cores per ZooKeeper host |
---|---|
Less than 48 | 2 |
48 or higher | 4 |
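The rule in the table can be written as a single expression. This is a minimal sketch: `total_clickhouse_cores` is a hypothetical value that you would replace with the sum of cores across your ClickHouse® hosts, and the query only restates the table; it does not read anything from the service.

```sql
-- total_clickhouse_cores is a hypothetical input, not a value provided by the service:
-- set it to the sum of cores across all ClickHouse hosts in the cluster.
WITH 64 AS total_clickhouse_cores
SELECT
    -- Fewer than 48 cores in total: at least 2 cores per ZooKeeper host; otherwise at least 4.
    if(total_clickhouse_cores < 48, 2, 4) AS min_cores_per_zookeeper_host;
```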
The ZooKeeper host class can be changed when configuring fault tolerance or cluster settings. You cannot change ZooKeeper settings or connect to such hosts.
Warning
ZooKeeper hosts, if any, are included when calculating resource usage.
Replicated tables
ClickHouse® supports automatic replication only for tables that use the ReplicatedMergeTree engine.
Warning
We recommend creating replicated tables on all cluster hosts. Otherwise, you may lose data when restoring a cluster from a backup or migrating cluster hosts to a different availability zone.
To create a ReplicatedMergeTree table on a specific ClickHouse® host, run the following query:
CREATE TABLE db_01.table_01 (
    log_date date,
    user_name String
)
ENGINE = ReplicatedMergeTree('/table_01', '{replica}')
PARTITION BY log_date
ORDER BY (log_date, user_name);
Where:
- db_01: Database name.
- table_01: Table name.
- /table_01: Path to the table in ZooKeeper or ClickHouse® Keeper, which must start with a forward slash (/).
- {replica}: Host ID macro substitution.
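To see what the `{replica}` macro resolves to on the host you are connected to, you can query `system.macros`, a standard ClickHouse® system table. This is a minimal sketch; which macros are listed depends on how your cluster is configured.

```sql
-- List the macro substitutions defined on the current host.
-- The set of macros returned depends on the cluster configuration.
SELECT macro, substitution
FROM system.macros;
```

The query returns results for the current host only, so the `replica` value differs from host to host.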
To create replicated tables on all cluster hosts, send a distributed DDL request:
CREATE TABLE db_01.table_01 ON CLUSTER '{cluster}' (
    log_date date,
    user_name String
)
ENGINE = ReplicatedMergeTree('/table_01', '{replica}')
PARTITION BY log_date
ORDER BY (log_date, user_name);
The '{cluster}' argument will be automatically resolved to the ClickHouse® cluster ID.
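After creating the table, you may want to confirm that its replicas are registered and active. One way to do this, sketched here using the `db_01.table_01` names from the example above, is to query the standard `system.replicas` table on a host:

```sql
-- Show the replication state of db_01.table_01 as seen by the current host.
SELECT
    database,
    table,
    replica_name,     -- value substituted for the {replica} macro on this host
    zookeeper_path,   -- table path in ZooKeeper or ClickHouse Keeper
    is_readonly,      -- 1 if the replica is in read-only mode
    total_replicas,   -- number of replicas known for this table
    active_replicas   -- number of replicas that currently have a session in the coordination service
FROM system.replicas
WHERE database = 'db_01' AND table = 'table_01';
```

The query describes only the host that runs it. If `active_replicas` is lower than `total_replicas`, some replicas are currently not connected to the coordination service.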
To learn how to manage the interaction between replicated and distributed tables in a ClickHouse® cluster, see Sharding.
ClickHouse® is a registered trademark of ClickHouse, Inc