Replication in Managed Service for ClickHouse®
With Managed Service for ClickHouse®, you can use one of the following tools to manage replication and query distribution:
- ClickHouse® Keeper
- ZooKeeper (default)
Either tool lets you use replicated tables in a cluster with multiple hosts per shard, with replication managed automatically.
ClickHouse® Keeper
Note
This feature is at the Preview stage. Access to ClickHouse® Keeper is available on request. To request access, contact technical support.
ClickHouse® Keeper is a service for data replication and running distributed DDL queries; it implements the ZooKeeper-compatible client-server protocol. Unlike ZooKeeper, ClickHouse® Keeper does not require separate hosts for its operation and runs on ClickHouse® hosts. You can enable ClickHouse® Keeper support only when creating a cluster.
ClickHouse® Keeper has the following limitations:
- You can only create clusters of three or more hosts.
- ClickHouse® Keeper support cannot be enabled or disabled after creating a cluster.
- You cannot switch clusters using ZooKeeper hosts to ClickHouse® Keeper.
- To migrate a host of a cluster that uses ClickHouse® Keeper to a different availability zone, contact support.
For more information about ClickHouse® Keeper, see the ClickHouse® documentation.
ZooKeeper
If a cluster was created without ClickHouse® Keeper support, you must enable fault tolerance for it (if it is not already enabled) before adding hosts to a single-host shard. Enabling fault tolerance adds three ZooKeeper hosts to the cluster, which is the minimum number of hosts required to manage replication and ensure fault tolerance.
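Why three? ZooKeeper (and ClickHouse® Keeper) only accepts writes while a majority of its hosts, a quorum, is available. The Python sketch below illustrates this general majority-quorum rule; it is not part of Managed Service for ClickHouse®, and the function names are hypothetical.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of hosts that forms a majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("an ensemble must contain at least one host")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many hosts can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 6):
    print(f"{n} host(s): quorum = {quorum_size(n)}, tolerated failures = {tolerated_failures(n)}")
```

One or two hosts tolerate no failures at all, while three hosts keep a quorum after losing one. Four hosts tolerate no more failures than three, which is why coordination ensembles typically use an odd number of hosts.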
You can enable fault tolerance and configure ZooKeeper hosts after creating a cluster with a single host.
You can also configure ZooKeeper hosts immediately when creating a cluster with multiple hosts. In this case:
- If the cluster's virtual network has subnets in every availability zone, a ZooKeeper host is automatically added to each subnet unless you explicitly specify settings for these hosts. If required, you can explicitly specify three ZooKeeper hosts and their settings when creating the cluster.
- If the cluster's virtual network has subnets only in some availability zones, you must explicitly specify three ZooKeeper hosts and their settings when creating the cluster.
- If you do not specify subnets for these hosts, Managed Service for ClickHouse® automatically distributes them among the subnets of the network the ClickHouse® cluster is connected to.
The minimum number of cores per ZooKeeper host depends on the total number of cores across all ClickHouse® hosts:
Total number of ClickHouse® host cores | Minimum number of cores per ZooKeeper host
---|---
Less than 48 | 2
48 or more | 4
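The threshold in the table above can be expressed as a small function. This is a minimal Python sketch for sizing estimates only. The function names and the example host layout are hypothetical, and the service itself enforces the actual limits.

```python
def total_cores(host_cores: list[int]) -> int:
    """Sum the cores of all ClickHouse hosts in the cluster."""
    return sum(host_cores)


def min_zookeeper_cores(total_clickhouse_cores: int) -> int:
    """Minimum cores per ZooKeeper host for a given total of ClickHouse cores."""
    if total_clickhouse_cores < 0:
        raise ValueError("core count cannot be negative")
    # Fewer than 48 total ClickHouse cores: 2 cores; 48 or more: 4 cores.
    return 2 if total_clickhouse_cores < 48 else 4


# Hypothetical cluster: three ClickHouse hosts with 16 cores each (48 in total).
hosts = [16, 16, 16]
print(min_zookeeper_cores(total_cores(hosts)))  # 4
```

The boundary is inclusive: a cluster with exactly 48 ClickHouse cores already requires 4-core ZooKeeper hosts.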
You can change the ZooKeeper host class when configuring fault tolerance or changing cluster settings. You cannot change other ZooKeeper settings or connect to ZooKeeper hosts.
Warning
ZooKeeper hosts, if any, are taken into account when calculating resource usage.
Replicated tables
ClickHouse® supports automatic replication only for tables that use the ReplicatedMergeTree engine.
Warning
We recommend creating replicated tables on all cluster hosts. Otherwise, you may lose data when restoring a cluster from a backup or migrating cluster hosts to a different availability zone.
To create a ReplicatedMergeTree table on a specific ClickHouse® host, run the following query:
CREATE TABLE db_01.table_01 (
    log_date Date,
    user_name String
)
ENGINE = ReplicatedMergeTree('/table_01', '{replica}')
PARTITION BY log_date
ORDER BY (log_date, user_name);
Where:
- db_01: Database name.
- table_01: Table name.
- /table_01: Path to the table in ZooKeeper or ClickHouse® Keeper. It must start with a forward slash (/).
- {replica}: Host ID macro.
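ClickHouse® replaces macros such as {replica} with values from each host's configuration. As a result, the same statement produces a different replica name on every host, while all replicas share one table path. The Python sketch below approximates that substitution and shows the znode (`<path>/replicas/<replica>`) under which each replica registers. It is a simplification under stated assumptions: the helper names and the per-host macro values are hypothetical, and it does not reproduce every macro feature ClickHouse® supports.

```python
import re

MACRO_PATTERN = re.compile(r"\{(\w+)\}")


def expand_macros(template: str, macros: dict[str, str]) -> str:
    """Replace each {name} with its value; raise an error for undefined macros."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"macro {{{name}}} is not defined on this host")
        return macros[name]
    return MACRO_PATTERN.sub(substitute, template)


def replica_znode(table_path: str, replica_template: str, macros: dict[str, str]) -> str:
    """Return the znode under which this host registers as a replica of the table."""
    path = expand_macros(table_path, macros)
    # The table path must be absolute, that is, start with a forward slash.
    if not path.startswith("/"):
        raise ValueError(f"table path must start with '/': {path!r}")
    replica = expand_macros(replica_template, macros)
    return f"{path.rstrip('/')}/replicas/{replica}"


# Hypothetical macro values for two hosts of the same shard.
host_a = {"replica": "host-a.example"}
host_b = {"replica": "host-b.example"}
print(replica_znode("/table_01", "{replica}", host_a))  # /table_01/replicas/host-a.example
print(replica_znode("/table_01", "{replica}", host_b))  # /table_01/replicas/host-b.example
```

Because both hosts expand to the same table path but to different replica names, they register as two replicas of one table rather than as two unrelated tables.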
To create replicated tables on all cluster hosts, run the following distributed DDL query:
CREATE TABLE db_01.table_01 ON CLUSTER '{cluster}' (
    log_date Date,
    user_name String
)
ENGINE = ReplicatedMergeTree('/table_01', '{replica}')
PARTITION BY log_date
ORDER BY (log_date, user_name);
The '{cluster}' macro is automatically replaced with the ClickHouse® cluster ID.
To learn how to manage the interaction between replicated and distributed tables in a ClickHouse® cluster, see Sharding.
ClickHouse® is a registered trademark of ClickHouse, Inc.