Questions about ClickHouse®
Why should I use ClickHouse® in Managed Service for ClickHouse® rather than my own VM-based installation?
Managed Service for ClickHouse® automates routine database maintenance:
- Quick DB deployment with the required available resources.
- Data backup.
- Regular software updates.
- Ensuring DB cluster fault tolerance.
- Database usage monitoring and statistics.
When should I use ClickHouse® rather than PostgreSQL?
ClickHouse® only supports adding and reading data since it is primarily designed for analytics (OLAP). For other purposes, you might want to use PostgreSQL.
How do I load data into ClickHouse®?
Use the INSERT statement described in this ClickHouse® article.
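As a minimal sketch of such a statement, the example below creates a small table and adds two rows in a single INSERT. The table name `visits` and its columns are hypothetical and chosen only for illustration; in a cluster with several replicas you would typically pick a replicated table engine instead of plain MergeTree.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE visits
(
    event_date Date,
    user_id    UInt64,
    url        String
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- Add several rows in one statement rather than one INSERT per row.
INSERT INTO visits (event_date, user_id, url) VALUES
    ('2024-01-01', 1, 'https://example.com/'),
    ('2024-01-01', 2, 'https://example.com/docs');
```

Grouping rows into one statement keeps the number of INSERT queries low, which suits how ClickHouse® writes data in parts.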
How do I load a large data volume into ClickHouse®?
Use the CLI and send data in large batches, with no more than one INSERT command per second.
Currently, data transfer from physical media is not supported.
What will happen to my cluster if one of its nodes fails?
DB clusters consist of at least two replicas, so if one node is down, the cluster will keep running.
You may lose data only if a node with a non-replicated table fails.
Can I deploy a ClickHouse® database cluster in multiple availability zones?
Yes. A database cluster may consist of hosts residing in different availability zones or even regions.
How does replication work for ClickHouse®?
Managed Service for ClickHouse® clusters use ClickHouse® Keeper or ZooKeeper for replication. In the first case, replication and fault tolerance are enabled by default, so no further configuration is needed. In the second case, each ClickHouse® cluster comes with a ZooKeeper cluster containing at least three hosts.
Yandex Cloud users do not have access to ZooKeeper and cannot configure it.
Why does my ClickHouse® cluster use three extra hosts?
When creating a ClickHouse® cluster of two or more hosts, Managed Service for ClickHouse® automatically creates a cluster of three ZooKeeper hosts to manage replication and fault tolerance, unless ClickHouse® Keeper support is enabled. These hosts are counted towards both the cloud resource quota and the cluster cost.
For more information about using ZooKeeper, see this ClickHouse® article.
How does ClickHouse® handle data deletion based on TTL?
With TTL, expired data is deleted in one of two ways: as entire data parts or during merges.
Deleting entire data parts is more efficient and uses fewer server resources, but requires the TTL expression and partitioning key to match.
Deleting data during merges uses more resources and takes place either along with regular background merges or during unscheduled merges. Merge frequency is defined by the merge_with_ttl_timeout parameter, which indicates the minimum time in seconds before a repeat merge to process data with expired TTL. You set this parameter when creating
We recommend managing data with TTL so that old data is always deleted as entire data parts. To do this, set ttl_only_drop_parts to true when creating tables.
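The recommendation above could look like the following sketch. The table name `events`, its columns, the three-month retention period, and the timeout value are all assumptions made for illustration; only `ttl_only_drop_parts` and `merge_with_ttl_timeout` come from the text above.

```sql
-- Hypothetical table: monthly partitions, rows expire after three months.
CREATE TABLE events
(
    event_time DateTime,
    message    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)      -- partitioning key based on event_time
ORDER BY event_time
TTL event_time + INTERVAL 3 MONTH      -- TTL expression based on the same column
SETTINGS
    ttl_only_drop_parts = 1,           -- drop a part only once all its rows have expired
    merge_with_ttl_timeout = 86400;    -- illustrative value: at least one day between TTL merges
```

Because both the partitioning key and the TTL expression depend on `event_time`, the rows in a part tend to expire together, so whole parts can be dropped instead of being rewritten during merges.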
Can I use JSON data for tables in ClickHouse®?
Yes, you can. However, JSON is currently an experimental data type in ClickHouse®. To allow creating tables of this type, run this query:
SET allow_experimental_object_type=1;
Note
The SET queries are not supported when connecting to a cluster via the management console. To run such a query, use a different cluster connection method, e.g., via clickhouse-client.
Make sure you have the latest client version installed.
For more information, see this ClickHouse® article.
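A minimal sketch of working with such a table is shown below. It assumes a ClickHouse® version in which the `JSON` type name refers to the experimental object type enabled by this setting; the table name `json_events` and the document fields are hypothetical.

```sql
-- Run in the same session, e.g., in clickhouse-client.
SET allow_experimental_object_type = 1;

-- Hypothetical table with a single JSON column.
CREATE TABLE json_events
(
    payload JSON
)
ENGINE = MergeTree
ORDER BY tuple();

-- Insert one JSON document as a string literal.
INSERT INTO json_events VALUES ('{"request": {"status": 200, "path": "/docs"}, "action": "view"}');

-- Nested fields are read as subcolumns using dot notation.
SELECT payload.request.status, payload.action FROM json_events;
```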
Why is my cluster slow even though the computing resources are not fully utilized?
Your storage may have insufficient maximum IOPS and bandwidth to process the current number of requests. In this case, throttling occurs, which degrades the entire cluster performance.
The maximum IOPS and bandwidth values increase by a fixed value when the storage size increases by a certain step. The step and increment values depend on the disk type:
| Disk type | Step, GB | Max IOPS increase (read/write) | Max bandwidth increase (read/write), MB/s |
|---|---|---|---|
| network-hdd | 256 | 300/300 | 30/30 |
| network-ssd | 32 | 1,000/1,000 | 15/15 |
| network-ssd-nonreplicated, network-ssd-io-m3 | 93 | 28,000/5,600 | 110/82 |
To increase the maximum IOPS and bandwidth values and make throttling less likely, expand the storage when updating your cluster.
If you are using the network-hdd storage, consider switching to network-ssd or network-ssd-nonreplicated by restoring the cluster from a backup.
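To get a rough idea of how storage size affects these limits, the scaling rule from the table can be sketched as a query. This is an estimate under stated assumptions: limits grow by one increment per whole step, and any upper ceiling per disk is ignored. The 512 GB storage size is an arbitrary example, and the step and increment values are taken from the network-ssd row.

```sql
-- Illustrative estimate for a 512 GB network-ssd storage.
-- Assumes one increment per whole 32 GB step and no per-disk ceiling.
SELECT
    512                        AS storage_gb,
    intDiv(storage_gb, 32)     AS steps,               -- number of whole 32 GB steps
    steps * 1000               AS est_max_iops,        -- 1,000 IOPS per step
    steps * 15                 AS est_max_bandwidth;   -- 15 MB/s per step
```

Comparing the estimate with the actual load on the cluster can help decide whether expanding the storage is likely to reduce throttling.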
ClickHouse® is a registered trademark of ClickHouse, Inc.