Questions about ClickHouse®
- Can I deploy a ClickHouse® database cluster in multiple availability zones?
- Why does a ClickHouse® cluster take up 3 hosts more than it should?
- Why is the cluster slow even though the computing resources are not used fully?
Why should I use ClickHouse® in Managed Service for ClickHouse® rather than my own VM-based installation?
Managed Service for ClickHouse® automates routine database maintenance:
- Quick DB deployment with the necessary available resources.
- Data backup.
- Regular software updates.
- Providing DB cluster failover.
- Database usage monitoring and statistics.
When should I use ClickHouse® instead of PostgreSQL?
ClickHouse® is designed primarily for online analytical processing (OLAP), so it mainly supports adding and reading data. In other cases, PostgreSQL is probably more convenient.
How do I upload data to ClickHouse®?
Use the INSERT statement described in the ClickHouse® documentation.
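As a minimal sketch, a multi-row insert might look like the query below. The table name `events` and its columns are hypothetical and assumed to exist already; they are not taken from the service documentation.

```sql
-- Hypothetical table and values, for illustration only.
-- Sending several rows in one statement keeps the number of INSERT queries low.
INSERT INTO events (event_date, user_id, message) VALUES
    ('2024-01-01', 1, 'first event'),
    ('2024-01-01', 2, 'second event'),
    ('2024-01-02', 1, 'third event');
```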
How do I upload very large data to ClickHouse®?
Use the CLI and send data in large batches (no more than one INSERT command per second).
Data transfer from physical media is not yet supported.
What happens to a cluster if one of its nodes fails?
DB clusters consist of at least two replicas, so the cluster will continue working if one of its nodes fails.
Data may be lost only if a node with a non-replicated table fails.
Can I deploy a ClickHouse® database cluster in multiple availability zones?
Yes, you can. A database cluster may consist of hosts residing in different availability zones or even regions.
How does replication work for ClickHouse®?
Managed Service for ClickHouse® clusters implement replication using either ClickHouse® Keeper or ZooKeeper. With ClickHouse® Keeper, no additional settings are required: replication and fault tolerance are enabled by default. With ZooKeeper, a separate ZooKeeper cluster with at least three hosts is created for each ClickHouse® cluster.
Access to ZooKeeper and its setup are not available to Yandex Cloud users.
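For illustration, a replicated table in such a cluster could be declared roughly as follows. This is a sketch under assumptions: the table name `events_replicated` and its columns are hypothetical, and the `{cluster}`, `{shard}`, and `{replica}` macros are assumed to be defined on the cluster hosts.

```sql
-- Hypothetical table; the macros are assumed to be configured on each host.
CREATE TABLE events_replicated ON CLUSTER '{cluster}'
(
    event_date Date,
    user_id UInt64,
    message String
)
-- First argument: the table's path in ZooKeeper or ClickHouse Keeper.
-- Second argument: the replica name, unique per host.
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_replicated', '{replica}')
ORDER BY (event_date, user_id);
```

A table created with a plain `MergeTree` engine is stored on a single host only, which is the non-replicated case in which a node failure may lead to data loss.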
Why does a ClickHouse® cluster take up 3 hosts more than it should?
When you create a ClickHouse® cluster with two or more hosts and ClickHouse® Keeper support is not enabled, Managed Service for ClickHouse® automatically creates a cluster of three ZooKeeper hosts to manage replication and fault tolerance. These hosts are taken into account when calculating the consumed cloud resource quota.
For more information about using ZooKeeper, see the ClickHouse® documentation.
How do I delete data in ClickHouse® based on TTL?
Data is deleted based on TTL either in entire data chunks or during merges.
Deleting entire data chunks is more efficient and uses fewer server resources, but it requires the TTL expression to be aligned with the partitioning key.
Deletions during merges use more resources and are carried out during regular background merges or unscheduled merges. Merge frequency depends on the value of the merge_with_ttl_timeout parameter, which is set at table creation.
We recommend configuring TTL so that obsolete data is always deleted in entire chunks. To do this, set ttl_only_drop_parts to true when creating tables.
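The recommendation above can be sketched as follows. The table name, columns, three-month retention period, and timeout value are illustrative assumptions rather than recommended settings; in a multi-host cluster, a replicated engine would normally be used instead of plain `MergeTree`.

```sql
-- Hypothetical table: names, retention period, and timeout are illustrative.
CREATE TABLE events_with_ttl
(
    event_date Date,
    user_id UInt64,
    message String
)
ENGINE = MergeTree
-- Monthly partitions, so that each data chunk covers one month of rows.
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id)
-- The TTL expression uses the same column as the partitioning key.
TTL event_date + INTERVAL 3 MONTH
SETTINGS
    ttl_only_drop_parts = 1,         -- drop a chunk only once all of its rows have expired
    merge_with_ttl_timeout = 86400;  -- minimum delay between TTL merges, in seconds
```

Because the partitioning key and the TTL expression are both based on `event_date`, the rows within one chunk expire at roughly the same time, which lets whole chunks be dropped without rewriting data.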
Can I use JSON data for tables in ClickHouse®?
Yes, you can. However, JSON is currently an experimental data type in ClickHouse®. To enable creating tables with columns of this type, run this query:
SET allow_experimental_object_type=1;
Note
SET queries are not supported when connecting to a cluster through the management console. To run such a query, use a different cluster connection method, e.g., through clickhouse-client.
Make sure you have the latest client version installed.
For more information, see the ClickHouse® documentation.
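As a rough sketch, a session that uses a JSON column might look like this. The table name `json_events` and the document keys are hypothetical, and the exact setting name and subcolumn behavior may differ between ClickHouse® versions, so treat this as an assumption to check against your cluster version.

```sql
-- Enable the experimental type for the current session.
SET allow_experimental_object_type = 1;

-- Hypothetical table with a JSON column.
CREATE TABLE json_events
(
    id UInt64,
    payload JSON
)
ENGINE = MergeTree
ORDER BY id;

-- The JSON document is passed as a string literal.
INSERT INTO json_events VALUES (1, '{"level": "info", "context": {"host": "web-1"}}');

-- Keys of the document are read as subcolumns.
SELECT payload.level, payload.context.host FROM json_events;
```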
Why is the cluster slow even though the computing resources are not used fully?
The maximum storage IOPS and bandwidth values may be insufficient for processing the current number of requests. In this case, throttling is triggered and the performance of the entire cluster degrades.
The maximum IOPS and bandwidth values increase by a fixed value when the storage size increases by a certain step. The step and increment values depend on the disk type:
Disk type | Step, GB | Max IOPS increase (read/write) | Max bandwidth increase (read/write), MB/s |
---|---|---|---|
network-hdd | 256 | 300/300 | 30/30 |
network-ssd | 32 | 1,000/1,000 | 15/15 |
network-ssd-nonreplicated | 93 | 28,000/5,600 | 110/82 |
To increase the maximum IOPS and bandwidth values and make throttling less likely, increase the storage size when you update your cluster.
If you are using the network-hdd storage type, consider switching to network-ssd or network-ssd-nonreplicated by restoring the cluster from a backup.
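To get a rough idea of how the limits scale with storage size, the step values from the table can be plugged into a simple calculation. The query below is a sketch for a hypothetical 512 GB network-ssd disk. It assumes that the limits grow by the fixed increment for every whole step and ignores any per-disk upper limit, so the actual values for your cluster may differ.

```sql
-- Hypothetical sizing estimate for a 512 GB network-ssd disk.
-- Assumes linear growth per whole step and no per-disk upper limit.
SELECT
    512 AS disk_size_gb,
    intDiv(disk_size_gb, 32) AS whole_steps,     -- network-ssd step is 32 GB
    whole_steps * 1000 AS est_max_iops,          -- 1,000 IOPS per step (read and write)
    whole_steps * 15 AS est_max_bandwidth_mbs;   -- 15 MB/s per step (read and write)
```

Replacing the step and increment values with those of another disk type from the table gives the corresponding estimate for that type.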
ClickHouse® is a registered trademark of ClickHouse, Inc.