FAQ about Managed Service for ClickHouse®
General questions
- What part of database management and maintenance is Managed Service for ClickHouse® responsible for?
- Which ClickHouse® version does Managed Service for ClickHouse® use?
- How can I change the computing resources and storage size for a database cluster?
- How can I fix the no permission error when connecting a service account to the cluster?
Questions about ClickHouse®
- Can I deploy a ClickHouse® database cluster in multiple availability zones?
- Why does a ClickHouse® cluster take up 3 hosts more than it should?
- Why is the cluster slow even though the computing resources are not used fully?
Connection
- Why do I get an UNEXPECTED_PACKET_FROM_SERVER error when connecting?
- Can I connect to cluster hosts via SSH or get superuser permissions on hosts?
- What do I do if I get the revocation check error when using PowerShell to obtain an SSL certificate?
Updating a cluster
Cluster parameter settings
- How do I create a user to access a cluster from DataLens with read-only permissions?
- How do I grant a user permissions to create and delete tables or databases?
- Why should a Managed Service for ClickHouse® cluster have three or five ZooKeeper hosts?
Moving and restoring a cluster
- When are backups performed? Is a database cluster available during a backup?
- How many backups are stored in Managed Service for ClickHouse®? For how long?
- Why does it take a long time to restore a cluster from a backup?
- How do I move an existing ClickHouse® cluster to Yandex Cloud?
- Can I restore a shard from a backup into a shard in an existing cluster?
Monitoring and logs
General questions
What is Managed Service for ClickHouse®?
Managed Service for ClickHouse® is a solution that helps you create, operate, and scale ClickHouse® databases in the cloud.
With Managed Service for ClickHouse®, you can:
- Create a database with the performance parameters tailored to your needs.
- Scale computing power and dedicated storage capacity for your databases as needed.
- Get database logs.
Managed Service for ClickHouse® takes over time-consuming ClickHouse® infrastructure administration tasks:
- Monitors resource usage.
- Automatically creates DB backups.
- Provides fault tolerance through automatic failover to backup replicas.
- Keeps database software updated.
You work with a Managed Service for ClickHouse® database cluster as if it were a regular database in your local infrastructure. This allows you to manage internal database settings to meet your app requirements.
What is ClickHouse® used for? Which DBMS should I select?
ClickHouse® is designed primarily for analytics (OLAP) and only supports adding and reading data. You can update data, but with certain limitations.
What is Managed Service for ClickHouse®'s share of database management and maintenance work?
When you create clusters, Managed Service for ClickHouse® allocates resources, installs the DBMS, and creates databases.
For all created and running databases, Managed Service for ClickHouse® automatically creates backups and applies fixes and updates.
Managed Service for ClickHouse® also enables data replication between database hosts (both within and across availability zones) and automatically fails over to a backup replica if a failure occurs.
Be mindful of what is controlled by the service and what is controlled by the Yandex Cloud customer. Understanding these control zones will help you use your cloud resources effectively and avoid potential database-related problems. For more information, see Zones of control between managed database (MDB) service users and Yandex Cloud.
Not sure whether to use Managed Service for ClickHouse® or VMs running databases?
Yandex Cloud offers two ways to work with databases:
- Managed Service for ClickHouse®: Enables you to operate template databases without needing to manage their administration.
- Yandex Compute Cloud VM: Enables you to create and configure your own databases. With this approach, you can use any database management systems, access databases via SSH, and more.
What is a database host and database cluster?
A database host is an isolated database environment in the cloud with dedicated computing resources and reserved storage capacity.
A database cluster is one or more database hosts with the option to configure replication.
How do I get started with Managed Service for ClickHouse®?
Managed Service for ClickHouse® is available to all registered Yandex Cloud users.
To create a database cluster in Managed Service for ClickHouse®, you need to define its settings:
- Host class (performance parameters, such as CPUs, RAM, etc.).
- Storage size (fully reserved when creating the cluster).
- Network your cluster will be connected to.
- Number of hosts for your cluster and availability zone for each host.
For more information, see Getting started.
How many database hosts does a cluster support?
The minimum number of hosts in a cluster depends on the following:
- Disk type:
  - At least three hosts for non-replicated SSDs (network-ssd-nonreplicated).
  - At least two hosts for local SSDs (local-ssd).
  - At least one host for the following:
    - Network HDDs (network-hdd).
    - Network SSDs (network-ssd).
    - Ultra high-speed network SSDs with three replicas (network-ssd-io-m3).
- Cluster sharding. When sharding is enabled, you need to multiply the minimum number of hosts for the selected disk type by the number of shards.
For more information on the features and limitations of sharding in ClickHouse®, see this article.
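The rule above (per-disk-type minimum multiplied by the number of shards) can be sketched as a small helper. This is an illustration only: the function name `min_cluster_hosts` is hypothetical, and the result counts ClickHouse® hosts only, not any ZooKeeper hosts the service may add.

```python
# Minimum ClickHouse hosts per shard for each disk type, as listed above.
MIN_HOSTS_PER_SHARD = {
    "network-ssd-nonreplicated": 3,
    "local-ssd": 2,
    "network-hdd": 1,
    "network-ssd": 1,
    "network-ssd-io-m3": 1,
}


def min_cluster_hosts(disk_type: str, shards: int = 1) -> int:
    """Return the minimum number of ClickHouse hosts for a cluster.

    Hypothetical helper: multiplies the per-disk-type minimum
    by the number of shards.
    """
    if shards < 1:
        raise ValueError("a cluster has at least one shard")
    return MIN_HOSTS_PER_SHARD[disk_type] * shards


# Local SSDs need two hosts per shard, so three shards need six hosts.
print(min_cluster_hosts("local-ssd", shards=3))  # 6
```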
The maximum number of hosts per cluster cannot exceed the set limits.
For more information, see Quotas and limits.
How can I access a running DB host?
You can connect to Managed Service for ClickHouse® databases using standard DBMS methods.
Learn more about connecting to clusters here.
How many clusters can I create within a single cloud?
For more information on MDB technical and organizational limitations, see Quotas and limits.
How are DB clusters maintained?
In Managed Service for ClickHouse®, maintenance implies:
- Automatic installation of DBMS updates and fixes for DB hosts (including disabled clusters).
- Changes to the host class and storage size.
- Other Managed Service for ClickHouse® maintenance activities.
For more information, see Maintenance.
How do I edit external dictionaries?
To rename a dictionary, run this query:
RENAME DICTIONARY <dictionary_name> TO <new_name>
For other changes, use the update API method.
Which ClickHouse® version does Managed Service for ClickHouse® use?
Managed Service for ClickHouse® supports several of the latest stable ClickHouse® versions. For more information, see ClickHouse® versioning policy.
Which ClickHouse® version should I choose?
We recommend the latest available LTS version of ClickHouse®. For more information, see ClickHouse® versioning policy.
What happens when a new DBMS version is released?
When new minor versions are released, the cluster software is automatically updated after a short testing period.
Owners of the affected DB clusters are notified of an expected maintenance period and DB availability in advance.
What happens when a DBMS version becomes deprecated?
When a DBMS version becomes deprecated, Managed Service for ClickHouse® automatically sends email notifications to the owners of database clusters created with this version.
New hosts can no longer be created using deprecated DBMS versions. Clusters running a deprecated ClickHouse® version are updated according to the versioning policy.
Owners of the affected DB clusters are notified of an expected maintenance period and DB availability in advance.
How do you calculate usage cost for a database host?
In Managed Service for ClickHouse®, the usage cost is calculated based on the following:
- Selected host class.
- Size of the storage reserved for the database host.
- Size of the database cluster backups. Backup size equal to the storage size is free of charge. Backup storage that exceeds this size is charged based on the pricing policy.
- Database host uptime in hours. Partial hours are rounded up to the nearest whole hour. For the cost per hour of operation for each host class, see Pricing policy.
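Two of the rules above are simple arithmetic: partial hours are rounded up, and backups are free up to the storage size. The sketch below shows only that arithmetic. The function names are hypothetical, no prices are included, and it assumes the backup size is compared against the storage size in the same units.

```python
import math


def billable_hours(uptime_hours: float) -> int:
    """Round a host's uptime up to the nearest whole hour."""
    return math.ceil(uptime_hours)


def billable_backup_gb(backup_gb: float, storage_gb: float) -> float:
    """Return the backup volume that exceeds the free allowance.

    Assumption: the free allowance equals the storage size.
    """
    return max(0.0, backup_gb - storage_gb)


print(billable_hours(2.25))           # 3
print(billable_backup_gb(120, 100))   # 20.0
print(billable_backup_gb(80, 100))    # 0.0
```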
How much does it cost to use my cluster?
In the management console
How can I change the computing resources and storage size for a database cluster?
You can change computing resources and storage size in the management console. All you need to do is choose a different host class for the relevant cluster.
The cluster settings update within 30 minutes. This period may also include other cluster maintenance activities, such as installing updates.
How can I fix the no permission error when assigning a service account to a cluster?
Error message:
ERROR: rpc error: code = PermissionDenied desc = you do not have permission to access the requested service account or service account does not exist
The error occurs in the following cases:
- You are creating or modifying a cluster and linking it to a service account.
- You are restoring a cluster linked to a service account from its backup.
To fix this error, assign your Yandex Cloud account the iam.serviceAccounts.user role or higher.
Does the service meet the requirements of the Russian Federation Federal Law 152-FZ on personal data?
Yes, it does. You can read the full security audit conclusion here.
Can I get logs of my operations in Yandex Cloud?
Yes, you can request information about operations with your resources from Yandex Cloud logs. To do this, contact support.
Questions about ClickHouse®
Why should I use ClickHouse® in Managed Service for ClickHouse® rather than my own VM-based installation?
Managed Service for ClickHouse® automates routine database maintenance:
- Quick DB deployment with the necessary available resources.
- Data backup.
- Regular software updates.
- Providing DB cluster failover.
- Database usage monitoring and statistics.
When should I use ClickHouse® instead of PostgreSQL?
ClickHouse® only supports adding and reading data because it is designed primarily for analytics (OLAP). In other cases, it's probably more convenient to use PostgreSQL.
How do I upload data to ClickHouse®?
Use the INSERT statement described in the ClickHouse® documentation.
How do I upload very large data to ClickHouse®?
Use the CLIINSERT command per second).
Data transfer from physical media is not yet supported.
What happens to a cluster if one of its nodes fails?
DB clusters consist of at least two replicas, so the cluster will continue working if one of its nodes is out.
Data may be lost only if a node with a non-replicated table fails.
Can I deploy a ClickHouse® database cluster in multiple availability zones?
Yes, you can. A database cluster may consist of hosts residing in different availability zones or even regions.
How does replication work for ClickHouse®?
Managed Service for ClickHouse® clusters replicate data using ClickHouse® Keeper or ZooKeeper. With ClickHouse® Keeper, no additional settings are required: replication and fault tolerance are enabled by default. With ZooKeeper, a ZooKeeper cluster with at least three hosts is created for each ClickHouse® cluster.
Access to ZooKeeper and its setup are not available to Yandex Cloud users.
Why does a ClickHouse® cluster take up 3 hosts more than it should?
When you create a ClickHouse® cluster with two or more hosts and ClickHouse® Keeper support is not enabled, Managed Service for ClickHouse® automatically creates a cluster of three ZooKeeper hosts to manage replication and fault tolerance. These hosts count toward the consumed cloud resource quota.
For more information about using ZooKeeper, see the ClickHouse® documentation.
How do I delete data in ClickHouse® based on TTL?
Data is deleted based on TTL either in entire chunks or during merges.
Deleting entire data chunks is more efficient and uses fewer server resources, but requires the TTL expression to align with the partitioning key, so that all rows in a chunk expire together.
Deletions during merge transactions use more resources and are carried out with regular background merge transactions or during unscheduled merges. Merge frequency depends on the value in the merge_with_ttl_timeout parameter. This parameter is set at table creation.
We recommend always configuring TTL processing so that obsolete data is deleted in entire chunks. To do this, set ttl_only_drop_parts to true when creating tables.
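As an illustration of the recommendation above, the following sketch renders a CREATE TABLE statement that combines a TTL expression with the ttl_only_drop_parts setting. The helper `ttl_table_ddl`, the table name, and the columns are hypothetical. The monthly partitioning is an assumption chosen so that whole chunks expire together.

```python
def ttl_table_ddl(table: str, ttl_days: int) -> str:
    """Render a CREATE TABLE statement with TTL-based cleanup.

    Hypothetical example schema: the TTL is based on the same date
    column as the partitioning key, and ttl_only_drop_parts = 1 asks
    ClickHouse to drop expired data in entire chunks.
    """
    return (
        f"CREATE TABLE {table}\n"
        "(\n"
        "    event_date Date,\n"
        "    payload String\n"
        ")\n"
        "ENGINE = MergeTree\n"
        "PARTITION BY toYYYYMM(event_date)\n"
        "ORDER BY event_date\n"
        f"TTL event_date + INTERVAL {ttl_days} DAY\n"
        "SETTINGS ttl_only_drop_parts = 1"
    )


print(ttl_table_ddl("events", ttl_days=90))
```

The generated statement can then be run through any supported connection method, for example clickhouse-client.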
Can I use JSON data for tables in ClickHouse®?
Yes, you can. However, JSON is currently an experimental data type in ClickHouse®. To allow creating tables of this type, run this query:
SET allow_experimental_object_type=1;
Note
SET queries are not supported when connecting to a cluster through the management console. To run such a query, use a different cluster connection method, e.g., through clickhouse-client.
Make sure you have the latest client version installed.
For more information, see the ClickHouse® documentation.
Why is the cluster slow even though the computing resources are not used fully?
Your storage may have insufficient maximum IOPS and bandwidth to process the current number of requests. In this case, throttling occurs, which degrades the entire cluster performance.
The maximum IOPS and bandwidth values increase by a fixed value when the storage size increases by a certain step. The step and increment values depend on the disk type:
| Disk type | Step, GB | Max IOPS increase (read/write) | Max bandwidth increase (read/write), MB/s |
|---|---|---|---|
| network-hdd | 256 | 300/300 | 30/30 |
| network-ssd | 32 | 1,000/1,000 | 15/15 |
| network-ssd-nonreplicated, network-ssd-io-m3 | 93 | 28,000/5,600 | 110/82 |
To increase the maximum IOPS and bandwidth values and make throttling less likely, increase the storage size when you update your cluster.
If you are using the network-hdd storage type, consider switching to network-ssd or network-ssd-nonreplicated by restoring the cluster from a backup.
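To get a rough idea of how storage size affects these limits, the table above can be turned into a small calculator. This is a simplified sketch with several assumptions: limits grow linearly with the number of whole steps, a disk smaller than one step still gets one increment, and any per-disk upper caps are ignored. The names `DiskStep` and `estimated_limits` are hypothetical.

```python
from typing import NamedTuple


class DiskStep(NamedTuple):
    step_gb: int      # storage size step, GB
    iops_read: int    # max read IOPS added per step
    iops_write: int   # max write IOPS added per step
    mbps_read: int    # max read bandwidth added per step, MB/s
    mbps_write: int   # max write bandwidth added per step, MB/s


# Values copied from the table above.
DISK_STEPS = {
    "network-hdd": DiskStep(256, 300, 300, 30, 30),
    "network-ssd": DiskStep(32, 1000, 1000, 15, 15),
    "network-ssd-nonreplicated": DiskStep(93, 28000, 5600, 110, 82),
    "network-ssd-io-m3": DiskStep(93, 28000, 5600, 110, 82),
}


def estimated_limits(disk_type: str, size_gb: int) -> dict:
    """Estimate max IOPS and bandwidth for a given storage size."""
    d = DISK_STEPS[disk_type]
    # Assumption: only whole steps count, with a minimum of one.
    steps = max(1, size_gb // d.step_gb)
    return {
        "iops_read": steps * d.iops_read,
        "iops_write": steps * d.iops_write,
        "mbps_read": steps * d.mbps_read,
        "mbps_write": steps * d.mbps_write,
    }


# 128 GB of network-ssd is four 32 GB steps.
print(estimated_limits("network-ssd", 128))
```

Comparing the estimate with the observed load can help you decide whether to increase the storage size or switch the disk type.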
Connection
Can I connect to individual ClickHouse® hosts?
Yes. You can connect to ClickHouse® cluster hosts:
- Using the HTTPS interface:
  - Via an encrypted SSL connection on port 8443.
  - Without encryption through port 8123.
- Using the command-line client:
  - Via an encrypted SSL connection on port 9440.
  - Without encryption through port 9000.
SSH connections are not supported.
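The port choices above can be captured in a short lookup. The sketch below also assembles a clickhouse-client command line that adds --secure for the encrypted port. The function names and the interface keys ("http", "native") are hypothetical labels used only for this illustration.

```python
# Ports listed above, keyed by (interface, encrypted).
PORTS = {
    ("http", True): 8443,     # HTTPS interface over SSL
    ("http", False): 8123,    # HTTP interface without encryption
    ("native", True): 9440,   # command-line client over SSL
    ("native", False): 9000,  # command-line client without encryption
}


def clickhouse_port(interface: str, encrypted: bool = True) -> int:
    """Return the port for the given interface and encryption mode."""
    return PORTS[(interface, encrypted)]


def client_command(host_fqdn: str, encrypted: bool = True) -> list[str]:
    """Build a clickhouse-client command line for a cluster host."""
    cmd = [
        "clickhouse-client",
        "--host", host_fqdn,
        "--port", str(clickhouse_port("native", encrypted)),
    ]
    if encrypted:
        # Port 9440 only accepts encrypted connections.
        cmd.append("--secure")
    return cmd


print(" ".join(client_command("<host_FQDN>")))
# clickhouse-client --host <host_FQDN> --port 9440 --secure
```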
Why can't I connect to a host from the internet?
Most likely, public access is not enabled for the cluster, so you can only connect to it from a VM in Yandex Cloud. You can only request public access when creating a new host in your cluster.
How do I connect to a non-public host in Yandex Cloud?
Connect to a host from a VM in Yandex Cloud hosted in the same cloud network, or add a new cluster host with public access and connect to a non-public host through it.
Can I connect to a public cluster without SSL?
No. You can only connect to public hosts using an SSL connection. For more information, see the documentation.
Why do I get an UNEXPECTED_PACKET_FROM_SERVER error when connecting?
Here is the full text of the error:
Code: 102. DB::NetException:
Unexpected packet from server <host_FQDN>.mdb.yandexcloud.net:9440
(expected Hello or Exception, got Unknown packet)
This error occurs when you try to connect directly to the ClickHouse® host through port 9440 without using encryption. You can only connect through port 9440 over an encrypted SSL connection.
Make sure to specify the --secure parameter when connecting through port 9440.
To learn more about connection methods, see Connecting to a ClickHouse® cluster.
Can I connect to cluster hosts via SSH or get superuser permissions on hosts?
You cannot connect to hosts via SSH. This is done for the sake of security and user cluster fault tolerance because direct changes inside a host can render it completely inoperable.
What should I do if I get the revocation check error when using PowerShell to obtain an SSL certificate?
Here is the full text of the error:
curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012)
The revocation function was unable to check revocation for the certificate
This means that, when connecting to the website, the service could not check whether the website's certificate had been revoked.
To fix this error:
- Make sure the corporate network settings do not block the check.
- Run the command with the --ssl-no-revoke parameter:
  mkdir -Force $HOME\.yandex; `
  curl.exe https://storage.yandexcloud.net/cloud-certs/RootCA.pem `
    --ssl-no-revoke `
    --output $HOME\.yandex\RootCA.crt; `
  curl.exe https://storage.yandexcloud.net/cloud-certs/IntermediateCA.pem `
    --ssl-no-revoke `
    --output $HOME\.yandex\IntermediateCA.crt; `
  Import-Certificate `
    -FilePath $HOME\.yandex\RootCA.crt `
    -CertStoreLocation cert:\CurrentUser\Root; `
  Import-Certificate `
    -FilePath $HOME\.yandex\IntermediateCA.crt `
    -CertStoreLocation cert:\CurrentUser\Root
Updating a cluster
How do I add a host to a cluster?
To add a host, follow this guide. You can also add new hosts to a cluster when creating a shard.
Can I set join_use_nulls to 1 using the CLI?
Yes. To do this, when creating a user or updating user settings, provide the required join_use_nulls value in --settings. Here is an example:
yc managed-clickhouse user update <username> \
--cluster-name=<cluster_name> \
--settings="join_use_nulls=1"
For more information, see this guide.
Will my cluster be unavailable during an update?
If your cluster has more than one host, there is no downtime during the update, since the hosts are updated one by one. Only individual hosts are unavailable while the cluster is being restarted.
How do I change the time zone?
Change the ClickHouse® timezone
Will my cluster be unavailable when adding replicas?
Yes, the cluster will experience a short downtime during restart.
How do I grant read-only permissions to a user?
To do this, when creating or editing a user via the CLI, specify readonly=1 in --settings. Here is an example:
yc managed-clickhouse user update <username> \
--cluster-name=<cluster_name> \
--settings="readonly=1"
For more information, see this guide.
How do I increase the memory limit?
Update the user settings and set the required Max memory usage value.
Can I change a network and subnets?
No, you can only select a network and subnets for hosts when creating a cluster or restoring it from a backup.
How do I change the distribution of data across shards in a cluster?
In an existing cluster, you cannot change the location of data in shards.
Cluster parameter settings
How do I create a user to access a cluster from DataLens with read-only permissions?
Follow the guide to create a user with read-only permissions. If the cluster settings have the DataLens access option enabled, the service can connect to the cluster through this user.
How do I grant a user permissions to create and delete tables or databases?
Go to the cluster settings, enable the User management via SQL option, and grant the user the appropriate permissions using a GRANT statement.
How do I find out the internal_replication setting value?
The internal_replication setting information is not available in Yandex Cloud interfaces or ClickHouse® system tables. The default setting value is true.
How do I increase the maximum amount of RAM to run a query?
If the amount of RAM is not sufficient for running a query, the following error occurs:
DB::Exception: Memory limit (total) exceeded:
would use 14.10 GiB (attempt to allocate chunk of 4219924 bytes), maximum: 14.10 GiB.
(MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below)
To increase the maximum amount of RAM, use the Max memory usage parameter.
If user management via SQL is enabled for the cluster, you can set the Max memory usage parameter as follows:
- For the current user session by running this query:
  SET max_memory_usage = <value_in_bytes>;
- For all default users by creating a settings profile.
Why must a Managed Service for ClickHouse® cluster have three or five ZooKeeper hosts?
ZooKeeper uses a consensus algorithm: it keeps running as long as the majority of ZooKeeper hosts are healthy.
For example, if a cluster has two ZooKeeper hosts, then, should one of them stop, the remaining host will not form the majority, so the service will become unavailable. Thus, a cluster with two ZooKeeper hosts has no fault tolerance.
A cluster with three ZooKeeper hosts, on the other hand, is fault-tolerant. When one of its hosts is down or under maintenance, the cluster remains operational. Therefore, three is the minimum recommended number of ZooKeeper hosts per Managed Service for ClickHouse® cluster.
A cluster with four ZooKeeper hosts has no advantages over a three-host cluster: it is going to be just as operational if only one of its hosts fails. With two hosts down, the consensus is not met, so the service becomes unavailable.
A cluster with five ZooKeeper hosts is resilient enough to keep going without two of its hosts, since three hosts out of five still form the majority. This makes it more resilient than a three-host cluster. Even if one host out of five is under maintenance or restarting, the cluster remains fault-tolerant, i.e., it can lose one more host and still be operational.
Adding more than five ZooKeeper hosts to a cluster is not normally advisable. The more ZooKeeper hosts there are, the longer their interaction times, slowing the service down.
Therefore, we recommend creating three or five ZooKeeper hosts per Managed Service for ClickHouse® cluster.
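The majority rule behind this recommendation is easy to check with a few lines of arithmetic. The sketch below is illustrative only, and the function names are hypothetical. It computes how many hosts form a majority and how many failures a cluster of a given size can survive.

```python
def quorum_size(hosts: int) -> int:
    """Smallest number of hosts that forms a majority."""
    return hosts // 2 + 1


def tolerated_failures(hosts: int) -> int:
    """Number of hosts that can fail while a majority remains."""
    return hosts - quorum_size(hosts)


for n in (2, 3, 4, 5):
    print(f"{n} hosts: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
# 2 hosts: quorum 2, tolerates 0 failure(s)
# 3 hosts: quorum 2, tolerates 1 failure(s)
# 4 hosts: quorum 3, tolerates 1 failure(s)
# 5 hosts: quorum 3, tolerates 2 failure(s)
```

The output shows why an even host count adds no fault tolerance over the odd count just below it.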
Moving and restoring a cluster
What is the backup procedure for a ClickHouse® database?
Backups are created every 24 hours and stored for seven days after being created. You can restore data only as of the backup creation time.
Is database host backup enabled by default?
Yes, backup is enabled by default. For ClickHouse®, a full backup is performed once a day, and you can restore a cluster from any saved backup.
When are backups performed? Is a database cluster available during backup?
When creating or updating a cluster, you can set the time interval during which the backup will start. Default time: 22:00 - 23:00 UTC (Coordinated Universal Time).
Clusters remain fully accessible during the backup window.
How many backups are stored in Managed Service for ClickHouse®? For how long?
The size and number of backups are not limited. Automatically created backups are stored for seven days, while manually created ones are stored indefinitely.
Can I delete a backup?
Yes, if you created it manually. To delete a backup, follow this guide.
Can I change the automatic backup retention period?
You can set the retention period for automatic backups when creating or modifying a cluster.
What does a daily backup include?
Backup data is only stored for the MergeTree engine family. For other engines, backups only store table schemas. For more information, see Backups.
Why does it take a long time to restore a cluster from a backup?
The approximate average speed of restoring a cluster from a backup is 100 Mbps. The recovery time may vary significantly depending on host class and the nature of data in the DB.
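For a rough lower bound on restore time, you can divide the backup size by the average speed quoted above. The sketch below assumes that Mbps means megabits per second (10^6 bits) and that the speed stays constant. The actual time may differ considerably, and the function name is hypothetical.

```python
def estimated_restore_seconds(backup_bytes: int, speed_mbps: float = 100) -> float:
    """Estimate restore time from the backup size and average speed.

    Assumption: speed_mbps is in megabits per second (10**6 bits/s).
    """
    return backup_bytes * 8 / (speed_mbps * 10**6)


seconds = estimated_restore_seconds(100 * 10**9)  # a 100 GB backup
print(f"{seconds / 3600:.1f} hours")  # 2.2 hours
```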
How do I move an existing ClickHouse® cluster to Yandex Cloud?
Use Yandex Data Transfer.
Can I restore a shard from a backup into a shard in an existing cluster?
This option is not currently supported.
However, you can restore your shard from a backup to a new ClickHouse® cluster and transfer data from that cluster into an existing one. To do this, consider the following options:
- Data Transfer. You can use this service to transfer a database or separate tables to a Managed Service for ClickHouse® cluster.
- ClickHouse®'s built-in remote function. Use it to transfer separate tables to a Managed Service for ClickHouse® cluster.
- ClickHouse® BACKUP and RESTORE commands. These will help you back up a database or single table to a Yandex Object Storage bucket and then restore your data from the bucket to a Managed Service for ClickHouse® cluster.
Monitoring and logs
What metrics and processes can be tracked using monitoring?
For all DBMS types, you can track:
- CPU, memory, network, or disk usage, in absolute terms.
- Memory, network, or disk usage as a percentage of the set limits for the corresponding cluster host class.
- The amount of data in the DB cluster and the remaining free space in data storage.
For DB hosts, you can track metrics specific to the corresponding type of DBMS. For example, for PostgreSQL, you can track:
- Average query execution time.
- Number of requests per second.
- Number of errors in logs.
Monitoring can be performed with a minimum granularity of 5 seconds.
How is log storage charged?
Logs of any level are written to a disk's system partition with 20 GB allocated, so you are not charged for them separately. The size of the logs created only affects log rotation frequency.
What is the retention period for logs?
Cluster logs are stored for 30 days.
How do I track the amount of free storage space on ZooKeeper hosts?
Follow the steps in Monitoring the state of clusters and hosts to track the host state or set up alerts.
How do I monitor space used by data in hybrid storage?
Use the ch_s3_disk_parts_size metric in Yandex Monitoring. It shows the amount of space used by MergeTree
How do I set up an alert that triggers as soon as a certain percentage of disk space has been used up?
Create an alert with the disk.used_bytes metric in Yandex Monitoring. This metric shows the disk space usage in the Managed Service for ClickHouse® cluster.
For disk.used_bytes, use notification thresholds. The recommended values are as follows:
- Alarm: 95% of the disk space
- Warning: 80% of the disk space
Thresholds are set in bytes only. For example, the recommended values for a 100 GB disk are as follows:
- Alarm: 102,005,473,280 bytes (95%)
- Warning: 85,899,345,920 bytes (80%)
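Since thresholds must be entered in bytes, a small helper can convert a disk size and a percentage into the value to use. This is a minimal sketch with a hypothetical function name. It assumes 1 GB = 2^30 bytes, which is the convention that reproduces the example values above.

```python
GB = 2**30  # bytes per GB, matching the example values above


def threshold_bytes(disk_gb: int, percent: int) -> int:
    """Convert a disk usage percentage into a threshold in bytes."""
    return disk_gb * GB * percent // 100


print(threshold_bytes(100, 95))  # 102005473280 (Alarm)
print(threshold_bytes(100, 80))  # 85899345920 (Warning)
```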
ClickHouse® is a registered trademark of ClickHouse, Inc.