FAQ about Managed Service for ClickHouse®
General questions
- What part of database management and maintenance is Managed Service for ClickHouse® responsible for?
- Which ClickHouse® version does Managed Service for ClickHouse® use?
- How can I change the computing resources and storage size for a database cluster?
Questions about ClickHouse®
- Can I deploy a ClickHouse® database cluster in multiple availability zones?
- Why does a ClickHouse® cluster take up 3 hosts more than it should?
- Why is the cluster slow even though the computing resources are not used fully?
Connection
Updating a cluster
Cluster configuration
- How do I create a user to access a cluster from DataLens with read-only permissions?
- How do I grant a user permissions to create and delete tables or databases?
Moving and restoring a cluster
- When are backups performed? Is a database cluster available during backup?
- How many backups are stored in Managed Service for ClickHouse®? For how long?
- Why does it take a long time to restore a cluster from a backup?
- How do I move an existing ClickHouse® cluster to Yandex Cloud?
Monitoring and logs
General questions
What is Managed Service for ClickHouse®?
Managed Service for ClickHouse® is a service that helps you create, operate, and scale ClickHouse® databases in a cloud infrastructure.
With Managed Service for ClickHouse®, you can:
- Create a database with the required performance characteristics.
- Scale the processing power and storage dedicated to your databases as needed.
- Get database logs.
Managed Service for ClickHouse® takes on time-consuming ClickHouse® infrastructure administration tasks:
- Monitors resource usage.
- Automatically creates DB backups.
- Provides fault tolerance through automatic failover to backup replicas.
- Keeps database software updated.
You interact with database clusters in Managed Service for ClickHouse® the same way you interact with regular databases in your local infrastructure. This allows you to manage internal database settings to meet your app requirements.
What is ClickHouse® used for? Which database should I select?
ClickHouse® is designed primarily for analytics (OLAP) and only supports adding and reading data. You can update data, but with limitations.
What part of database management and maintenance is Managed Service for ClickHouse® responsible for?
When creating clusters, Managed Service for ClickHouse® allocates resources, installs the DBMS, and creates databases.
For the created and running databases, Managed Service for ClickHouse® automatically creates backups and applies fixes and updates to the DBMS.
Managed Service for ClickHouse® also allows you to replicate data between database hosts (both within and across availability zones) and automatically routes the load to a backup replica in the event of a failure.
Which tasks are best addressed using Managed Service for ClickHouse®, and which using VMs with databases?
Yandex Cloud offers two ways to work with databases:
- Managed Service for ClickHouse® allows you to operate template databases with no need to worry about administration.
- Yandex Compute Cloud virtual machines allow you to create and configure your own databases. With this approach, you can use any database management system, access databases via SSH, etc.
What is a database host and database cluster?
A database host is an isolated database environment in the cloud infrastructure with dedicated computing resources and reserved data storage.
A database cluster is one or more database hosts between which replication can be configured.
How do I get started with Managed Service for ClickHouse®?
Managed Service for ClickHouse® is available to any registered Yandex Cloud user.
To create a database cluster in Managed Service for ClickHouse®, you must define its characteristics:
- Host class (performance characteristics, such as CPUs, RAM, etc.).
- Storage size (reserved to the full extent when you create a cluster).
- Network your cluster will be connected to.
- Number of hosts for the cluster and the availability zone for each host.
For more information, see Getting started.
How many DB hosts can a cluster contain?
The minimum number of hosts depends on the selected type of storage:
- If you use non-replicated SSD storage, the minimum number of hosts is 3.
- If you use local SSD storage, the minimum number of hosts is 2.
- If you use network HDD or network SSD storage, you can create single-host clusters.
The maximum number of hosts in a cluster is only limited by the requested computing resources and the size of the storage for the cluster.
For more information, see Quotas and limits.
How can I access a running DB host?
You can connect to Managed Service for ClickHouse® databases using standard DBMS methods.
Learn more about connecting to clusters.
How many clusters can I create within a single cloud?
For more information on MDB technical and organizational limitations, see Quotas and limits.
How are DB clusters maintained?
Maintenance in Managed Service for ClickHouse® implies:
- Automatic installation of DBMS updates and revisions for DB hosts (including disabled clusters).
- Changes to the host class and storage size.
- Other Managed Service for ClickHouse® maintenance activities.
For more information, see Maintenance.
How do I edit external dictionaries?
To rename a dictionary, run the query:
RENAME DICTIONARY <dictionary_name> TO <new_name>
For other changes, use the update API method.
Which ClickHouse® version does Managed Service for ClickHouse® use?
Managed Service for ClickHouse® uses some of the latest stable versions of ClickHouse®. For more information, see ClickHouse® versioning policy.
Which ClickHouse® version should I choose?
We recommend the latest available LTS version of ClickHouse®. For more information, see ClickHouse® versioning policy.
What happens when a new DBMS version is released?
When new minor versions are released, the cluster software is automatically updated after a short testing period.
Owners of the affected DB clusters are notified in advance about the maintenance work schedule and DB availability.
What happens when a DBMS version becomes deprecated?
When a DBMS version becomes deprecated, Managed Service for ClickHouse® automatically sends email notifications to the owners of database clusters created with this version.
New hosts can no longer be created using deprecated DBMS versions. Clusters on a deprecated version of ClickHouse® are updated according to the versioning policy.
Owners of the affected DB clusters are notified in advance about the maintenance work schedule and DB availability.
How do you calculate usage cost for a database host?
In Managed Service for ClickHouse®, the usage cost is calculated based on the following parameters:
- Selected host class.
- Size of the storage reserved for the database host.
- Size of the database cluster backups. Backup space in the amount of the reserved storage is free of charge. Backup storage that exceeds this size is charged at special rates.
- Number of hours of database host operation. Partial hours are rounded to an integer value. You can find the cost per hour of operation for each host class in Pricing policy.
How much does it cost to use my cluster?
In the management console
How can I change the computing resources and storage size for a database cluster?
You can change computing resources and storage size in the management console. All you need to do is choose a different host class for the required cluster.
The cluster characteristics change within 30 minutes. During this period, other maintenance activities may also be enabled for the cluster, such as installing updates.
Does the service meet the requirements under Russian Federation Federal Law No. 152-FZ on Personal Data?
Yes, it does. You can read the full security audit conclusion.
Can I get logs of my operations with services?
Yes, you can request log records about your resources from Yandex Cloud services. For more information, see Data requests.
Questions about ClickHouse®
Why should I use ClickHouse® in Managed Service for ClickHouse® rather than my own VM-based installation?
Managed Service for ClickHouse® automates routine database maintenance:
- Quick DB deployment with the necessary available resources.
- Data backup.
- Regular software updates.
- Providing DB cluster failover.
- Database usage monitoring and statistics.
When should I use ClickHouse® instead of PostgreSQL?
ClickHouse® only supports adding and reading data because it is designed primarily for analytics (OLAP). In other cases, it is probably more convenient to use PostgreSQL.
How do I upload data to ClickHouse®?
Use the INSERT statement described in the ClickHouse® documentation.
How do I upload very large data to ClickHouse®?
Use the CLI, sending no more than one INSERT command per second.
Data transfer from physical media is not yet supported.
What happens to a cluster if one of its nodes fails?
DB clusters consist of at least two replicas, so the cluster will continue working if one of its nodes goes down.
Data may be lost only if a node with a non-replicated table fails.
Can I deploy a ClickHouse® database cluster in multiple availability zones?
Yes, you can. A database cluster may consist of hosts residing in different availability zones or even regions.
How does replication work for ClickHouse®?
Managed Service for ClickHouse® clusters replicate data using ClickHouse® Keeper or ZooKeeper. With ClickHouse® Keeper, no additional settings are required: replication and fault tolerance are enabled by default. With ZooKeeper, a ZooKeeper cluster with at least three hosts is created for each ClickHouse® cluster.
Access to ZooKeeper and its setup are not available to Yandex Cloud users.
Why does a ClickHouse® cluster take up 3 hosts more than it should?
When you create a ClickHouse® cluster with 2 or more hosts and ClickHouse® Keeper support is not enabled, Managed Service for ClickHouse® automatically creates a cluster of 3 ZooKeeper hosts to manage replication and fault tolerance. These hosts count toward your cloud resource quota.
For more information about using ZooKeeper, see the ClickHouse® documentation.
How do I delete data in ClickHouse® based on TTL?
Data is deleted based on TTL either in entire chunks or during merges.
Deleting entire data chunks is more efficient and uses fewer server resources, but requires the value of the TTL expression and the partitioning key to match.
Deletions during merges use more resources and are carried out with regular background merges or during unscheduled merges. Merge frequency depends on the value of the merge_with_ttl_timeout parameter. This parameter is set at table creation.
We recommend always configuring TTL processing so that obsolete data is deleted in entire chunks. To do this, set ttl_only_drop_parts to true when creating tables.
Can I use JSON data for tables in ClickHouse®?
Yes, you can. However, JSON is currently an experimental data type in ClickHouse®. To allow creating tables of this type, run this query:
SET allow_experimental_object_type=1;
Note
SET queries are not supported when connecting to a cluster through the management console. To run such a query, use a different cluster connection method, e.g., through clickhouse-client.
Make sure you have the latest client version installed.
For more information, see the ClickHouse® documentation.
Why is the cluster slow even though the computing resources are not used fully?
The maximum storage IOPS and bandwidth values may be insufficient for processing the current number of requests. In this case, throttling is triggered and the performance of the entire cluster degrades.
The maximum IOPS and bandwidth values increase by a fixed value when the storage size increases by a certain step. The step and increment values depend on the disk type:
Disk type | Step, GB | Max IOPS increase (read/write) | Max bandwidth increase (read/write), MB/s |
---|---|---|---|
network-hdd | 256 | 300/300 | 30/30 |
network-ssd | 32 | 1,000/1,000 | 15/15 |
network-ssd-nonreplicated | 93 | 28,000/5,600 | 110/82 |
To increase the maximum IOPS and bandwidth values and make throttling less likely, increase the storage size when you update your cluster.
If you are using the network-hdd storage type, consider switching to network-ssd or network-ssd-nonreplicated by restoring the cluster from a backup.
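To get a rough idea of how storage size affects throttling limits, the step and increment values from the table can be turned into a small estimator. This is only a sketch under stated assumptions: it counts full steps (at least one) and applies no upper cap, whereas the platform may round differently or limit the maximum values. The function name and dictionary layout are illustrative and not part of any Yandex Cloud API.

```python
# Per-step increments copied from the table in this answer.
# Tuple layout: (step_gb, read_iops, write_iops, read_mbps, write_mbps)
DISK_STEPS = {
    "network-hdd": (256, 300, 300, 30, 30),
    "network-ssd": (32, 1_000, 1_000, 15, 15),
    "network-ssd-nonreplicated": (93, 28_000, 5_600, 110, 82),
}


def estimate_limits(disk_type: str, size_gb: int) -> dict:
    """Estimate maximum IOPS and bandwidth for a disk of the given size.

    Assumption: limits grow by one increment per full step, at least one
    step is always counted, and no upper cap is applied.
    """
    step_gb, read_iops, write_iops, read_mbps, write_mbps = DISK_STEPS[disk_type]
    steps = max(1, size_gb // step_gb)
    return {
        "read_iops": steps * read_iops,
        "write_iops": steps * write_iops,
        "read_mbps": steps * read_mbps,
        "write_mbps": steps * write_mbps,
    }


if __name__ == "__main__":
    # Compare the same 512 GB size across two disk types.
    for disk in ("network-hdd", "network-ssd"):
        print(disk, estimate_limits(disk, 512))
```

Comparing the estimate for your current size with the estimate for a larger size or a different disk type shows how much headroom a change might give before you resize or restore the cluster.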
Connection
Can I connect to individual ClickHouse® hosts?
Yes. You can connect to ClickHouse® cluster hosts:
- Using the HTTPS interface:
  - Via an encrypted SSL connection on port 8443.
  - Without encryption through port 8123.
- Using the command-line client:
  - Via an encrypted SSL connection on port 9440.
  - Without encryption through port 9000.
SSH connections are not supported.
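As an illustration of the HTTP(S) interface ports listed above, the following sketch builds a request URL and sends a query using only the Python standard library. The host name, user, password, and CA certificate path are placeholders you would replace with your own values. The `query` URL parameter and the `X-ClickHouse-User` / `X-ClickHouse-Key` headers belong to the ClickHouse® HTTP interface; the helper function names are hypothetical.

```python
import ssl
import urllib.request
from typing import Optional
from urllib.parse import urlencode

HTTPS_PORT = 8443  # encrypted SSL connection
HTTP_PORT = 8123   # connection without encryption


def build_url(host: str, query: str, secure: bool = True) -> str:
    """Build a ClickHouse HTTP interface URL for the given host and query."""
    scheme, port = ("https", HTTPS_PORT) if secure else ("http", HTTP_PORT)
    params = urlencode({"query": query})
    return f"{scheme}://{host}:{port}/?{params}"


def run_query(host: str, user: str, password: str, query: str,
              ca_file: Optional[str] = None) -> str:
    """Send a read-only query over HTTPS and return the response body.

    Assumption: ca_file points to a CA certificate that can verify the
    host certificate; with None, the system trust store is used.
    """
    request = urllib.request.Request(
        build_url(host, query),
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode()


if __name__ == "__main__":
    # Placeholder host name; printing the URL does not open a connection.
    print(build_url("<host_FQDN>", "SELECT version()"))
```

Only `run_query` opens a network connection; `build_url` can be used on its own to check which port and scheme a given connection mode maps to.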
Why can't I connect to a host from the internet?
Most likely, no public access is enabled for the cluster, so you can only connect to it from a VM in Yandex Cloud. You can only request public access when creating a new host in your cluster.
How do I connect to a non-public host in Yandex Cloud?
Connect to a host from a VM in Yandex Cloud hosted in the same cloud network, or add a new cluster host with public access and connect to a non-public host through it.
Can I connect to a public cluster without SSL?
No. You can only connect to public hosts using an SSL connection. For more information, see the documentation.
Can I connect to cluster hosts via SSH or get superuser permissions on hosts?
You cannot connect to hosts via SSH, nor can you get superuser permissions. This is done for the sake of security and user cluster fault tolerance because direct changes inside a host can render it completely inoperable.
Updating a cluster
How do I add a host to a cluster?
To add a host, follow this guide. You can also add new hosts to a cluster when creating a shard.
Can I set join_use_nulls to 1 using the CLI?
Yes. To do this, when creating a user or updating user settings, pass the desired join_use_nulls setting value in the --settings parameter. For example:
yc managed-clickhouse user update <username> \
--cluster-name=<cluster_name> \
--settings="join_use_nulls=1"
For more information, see the documentation.
Is a cluster available when being updated?
If it is a multi-host cluster, there is no downtime while updating it, since the hosts are updated one by one. Only individual hosts are unavailable when the cluster is being restarted.
How do I change the time zone?
Change the timezone
Is a cluster unavailable when adding replicas?
Yes, there is a short downtime when restarting a cluster.
How do I grant a user read-only permissions?
To do this, when creating or editing a user via the CLI, pass readonly=1 in the --settings parameter. For example:
yc managed-clickhouse user update <username> \
--cluster-name=<cluster_name> \
--settings="readonly=1"
For more information, see the documentation.
How do I increase the memory limit?
Update the user settings and set the desired Max memory usage parameter value.
Can I change the disk type?
No, you can only select the disk type when creating a cluster or restoring it from a backup.
Can I change a network and subnets?
No, you can only select a network and subnets for hosts when creating a cluster or restoring it from a backup.
How do I change the distribution of data across shards in a cluster?
In an existing cluster, you cannot change the location of data in shards. To make the change, transfer data to a new cluster with shard redistribution using Yandex Data Transfer.
Cluster configuration
How do I create a user to access a cluster from DataLens with read-only permissions?
Follow the guide to create a user with read-only permissions. If the cluster settings have the DataLens access option enabled, the service can connect to the cluster through this user.
How do I grant a user permissions to create and delete tables or databases?
Go to the cluster settings, enable the Managing users via SQL option, and grant a user the appropriate permissions using a GRANT statement.
How do I find out the internal_replication setting value?
The internal_replication setting information is not available in the Yandex Cloud interfaces or the ClickHouse® system tables. The default setting value is true.
How do I increase the maximum amount of RAM to run a query?
If the amount of RAM is not sufficient for running a query, the following error occurs:
DB::Exception: Memory limit (total) exceeded:
would use 14.10 GiB (attempt to allocate chunk of 4219924 bytes), maximum: 14.10 GiB.
(MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below)
To increase the maximum amount of RAM, use the Max memory usage parameter.
If the User management via SQL option is enabled for the cluster, you can set the Max memory usage parameter:
- For the current user session by running this query:
  SET max_memory_usage = <value_in_bytes>;
- For all users by default by creating a settings profile.
Moving and restoring a cluster
How do I back up a ClickHouse® database?
Backups are created every 24 hours and stored for seven days after being created. You can restore data only as of backup creation time.
Is DB host backup enabled by default?
Yes, backup is enabled by default. For ClickHouse®, a full backup is performed once a day, and you can restore a cluster from any saved backup.
When are backups performed? Is a DB cluster available during backup?
When creating or updating a cluster, you can set the time interval during which the backup will start. The default time is 22:00 - 23:00 UTC (Coordinated Universal Time).
Clusters remain fully accessible during the backup window.
How many backups are stored in Managed Service for ClickHouse®? For how long?
The size and number of backups are not limited. Automatically created backups are stored for seven days, while manually created ones are stored indefinitely.
What does a daily backup include?
Backup data is only stored for the MergeTree engine family. For other engines, backups only store table schemas. For more information, see Backups.
Why does it take a long time to restore a cluster from a backup?
The average speed when recovering a cluster from a backup is about 100 Mbps. Cluster recovery time may vary significantly depending on the host class and the nature of data being stored in the DB.
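For planning purposes, the average speed quoted above can be converted into a rough restore time. The sketch below assumes that "Mbps" means megabits per second and that sizes are given in decimal gigabytes. The result is only an order-of-magnitude estimate, since the actual time depends on the host class and the data, and the function name is illustrative.

```python
def estimate_restore_hours(data_size_gb: float, speed_mbps: float = 100.0) -> float:
    """Estimate restore time in hours from data size and average speed.

    Assumptions: 1 GB = 10**9 bytes, and speed_mbps is in megabits per second.
    """
    bits_to_transfer = data_size_gb * 10**9 * 8
    seconds = bits_to_transfer / (speed_mbps * 10**6)
    return seconds / 3600


if __name__ == "__main__":
    for size_gb in (100, 500, 2000):
        print(f"{size_gb} GB: about {estimate_restore_hours(size_gb):.1f} h")
```

Under these assumptions, a 500 GB dataset would take roughly 11 hours, which is why restoring large clusters can appear slow.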
How do I move an existing ClickHouse® cluster to Yandex Cloud?
Use Yandex Data Transfer.
Monitoring and logs
What metrics and processes can be tracked using monitoring?
For all DBMS types, you can track:
- CPU, memory, network, or disk usage, in absolute terms.
- Memory, network, or disk usage as a percentage of the set limits for the corresponding cluster host class.
- The amount of data in the DB cluster and the remaining free space in data storage.
For DB hosts, you can track metrics specific to the corresponding type of DBMS. For example, for PostgreSQL, you can track:
- Average query execution time.
- Number of requests per second.
- Number of errors in logs.
Monitoring can be performed with a minimum granularity of 5 seconds.
How is log storage charged?
Logs of any level are written to a disk's system partition with 20 GB allocated, so you are not charged for them separately. The size of the logs created only affects log rotation frequency.
What is the retention period for logs?
Cluster logs are stored for 30 days.
How do I track the amount of free storage space on ZooKeeper hosts?
Follow the steps in Monitoring the state of clusters and hosts to track the host state or set up alerts.
How do I monitor space used by data in hybrid storage?
Use the ch_s3_disk_parts_size metric in Yandex Monitoring. It shows the amount of space used by MergeTree tables.
How do I set up an alert that triggers as soon as a certain percentage of disk space has been used up?
Create an alert with the disk.used_bytes metric in Yandex Monitoring. This metric shows the disk space usage in the Managed Service for ClickHouse® cluster.
For disk.used_bytes, use notification thresholds. The recommended values are as follows:
- Alarm: 95% of the disk space
- Warning: 80% of the disk space
Thresholds are set in bytes only. For example, the recommended values for a 100 GB disk are as follows:
- Alarm: 102,005,473,280 bytes (95%)
- Warning: 85,899,345,920 bytes (80%)
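Since thresholds have to be entered in bytes, a short helper can do the conversion for other disk sizes. The sketch below reproduces the 100 GB figures above, which correspond to 1 GB = 2^30 bytes. The function name and the default percentages are illustrative, taken from the recommended values in this answer.

```python
BYTES_PER_GB = 2**30  # matches the 100 GB example figures in this answer


def alert_thresholds(disk_size_gb: int, alarm_pct: int = 95, warning_pct: int = 80) -> dict:
    """Convert percentage thresholds into byte values for disk.used_bytes."""
    total_bytes = disk_size_gb * BYTES_PER_GB
    return {
        "alarm": total_bytes * alarm_pct // 100,
        "warning": total_bytes * warning_pct // 100,
    }


if __name__ == "__main__":
    thresholds = alert_thresholds(100)
    print(f"Alarm: {thresholds['alarm']:,} bytes")
    print(f"Warning: {thresholds['warning']:,} bytes")
```

Integer arithmetic is used throughout so that the values can be pasted into the alert settings without rounding artifacts.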
ClickHouse® is a registered trademark of ClickHouse, Inc.