Disk types in Managed Service for ClickHouse®
Managed Service for ClickHouse® allows you to use network and local storage drives for database clusters. Network storage drives are based on network blocks, which are virtual disks in the Yandex Cloud infrastructure. Local disks are physically located in the database host servers.
When creating a cluster, you can select the following disk types for data storage:
- Network HDD storage (network-hdd): Most cost-effective option for clusters that do not require high read/write performance.
- Network SSD storage (network-ssd): Balanced solution. Such disks are slower than local SSD storage, but, unlike local disks, they ensure data integrity in case Yandex Cloud hardware goes down.
- Non-replicated SSD storage (network-ssd-nonreplicated): Network SSD storage with enhanced performance but without redundancy.
  The storage size can only be increased in 93 GB increments.
- Local SSDs (local-ssd): Disks with the fastest performance.
  The size of such a storage can be increased:
  - For Intel Broadwell and Intel Cascade Lake: Only in 100 GB increments.
  - For Intel Ice Lake: In 368 GB increments only.
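The increment rules above amount to rounding a target size up to the nearest multiple of the step. Here is a minimal sketch in ClickHouse SQL, assuming that valid sizes are whole multiples of the increment. The 500 GB target and the column aliases are hypothetical:

```sql
SELECT
    500 AS requested_gb,  -- hypothetical target size in GB
    -- Round up to the next multiple of the increment for each disk type.
    intDiv(requested_gb + 93 - 1, 93) * 93 AS network_ssd_nonreplicated_gb,
    intDiv(requested_gb + 100 - 1, 100) * 100 AS local_ssd_broadwell_cascade_lake_gb,
    intDiv(requested_gb + 368 - 1, 368) * 368 AS local_ssd_ice_lake_gb;
```

For a 500 GB target, this rounds up to 558 GB, 500 GB, and 736 GB respectively.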
Note
For clusters with hosts residing in the ru-central1-d availability zone, local SSD storage is not available if using the Intel Cascade Lake platform.
Hybrid storage
If you enable the Hybrid storage setting when creating or updating a cluster, you can distribute data between cluster storage and Yandex Object Storage. Your data will then reside in either cluster storage or object storage, depending on the storage policy you specify. For example, you can choose to store your frequently used (hot) data in cluster storage and the rarely used (cold) data in the less expensive and slower object storage.
Warning
Hybrid storage is only available for MergeTree family tables.
Object storage uses a service bucket with unlimited storage capacity. It has the Standard storage class, which cannot be changed. Object Storage service limits apply to object storage.
To start using hybrid storage:
- Create a cluster of the appropriate type. You do not need to configure object storage.
- Add databases and tables to the cluster. If the default storage policy is not suitable for some tables, set the appropriate policies for these tables:
  - To set the policy when creating a table, configure the storage_policy setting:
    CREATE TABLE table_with_non_default_policy (
        <table_schema>
    )
    ENGINE = MergeTree
    ...
    SETTINGS storage_policy = '<storage_policy_type>';
  - To create or update the policy for an existing table, run the following query:
    ALTER TABLE table_with_non_default_policy MODIFY SETTING storage_policy = '<storage_policy_type>';
-
See an example in the Using hybrid storage tutorial.
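After you set or change a policy, you may want to confirm which policy each table actually uses. One possible check is to read the storage_policy column of the system.tables system table. The database name db1 below is a hypothetical placeholder:

```sql
SELECT
    database,
    name,
    engine,
    storage_policy
FROM system.tables
WHERE database = 'db1'          -- hypothetical database name
  AND engine LIKE '%MergeTree'  -- hybrid storage applies to MergeTree tables only
ORDER BY name;
```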
To track the amount of space used by MergeTree tables in object storage, use the ch_s3_disk_parts_size metric in Yandex Monitoring. It is only available for Managed Service for ClickHouse® clusters with hybrid storage set up.
Available storage policies
Note
You cannot create new storage policies or update the existing ones.
A Managed Service for ClickHouse® cluster with enabled hybrid storage supports the following storage policies:
- default: The cluster automatically manages data placement depending on:
  - Hybrid storage settings.
  - Table TTL (time-to-live) settings.
  If there is enough free space in the cluster storage, only the rows with the expired TTL are moved to object storage. This operation allows you to move part of the data to object storage before the cluster storage becomes full.
  You can configure moving the expired rows to object storage and set the TTL value when creating a table or later.
- local: In tables with this policy, rows are placed only in cluster storage. There is no data transfer between storages.
- object storage: In tables with this policy, rows are placed only in object storage. There is no data transfer between storages.
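For the default policy, a TTL rule of roughly the following shape is one way to send expired data to object storage. This is a sketch only. The table, its columns, and the 30-day and 90-day intervals are hypothetical, and the disk name object_storage is an assumption that you should check against your own cluster before use:

```sql
-- Hypothetical table; 'object_storage' is an assumed disk name
-- (SELECT name FROM system.disks lists the disks actually available).
CREATE TABLE events_with_ttl
(
    event_date Date,
    user_id UInt64,
    payload String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id)
TTL event_date + INTERVAL 30 DAY TO DISK 'object_storage'
SETTINGS storage_policy = 'default';

-- Change the TTL later, for example to move data after 90 days instead.
ALTER TABLE events_with_ttl
    MODIFY TTL event_date + INTERVAL 90 DAY TO DISK 'object_storage';
```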
Storage policies do not affect merge operations
- Enable and disable the prefer_not_to_merge setting, which controls merging of stored data parts. This setting is available in the CLI and API.
- Set any max_data_part_size_bytes value for the maximum size of the data part you will get on merging smaller ones.
However, you can configure the behavior of these operations using the settings available in the cluster.
You can view up-to-date policy settings with the following query:
SELECT *
FROM system.storage_policies;
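If you only need the merge-related and move-related settings, a narrower query may be easier to read. The sketch below assumes the column names found in recent ClickHouse versions; the exact set may differ in yours:

```sql
SELECT
    policy_name,
    volume_name,
    volume_priority,
    disks,
    max_data_part_size,
    move_factor,
    prefer_not_to_merge
FROM system.storage_policies
ORDER BY policy_name, volume_priority;
```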
For more information about storage policies and their settings, see the ClickHouse® documentation.
Hybrid storage settings
A Managed Service for ClickHouse® cluster with enabled hybrid storage has the following settings:
- data_cache_enabled: Allows you to cache data requested from object storage in cluster storage. This setting is enabled by default (set to true).
  In this case, cold data requested from object storage is written to fast drives where data processing takes less time.
- data_cache_max_size: Sets the maximum cache size (in bytes) allocated in cluster storage for data requested from object storage. The default value is 1073741824 (1 GB).
- move_factor: Sets the minimum share of free space in cluster storage. If the actual value is less than this setting value, the data is moved to Yandex Object Storage. The minimum value is 0, the maximum one is 1, and the default one is 0.01.
  Data parts are queued up in descending order according to size, and then as many of them are moved as will satisfy the move_factor condition.
- prefer_not_to_merge: Disables merging of data parts in cluster and object storage. The merge functionality is enabled by default.
  Once inserted into the table, the data is saved as a data part and sorted based on the primary key. Next the data parts belonging to the same partition are merged in the background into a larger data part within 10 to 15 minutes after the insertion. You can use the system.parts system table to view the merged data parts and partitions.
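You can observe the effect of move_factor and of background merges from SQL. The two queries below are a minimal sketch. The database name db1 is hypothetical, and the first query uses the free-to-total space ratio of each disk only as an approximation of the share that move_factor is compared against:

```sql
-- Share of free space on each disk; compare the cluster storage value with move_factor.
SELECT
    name,
    free_space,
    total_space,
    round(free_space / total_space, 4) AS free_share
FROM system.disks;

-- Active data parts per partition and disk for one table.
SELECT
    partition,
    disk_name,
    count() AS part_count,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS total_size
FROM system.parts
WHERE active
  AND database = 'db1'  -- hypothetical database name
  AND table = 'table_with_non_default_policy'
GROUP BY partition, disk_name
ORDER BY partition, disk_name;
```

If the part count for a partition drops after an insert, the background merge has likely run. Parts listed under a disk other than the cluster storage disk have been moved.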
You can specify hybrid storage settings when creating or updating a cluster.
For more information about setting up hybrid storage, see the ClickHouse® documentation.
Selecting disk type during cluster creation
The number of hosts you can create together with a ClickHouse® cluster depends on the selected disk type:
- With local SSD (local-ssd) storage, you can create a cluster with two or more hosts.
  This cluster will be fault-tolerant.
  Local SSD storage affects the cost of a cluster: you pay for the storage even when the cluster is stopped. You can find more information in the pricing policy.
- With non-replicated network SSD (network-ssd-nonreplicated) storage, you can create a cluster with three or more hosts.
  This cluster will be fault-tolerant.
- With network HDD (network-hdd) or network SSD (network-ssd) storage, you can add any number of hosts within the current quota.
For more information about limits on the number of hosts per cluster, see Quotas and limits.