Sharding in Managed Service for ClickHouse®

Written by

Updated at January 14, 2026

Advantages of sharding
Use of sharding
Sharding management in Managed Service for ClickHouse®
Use cases

Sharding is a horizontal cluster scaling strategy that puts parts of a single ClickHouse® database on different shards. A shard consists of one or more replica hosts. A write or read request for the shard can be sent to any of its replicas, as there is no dedicated master. When data is inserted, it is taken from the replica on which the INSERT request was run and copied to other replicas in the shard in asynchronous mode.

Advantages of sharding

Sharding is often used in the following cases:

When you expect very frequent database queries and rapid data growth.
When your application requires more and more resources, but a replicated cluster can no longer be scaled using the vertical strategy, i.e., by increasing the computing power of the host resources, such as disks, RAM, and CPU.

Sharding can help you:

Overcome technical limitations.

If you need to work with large datasets, your data storage infrastructure may cause your commercially available hardware to become stressed to the limit (for example, the disk subsystem will demonstrate poor IOPS metrics). If the application is running at hardware performance limits, it may be a good idea to distribute data across shards. In this case, the read operations will be performed concurrently.
Increase availability.

Sharding allows you to isolate individual host or replica failures. If you do not use sharding, then, when one host or a few replicas fail, you lose access to the entire dataset they contain. Conversely, if one shard out of five fails, 80% of the table data will still be available.
Improve the query performance.

Queries compete with each other for the computing resources of cluster hosts, which can reduce the rate of query processing. This drop in the rate usually becomes obvious as the number of read operations or CPU time per query grows. However, in a sharded cluster, where queries to the same table can be run in parallel, there is no competition for shared resources, which allows you to reduce the query processing time.

Use of sharding

To distribute data across shards, create a distributed table on the Distributed engine that uses these shards. A part of data in such a table will be stored on one shard, and another part, on other shards. Technically, pieces of data are stored in the underlying tables that are located on the hosts of each shard. The distributed table only routes queries to these tables.

When running the INSERT query, ClickHouse® uses a sharding key to determine where the data should be placed. Its value determines which shard the query will target. The sharding key is similar to the partitioning key. Ease of use and actual performance improvements when using sharding significantly depend on the sharding key you choose: the right sharding key ensures that the data is logically distributed across shards and is not linked to data in different shards. Unlike the INSERT queries, the SELECT queries send subqueries to all shards in the cluster, regardless of how data is distributed across the shards.

ClickHouse® offers two different approaches to operating distributed tables with flexible data distribution in a cluster:

You can create a distributed table that uses all shards in a cluster (see the example here).
You can create a distributed table that uses a group of shards in a cluster (click here for a regular sharding example, and here, for an example of advanced sharding). Such a group includes only some shards in a cluster.

In this case, you can:
- Place tables with data on the same cluster shards that are used by the distributed table.
- Place tables with data in one shard group, and the distributed table that routes queries to these tables with data, in another shard group.

For example, you can set up the following shard configuration within a single ClickHouse® cluster using these methods:

Group A comprised of two shards with s2.small hosts: Used as the main one for a distributed table with a low load. Data in the distributed table data is stored in the same shard group.
Group B comprised of two shards with s2.medium hosts: Used as the main one for a distributed table with a constant high load. The data in the distributed table is stored in another group, C, comprised of five shards with high-performance m2.large hosts.

To learn more about operating distributed tables, see this ClickHouse® guide.

Sharding management in Managed Service for ClickHouse®

Managed Service for ClickHouse® manages shards as follows:

When you create a cluster, a shard named shard1 is automatically added to it. This shard includes all hosts in the cluster. When you create a cluster with multiple hosts, replication support is automatically enabled.
You can add the appropriate number of shards to the created cluster.

To fully benefit from all advantages of sharding, your cluster should have at least two shards. Using a distributed table and a single shard is equivalent to using replication without sharding. It only protects against data loss when one or more shard replica hosts fail; however, it does not provide distributed data storage.
You can add hosts to a shard.

Shards with multiple hosts require running replication. Therefore:
- Clusters with a multi-host shard have ClickHouse® Keeper or ZooKeeper replication already running, which means you can immediately add hosts to the shard.
- In clusters with single-host shards, you should first turn on the ClickHouse® Keeper or ZooKeeper coordination service and only then proceed to add hosts to the shard.
For more information about replication, ClickHouse® Keeper, and ZooKeeper, see Replication.

Shard IDs in Yandex Cloud are different from their IDs in ClickHouse®:

Managed Service for ClickHouse® clusters use shard names.
ClickHouse® uses numeric IDs that match the alphabetic order of shard names in Managed Service for ClickHouse®, e.g., A-shard, B-shard, shard10, shard100.

Keep this in mind if your application accesses shards by their IDs when writing and reading distributed table data.

To learn how IDs in Yandex Cloud and ClickHouse® match, run this query:

SELECT
 substitution AS shard_name,
 shardNum() AS shard_number
FROM cluster('{cluster}', system.macros)
WHERE macro = 'shard'
ORDER BY shard_name

For example, in a cluster with the shard1, shard2, shard3, shard4, and shard100 shards, the matching will look as follows:

┌─shard_name─┬─shard_number─┐
│ shard1     │            1 │
└────────────┴──────────────┘
┌─shard_name─┬─shard_number─┐
│ shard100   │            2 │
└────────────┴──────────────┘
┌─shard_name─┬─shard_number─┐
│ shard2     │            3 │
└────────────┴──────────────┘
┌─shard_name─┬─shard_number─┐
│ shard3     │            4 │
└────────────┴──────────────┘
┌─shard_name─┬─shard_number─┐
│ shard4     │            5 │
└────────────┴──────────────┘

Use cases

Table sharding in ClickHouse®

ClickHouse® is a registered trademark of ClickHouse, Inc.

Sharding in Managed Service for ClickHouse®

Advantages of shardingAdvantages of sharding

Use of shardingUse of sharding

Sharding management in Managed Service for ClickHouse®Sharding management in Managed Service for ClickHouse®

Use casesUse cases

Was the article helpful?

Advantages of sharding

Use of sharding

Sharding management in Managed Service for ClickHouse®

Use cases