Sharding in Yandex StoreDoc
Sharding is a horizontal data scaling strategy that spreads parts of Yandex StoreDoc collections across different cluster hosts. A shard (set of hosts) is linked to a dataset with a shard key. Yandex StoreDoc supports sharding to handle large volumes of data and increase DBMS throughput. Sharding is particularly useful when vertical scaling (upgrading server capacity) is either not cost-efficient or impossible.
Yandex StoreDoc supports core data sharding strategies:
- Hashed sharding (with a hash-based sharding key)
- Ranged sharding (by a value range)
Advantages of sharding
Sharding enables you to distribute workload between database hosts. It is usually used in the following cases:
- When you expect very frequent database queries and rapid data growth.
- When your application requires more and more resources but increasing the computing power of the cluster hosts (disks, RAM, and CPUs) is no longer an option.
Horizontal scaling involves distributing datasets and workload between multiple nodes. You can also add servers to increase disk capacity. A single machine may underperform in terms of capacity or speed; however, in a horizontally-scaled cluster, each machine processes only a portion of total workload and stores only a portion of total data. This makes the system potentially more efficient than a single high-capacity server with fast disks.
With sharding, you can:
-
Overcome technical limitations.
If you work with large datasets, your data storage infrastructure may hit the limits of commercially available hardware, e.g., disk subsystem IOPS.
If your apps approach the performance limits, it might be handy to split data into shards and distribute read operations.
-
Create geographically distributed systems.
By distributing your cluster shards across regions, you can:
- Improve availability for regional users.
- Comply with the local laws, for example, by storing your data in a particular country or region.
-
Improve query performance.
Query performance can degrade due to resource contention. This usually happens as the number of read operations or CPU time per query increases.
Shards handle queries to the same collection in parallel, thus avoiding resource (CPU and disk subsystem) contention and reducing query processing time.
Use of sharding
To split data into shards:
- Enable sharding at the Yandex StoreDoc cluster level.
- Add the required number of shards.
- Enable sharding for the applicable collections.
See also Example of sharding.
Sharding specifics in Yandex StoreDoc
Yandex StoreDoc manages shards as follows:
-
Due to limited resources, clusters with b1.medium and b2.medium hosts are not sharded.
-
You can create a sharded cluster or you can enable sharding later.
-
In Yandex StoreDoc, sharding is managed by the hosts with the
MONGOS(routing user queries) andMONGOCFG(storing shard configuration) roles. For more information, see Host types. -
In Yandex StoreDoc, you can enable two types of sharding:
-
Standard: Cost-effective sharding for clusters that do not have any special requirements for sharding management hosts.
The cluster will be expanded to include the
MONGOINFRAhosts having both theMONGOSandMONGOCFGroles. The minimum number of such hosts is three. -
Advanced: Flexible sharding for clusters that require a certain number of hosts for each role.
Dedicated
MONGOSandMONGOCFGservers will be added to the cluster. The cluster must have at least twoMONGOSand threeMONGOCFGhosts.
-
-
In a sharded cluster:
- All queries to Yandex StoreDoc must be redirected to
MONGOSorMONGOINFRAhosts instead ofMONGOD. - You cannot disable sharding or completely remove the hosts that support sharding: the cluster will always support a minimum number of
MONGOSandMONGOCFGorMONGOINFRAhosts.
- All queries to Yandex StoreDoc must be redirected to
For more information, see Host types.