Sharding in Managed Service for MongoDB
Sharding is a horizontal data scaling strategy that puts parts of MongoDB collections on different cluster hosts. A shard (set of hosts) is linked to a dataset with a shard key. MongoDB supports sharding to handle large volumes of data and increase DBMS throughput. Sharding is particularly useful when vertical scaling (upgrading server capacity) is either not cost-efficient or impossible.
Managed Service for MongoDB supports core data sharding strategies:
- Hashed sharding
(with a hash-based sharding key) - Ranged sharding
(by a shard key value range)
You can read more about MongoDB database sharding in the MongoDB documentation
Advantages of sharding
Sharding enables you to distribute workload between database hosts. It is usually used in the following cases:
- When you expect very frequent database queries and rapid data growth.
- When your application requires more and more resources but increasing the computing power of the cluster hosts (disks, RAM, and CPUs) is no longer an option.
Horizontal scaling involves distributing datasets and workload between multiple nodes. You can also add servers to increase disk capacity. A single machine may underperform in terms of capacity or speed; however, in a horizontally-scaled cluster, each machine processes only a portion of total workload and stores only a portion of total data. This makes the system potentially more efficient than a single high-capacity server with fast disks.
With sharding, you can:
-
Overcome technical limitations.
When you need to handle large datasets, your data storage infrastructure might reach the maximum capacity of commercially available hardware, e.g., disk subsystem IOPS.
If your apps approach the performance limits, it might be handy to split data into shards and distribute read operations.
-
Create geographically distributed systems.
By distributing your cluster shards across regions, you can:
- Improve availability for regional users.
- Comply with the local laws, for example, by storing your data in a particular country or region.
-
Improve fault tolerance.
Sharding allows you to isolate individual host or replica failures. If you do not use sharding, then, when one host fails, you lose access to the entire dataset it contains. Conversely, if one shard out of five fails, 80% of the collection data will still be available.
To reduce the risk of an entire shard going down, we recommend configuring shards as a group of three replicas. Furthermore, by distributing shard hosts across different Yandex Cloud availability zones, you can increase data availability.
-
Improve query performance.
Query performance can degrade due to resource contention. This usually happens as the number of read operations or CPU time per query increases.
Shards handle queries to the same collection in parallel, thus avoiding resource (CPU and disk subsystem) contention and reducing query processing time.
Use of sharding
To split data into shards:
- Enable sharding at the Managed Service for MongoDB cluster level.
- Add the required number of shards.
- Enable sharding for the applicable collections.
See also Example of sharding.
Sharding management in Managed Service for MongoDB
Managed Service for MongoDB manages shards as follows:
-
Due to limited resources, clusters with b1.medium and b2.medium hosts are not sharded.
-
You can create a sharded cluster or enable sharding later.
-
In Managed Service for MongoDB, sharding is managed by the hosts with the
MONGOS
(routing user queries ) andMONGOCFG
(storing shard configuration ) roles. -
In Managed Service for MongoDB, you can enable two types of sharding:
-
Standard: Cost-effective sharding for clusters that do not have any special requirements for sharding management hosts.
The cluster will be expanded to include the
MONGOINFRA
hosts having both theMONGOS
andMONGOCFG
roles. The minimum number of such hosts is three. -
Advanced: Flexible sharding for clusters that require a certain number of hosts for each role.
Dedicated
MONGOS
andMONGOCFG
servers will be added to the cluster. The cluster must have at least twoMONGOS
and threeMONGOCFG
hosts.
-
-
In a sharded cluster:
- All queries to Managed Service for MongoDB must be redirected to
MONGOS
orMONGOINFRA
hosts instead ofMONGOD
. - You cannot disable sharding or completely remove the hosts that support sharding: the cluster will always support a minimum number of
MONGOS
andMONGOCFG
orMONGOINFRA
hosts.
- All queries to Managed Service for MongoDB must be redirected to