Sharding MongoDB collections
When sharding a Managed Service for MongoDB cluster, the following service hosts are automatically created and billed separately from the main DBMS hosts:
- either MONGOS and MONGOCFG
- or MONGOINFRA (hosts that combine the MONGOS and MONGOCFG roles)
Alert
You can't unshard a cluster: to return a cluster to the state before it was sharded, you have to recreate it from a backup copy.
Shard collections when splitting the data across shards significantly improves DBMS performance or data availability. For better availability, each shard should consist of 3 or more database hosts.
How convenient sharding is to work with, and how much it actually improves performance, depends heavily on the shard key you choose: make sure the collection data is distributed logically across shards and that data in one shard is not linked to data in other shards.
You should use sharding for:
- Large data sets: collections that take up more than 200 GB.
- Collections with non-uniform contents: for example, data that clearly splits into frequently queried and rarely queried parts.
- Collections that require high read and write speeds: sharding distributes the workload among hosts and helps work around the resource limits of a single host.
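The size criterion is the only one in this list that reduces to arithmetic. Here is a minimal sketch, assuming that 200 GB means 200 * 1024^3 bytes and that you obtain the collection's data size in bytes yourself as a plain number. The helper name exceedsShardingSizeThreshold is hypothetical; the other two criteria are judgment calls that no size check captures.

```javascript
// Hypothetical helper for the "more than 200 GB" criterion.
// Assumption: 1 GB is read as 1024^3 bytes; dataSizeBytes is a plain JS number.
const BYTES_PER_GIB = 1024 ** 3;

function exceedsShardingSizeThreshold(dataSizeBytes, thresholdGiB = 200) {
  if (typeof dataSizeBytes !== "number" || Number.isNaN(dataSizeBytes)) {
    throw new TypeError("dataSizeBytes must be a number of bytes");
  }
  // Strictly greater than, matching "more than 200 GB"
  return dataSizeBytes > thresholdGiB * BYTES_PER_GIB;
}

// Example: exceedsShardingSizeThreshold(250 * 1024 ** 3) returns true
```

A size check alone says nothing about whether a good shard key exists, so treat it as one input to the decision, not the decision itself.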
For more information about sharding, see Sharding in Managed Service for MongoDB.
How to enable collection sharding
Warning
Run all your sharding setup commands via the mongosh CLI as a user with the mdbShardingManager role in the admin database.
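Before running the setup commands, you can check the role of the current user. This is a minimal sketch, assuming you pass it the result of the connectionStatus command, whose authInfo.authenticatedUserRoles array lists { role, db } pairs. The function name hasShardingManagerRole is hypothetical, and it only inspects the roles listed in that array.

```javascript
// Sketch: does the authenticated user hold mdbShardingManager in the admin database?
// Input: the document returned by the connectionStatus command.
function hasShardingManagerRole(connectionStatus) {
  const authInfo = connectionStatus && connectionStatus.authInfo;
  const roles = (authInfo && authInfo.authenticatedUserRoles) || [];
  // The role must be granted in the admin database, not in the target database
  return roles.some((r) => r.role === "mdbShardingManager" && r.db === "admin");
}

// Usage in mongosh:
//   hasShardingManagerRole(db.adminCommand({ connectionStatus: 1 }))
```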
- Enable sharding for the cluster.
- Connect to a MONGOS or MONGOINFRA host via the mongosh CLI and enable sharding for the database:
  sh.enableSharding("<DB_name>")
  You can check the host type in the list of cluster hosts.
- Define an index for the collection you want to shard:
  db.getSiblingDB("<DB_name>").<collection_name>.createIndex( { "<field>": <index_type> } )
  Here, <index_type> is, for example, 1 for a ranged shard key or "hashed" for a hashed shard key.
- Enable sharding for the collection:
  sh.shardCollection( "<DB_name>.<collection_name>", { "<field>": <index_type> } )
  For a detailed description of the shardCollection command, see the MongoDB documentation.
- Modify the applications accessing your database to use only the MONGOS or MONGOINFRA hosts.
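The steps above can also be expressed as the database commands that the mongosh helpers wrap: sh.enableSharding() sends enableSharding, createIndex() sends createIndexes, and sh.shardCollection() sends shardCollection. The sketch below only builds these command documents, plus a seed-list connection string for the last step. All function names are hypothetical, and the port and any TLS parameters depend on your cluster settings; nothing here states their values.

```javascript
// Run with db.adminCommand(...)
function enableShardingCommand(dbName) {
  return { enableSharding: dbName };
}

// MongoDB's default index name: "<field>_<value>" pairs joined with "_"
function defaultIndexName(key) {
  return Object.entries(key).map(([field, value]) => `${field}_${value}`).join("_");
}

// Run with db.getSiblingDB(dbName).runCommand(...)
function createIndexCommand(collectionName, key, name = defaultIndexName(key)) {
  return { createIndexes: collectionName, indexes: [{ key, name }] };
}

// Run with db.adminCommand(...); key is the same document as the index key
function shardCollectionCommand(dbName, collectionName, key) {
  return { shardCollection: `${dbName}.${collectionName}`, key };
}

// Hypothetical helper: application URI listing every MONGOS/MONGOINFRA host.
// No replicaSet parameter, because the application talks to mongos routers.
function mongosConnectionString(hosts, port, user, password, dbName, params = {}) {
  const seed = hosts.map((h) => `${h}:${port}`).join(",");
  const creds = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const query = new URLSearchParams(params).toString();
  return `mongodb://${creds}@${seed}/${dbName}${query ? "?" + query : ""}`;
}
```

Listing several router hosts in the URI lets the driver keep working if one of them becomes unavailable.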
Sharding heterogeneous data
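If a collection includes documents with heterogeneous data types, shard it based on _id key values of the same type using Type Bracketing: MongoDB comparison queries only match values of the same BSON type, so they do not mix _id values of different types.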
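To see whether a collection actually has _id values of several types, you can group its documents by the BSON type of _id with the $type aggregation operator. This is a minimal sketch; the function names are hypothetical, and on a large collection the pipeline scans every document.

```javascript
// Groups documents by the BSON type name of _id, e.g. "objectId", "string", "int"
function idTypePipeline() {
  return [{ $group: { _id: { $type: "$_id" }, count: { $sum: 1 } } }];
}

// True when all _id values share one BSON type (or the collection is empty)
function idsHaveSingleType(groups) {
  return groups.length <= 1;
}

// Query filter that selects only documents whose _id has the given BSON type
function idTypeFilter(typeName) {
  return { _id: { $type: typeName } };
}

// Usage in mongosh:
//   const groups = db.getSiblingDB("<DB_name>").<collection_name>.aggregate(idTypePipeline()).toArray();
//   idsHaveSingleType(groups)
```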
Useful links
You can learn how to solve issues related to sharding in the MongoDB documentation:
- Sharding overview: Sharding.
- About choosing a shard key and sharding strategies: Shard Keys.
Example of sharding
Let's say you already have a sharded Managed Service for MongoDB cluster with a billing database. You need to enable sharding for the payments and addresses collections. In this example, the hashed _id value is used as the shard key for payments, and a field value (a ranged key) is used as the shard key for addresses. The steps below show the payments collection.
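Both shard keys from this scenario can be described in one place and applied in a loop. This is a minimal sketch: the field name city for addresses is hypothetical (the field is not named here), the function name shardAll is hypothetical, and the create-shards step in the management console is outside this sketch. The index and sharding operations are passed in as functions so the same loop can be reviewed without a cluster.

```javascript
// payments: hashed _id; addresses: ranged key on a hypothetical "city" field
const shardKeys = {
  "billing.payments": { _id: "hashed" },
  "billing.addresses": { city: 1 },
};

// Creates the supporting index, then shards each collection
function shardAll(keys, ops) {
  for (const [ns, key] of Object.entries(keys)) {
    const [dbName, ...rest] = ns.split(".");
    const collName = rest.join("."); // collection names may contain dots
    ops.createIndex(dbName, collName, key);
    ops.shardCollection(ns, key);
  }
}

// Usage in mongosh:
//   shardAll(shardKeys, {
//     createIndex: (d, c, key) => db.getSiblingDB(d).getCollection(c).createIndex(key),
//     shardCollection: (ns, key) => sh.shardCollection(ns, key),
//   });
```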
Sequence of operations:
- Connect to the billing database via a MONGOS or MONGOINFRA host (for example, run use billing in mongosh). Make sure that the user connecting to the database has the mdbShardingManager role in the admin database.
- Enable sharding for the billing database:
  sh.enableSharding("billing")
- Define the index for the sharded collection:
  db.payments.createIndex( { "_id": "hashed" } )
- Create the required number of shards in the management console.
- Shard the collection based on its namespace:
  sh.shardCollection( "billing.payments", { "_id": "hashed" } )
Sharding is now enabled and configured. To check this, run the sh.status() command: its output lists the available shards and the sharded collections with their shard keys.
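Besides sh.status(), you can check a single collection by reading its entry in the config.collections metadata collection, where _id is the namespace and key is the shard key. This is a minimal sketch, assuming your user can read the config database; the function name isShardedWithKey is hypothetical. Field order is compared too, because it matters for compound shard keys.

```javascript
// Sketch: is the collection sharded with exactly the expected key?
// configEntry: a document from config.collections (or null if not found)
function isShardedWithKey(configEntry, expectedKey) {
  if (!configEntry || !configEntry.key || configEntry.dropped) return false;
  const actual = Object.entries(configEntry.key);
  const expected = Object.entries(expectedKey);
  return (
    actual.length === expected.length &&
    actual.every(([field, value], i) => field === expected[i][0] && value === expected[i][1])
  );
}

// Usage in mongosh:
//   isShardedWithKey(
//     db.getSiblingDB("config").collections.findOne({ _id: "billing.payments" }),
//     { _id: "hashed" }
//   )
```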