Sharding Yandex StoreDoc collections
When you shard a Yandex StoreDoc cluster, the system automatically creates auxiliary hosts that are billed separately from your main database hosts. These hosts are either MONGOS and MONGOCFG, or MONGOINFRA.
Alert
You can't unshard a cluster: to return a cluster to the state before it was sharded, you have to recreate it from a backup copy.
Consider sharding collections when distributing data across shards can significantly improve database performance or data availability. To increase availability, we recommend composing each shard of 3 or more database hosts.
Both usability and performance improvements depend on your choice of the shard key. Make sure that data is logically distributed across shards and that data in different shards is not interrelated.
You should use sharding for:
- Large datasets: Collections exceeding 200 GB.
- Collections with mixed contents, e.g., frequently and infrequently accessed data.
- Collections requiring high read/write throughput: Sharding spreads the load across hosts to work around technical limitations.
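As a rough way to check the first criterion, the sketch below compares a collection's data size with the 200 GB guideline. The helper name `exceedsSizeGuideline` is hypothetical, and two assumptions are baked in: the guideline is measured against the `size` field (uncompressed data size in bytes) of the document returned by `stats()`, and 1 GB is taken as 2^30 bytes.

```javascript
// Hypothetical helper, not part of mongosh or Yandex StoreDoc.
// Assumption: 1 GB = 2^30 bytes.
const BYTES_PER_GB = 1024 ** 3;

// `stats` is the document returned by <collection>.stats() in mongosh.
// Assumption: the guideline applies to `size` (uncompressed data size in bytes).
function exceedsSizeGuideline(stats, thresholdGb = 200) {
  return stats.size > thresholdGb * BYTES_PER_GB;
}

// Usage in mongosh (placeholders as elsewhere in this guide):
// const stats = db.getSiblingDB("<DB_name>").getCollection("<collection_name>").stats();
// print(exceedsSizeGuideline(stats));
```

Size is only one signal: a smaller collection may still benefit from sharding if it needs high read/write throughput.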
To learn more about sharding, see Sharding in Yandex StoreDoc.
Required paid resources
The cost of a Yandex StoreDoc cluster covers the computing resources allocated to its hosts, including secondary service hosts, as well as storage and backup size (see Yandex StoreDoc pricing).
How to enable collection sharding
Warning
To configure sharding via the mongosh CLI, you must run all operations as a user with the mdbShardingManager role in the admin database.
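One possible way to check the role before you start is to inspect the output of the `connectionStatus` command. The helper name `hasShardingManagerRole` is hypothetical. The sketch only looks at the roles reported in `authInfo.authenticatedUserRoles`, so it may miss a role the user inherits through another role.

```javascript
// Hypothetical helper, not part of mongosh or Yandex StoreDoc.
// `db` is the mongosh database object of the current connection.
function hasShardingManagerRole(db) {
  const status = db.getSiblingDB("admin").runCommand({ connectionStatus: 1 });
  // authenticatedUserRoles is an array of { role, db } documents.
  const roles = (status.authInfo && status.authInfo.authenticatedUserRoles) || [];
  return roles.some((r) => r.role === "mdbShardingManager" && r.db === "admin");
}

// Usage in mongosh:
// print(hasShardingManagerRole(db));
```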
- Enable sharding for the cluster.
- Connect to your MONGOS or MONGOINFRA host via the mongosh CLI and enable sharding for the database:
  sh.enableSharding("<DB_name>")
  You can get the host type from the list of cluster hosts.
- Create an index for the collection you want to shard:
  db.getSiblingDB("<DB_name>").<collection_name>.createIndex( { "<index>": <index_type> } )
- Enable collection sharding:
  sh.shardCollection( "<DB_name>.<collection_name>", { "<index>": <index_type> } )
- Reconfigure applications accessing your database to use only MONGOS or MONGOINFRA hosts.
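The mongosh commands from the steps above can be grouped into one function, which may be convenient when you shard several collections. This is a minimal sketch, not an official procedure: the function name `shardCollectionByKey` is hypothetical, and `sh` and `db` are passed in explicitly only to make the dependency on the mongosh globals visible.

```javascript
// Hypothetical wrapper around the mongosh steps described above.
// `sh` and `db` are the standard mongosh globals, available when you are
// connected to a MONGOS or MONGOINFRA host.
function shardCollectionByKey(sh, db, dbName, collName, key) {
  // Enable sharding for the database.
  sh.enableSharding(dbName);
  // Create the index that backs the shard key.
  db.getSiblingDB(dbName).getCollection(collName).createIndex(key);
  // Shard the collection by its full namespace.
  return sh.shardCollection(`${dbName}.${collName}`, key);
}

// Usage in mongosh, with a hashed _id key as an example:
// shardCollectionByKey(sh, db, "<DB_name>", "<collection_name>", { _id: "hashed" });
```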
Heterogeneous sharding
If a collection includes documents with heterogeneous data types, we recommend creating shards based on the _id key values of the same type, using the Type Bracketing mechanism. This makes sharding and document search faster than when using _id values of different types.
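To see whether a collection actually mixes `_id` types, you could count documents per BSON type with an aggregation. The sketch below only builds the pipeline. The function name `idTypeCountsPipeline` is hypothetical, while `$group`, `$type`, `$sum`, and `$sort` are standard aggregation operators.

```javascript
// Hypothetical helper: builds a pipeline that counts documents
// by the BSON type of their _id field, most frequent type first.
function idTypeCountsPipeline() {
  return [
    { $group: { _id: { $type: "$_id" }, count: { $sum: 1 } } },
    { $sort: { count: -1 } },
  ];
}

// Usage in mongosh (placeholders as elsewhere in this guide):
// db.getSiblingDB("<DB_name>").getCollection("<collection_name>").aggregate(idTypeCountsPipeline());
```

If the result contains more than one type, the collection holds `_id` values of different types.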
Sharding example
Suppose you have a sharded Yandex StoreDoc cluster with a billing database. You need to enable sharding for the payments and addresses collections. In our example, the shard key consists of the payments index hash and the addresses value.
Procedure:
- Connect to the billing database. Make sure that the account you use to connect to the database has the mdbShardingManager role in the admin database.
- Enable sharding for the billing database:
  sh.enableSharding("billing")
- Create an index for the sharded collection:
  db.payments.createIndex( { "_id": "hashed" } )
- Create the required number of shards in the management console.
- Shard the collection based on its namespace:
  sh.shardCollection( "billing.payments", { "_id": "hashed" } )
Once this operation is complete, sharding will be enabled and configured. To confirm this, try listing all available shards using the sh.status() command.
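If you prefer a result you can process in a script rather than the printed report of sh.status(), one option is the `listShards` admin command. This is a sketch under assumptions: the helper name `listShardIds` is hypothetical, and it assumes your user is allowed to run `listShards` on the MONGOS or MONGOINFRA host.

```javascript
// Hypothetical helper: returns the IDs of the shards known to the cluster.
// `db` is the mongosh database object; adminCommand runs against the admin database.
function listShardIds(db) {
  const res = db.adminCommand({ listShards: 1 });
  if (!res.ok) {
    throw new Error("listShards did not succeed");
  }
  // Each entry in `shards` is a document whose _id is the shard name.
  return res.shards.map((shard) => shard._id);
}

// Usage in mongosh:
// printjson(listShardIds(db));
```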