Migrating data to Yandex StoreDoc
To migrate your data to Yandex StoreDoc, transfer the data, write-lock the old database, and transfer the load to the target cluster in Yandex Cloud.
There are two ways to migrate data from a third-party source cluster to a Yandex StoreDoc target cluster:
-
Transferring data using Yandex Data Transfer.
This migration method allows you to:
- Migrate the database without interrupting user service.
- Migrate from older MongoDB versions to newer versions.
- Avoid creating an intermediate VM or opening internet access to your Yandex StoreDoc target cluster.
To use this migration method, allow connecting to the source cluster from the internet.
For more information, see Problems addressed by Yandex Data Transfer.
-
Migrating a database using a dump.
A dump is a set of files from which you can restore the state of a database. To migrate data to a Yandex StoreDoc cluster, create a database dump using mongodump and restore it in the target cluster using mongorestore. To get a complete dump, switch the source cluster to read-only mode before you create it.
Required paid resources
The cost of transferring data with Yandex Data Transfer includes:
- Yandex StoreDoc target cluster fee: use of computing resources allocated to hosts and disk space (see Yandex StoreDoc pricing).
- Fee for public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).
- Per-transfer fee: use of computing resources and number of transferred data rows (see Data Transfer pricing).
The cost of transferring data using a database dump includes:
- Yandex StoreDoc target cluster fee: use of computing resources allocated to hosts and disk space (see Yandex StoreDoc pricing).
- Fee for public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).
- When creating a VM to download a dump: fee for the use of computing resources, storage, OS (for specific operating systems), and, optionally, public IP address (see Compute Cloud pricing).
Getting started
Create a Yandex StoreDoc target cluster with the computing capacity and storage size appropriate for the environment where the migrated database is deployed.
The source and target database names must be the same.
Migrating data using Yandex Data Transfer
-
Create a source endpoint with the following parameters:
- Database type: MongoDB.
- Endpoint parameters → Connection settings: Custom installation. Configure the source cluster connection settings.
Note
Transferring Time Series collections is not supported, so exclude such collections in the endpoint settings.
-
Create a target endpoint with the following parameters:
- Database type: MongoDB.
- Endpoint parameters → Connection settings: Yandex StoreDoc cluster. Specify the ID of the target cluster.
-
Create a Snapshot and increment-type transfer and configure it to use the previously created endpoints.
To copy large collections (over 1 GB) faster, enable parallel copy in the transfer settings and specify two or more workers. Each collection is split into the specified number of parts, which are copied concurrently.
For parallel copy to work, the _id field must have the same data type in all documents of a collection. If the transfer detects a type mismatch, the collection is not partitioned and is transferred in a single thread instead. If needed, remove documents with mismatched data types from the collection before starting the transfer.
Note
If a document with a different _id data type is added to a collection after the transfer starts, the transfer will move it at the replication stage, after the parallel copy operation is completed. However, when re-activated, the transfer will not be able to partition the collection because the _id field type requirement will no longer be met for some of its documents.
-
Wait for the transfer status to change to Replicating.
-
Switch the source cluster to "read-only" mode and transfer the load to the target cluster.
-
On the transfer monitoring page, wait for the Maximum data transfer delay metric to decrease to zero. This means that all changes that occurred in the source cluster after data copying was completed are transferred to the target cluster.
-
Deactivate the transfer and wait for its status to change to Stopped.
For more information about transfer statuses, see Transfer lifecycle.
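The _id type requirement for parallel copy described above can be checked before you activate the transfer. The following is a minimal sketch, not part of the official procedure. It assumes the legacy mongo shell from the mongodb-org-shell package (set MONGO_SHELL to mongosh if that is what you have installed). The id_type_counts function name, the connection string, and the collection name are hypothetical. The query groups documents by the BSON type of _id, so more than one group in the output suggests the collection would be copied in a single thread.

```shell
#!/usr/bin/env bash
# Sketch: count documents per BSON type of _id in one collection.
# MONGO_SHELL is an assumption: the legacy "mongo" shell by default.
MONGO_SHELL="${MONGO_SHELL:-mongo}"

id_type_counts() {
  local uri="$1"    # e.g. mongodb://<username>:<password>@<host>:<port>/<DB_name>
  local coll="$2"   # name of the collection to inspect
  # $type yields the BSON type name of _id; $group counts documents per type.
  local js='db.getCollection("'"$coll"'").aggregate([{ $group: { _id: { $type: "$_id" }, count: { $sum: 1 } } }]).forEach(printjson)'
  "$MONGO_SHELL" "$uri" --quiet --eval "$js"
}

# Usage (hypothetical values):
# id_type_counts "mongodb://user:pass@source-host:27017/mydb" orders
```

A single output document (for example, one with the objectId type) means all _id values share one type. Several documents indicate mixed types that you may want to clean up first.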
Migrating a database using a dump
Sequence of actions:
- Create a dump of the database you want to migrate using mongodump.
- If necessary, create a VM in Compute Cloud to restore the database from a dump in the Yandex Cloud infrastructure.
- Restore data from the dump in the cluster using mongorestore.
Create a dump
You can create a database dump using mongodump.
-
Install mongodump and other utilities for working with MongoDB. Example for Ubuntu 20.04 LTS:
wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | sudo apt-key add -
echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list
sudo apt update
sudo apt install mongodb-org-shell mongodb-org-tools
-
Before creating a dump, we recommend switching the DBMS to "read-only" mode so that data written while the dump is being created is not lost.
-
Create a database dump:
mongodump --host <DBMS_server_address> \
  --port <port> \
  --username <username> \
  --password "<password>" \
  --db <DB_name> \
  --out ~/db_dump
If multiple processor cores are available for creating the dump, specify the -j flag with the number of cores:
mongodump --host <DBMS_server_address> \
  --port <port> \
  --username <username> \
  --password "<password>" \
  -j <number_of_cores> \
  --db <DB_name> \
  --out ~/db_dump
-
Archive the dump:
tar -cvzf ~/db_dump.tar.gz -C ~ db_dump
Optionally, create a VM to download a dump
You will need an intermediate VM in Yandex Compute Cloud if:
- Your Yandex StoreDoc cluster is not accessible from the internet.
- Your hardware or connection to the cluster in Yandex Cloud is not very reliable.
To prepare the virtual machine to restore the dump:
-
In the management console, create a new VM from an Ubuntu 20.04 LTS image. The required amount of RAM and processor cores depends on the amount of data to migrate and the required migration speed.
The minimum configuration (1 core, 2 GB RAM, 10 GB disk space) should be sufficient to migrate a database up to 1 GB in size. The larger the database, the more disk storage (at least twice the size of the database) and RAM you need for migration.
The VM instance must be in the same network and availability zone as the Yandex StoreDoc cluster master host. The VM must also be assigned an external IP address so that you can upload the dump file from outside Yandex Cloud.
-
Install the MongoDB client and additional utilities for working with the DBMS:
wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | sudo apt-key add -
echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list
sudo apt update
sudo apt install mongodb-org-shell mongodb-org-tools
-
Move the DB dump from your server to the VM. For example, you can use scp:
scp ~/db_dump.tar.gz <VM_user_name>@<VM_public_address>:/tmp/db_dump.tar.gz
-
Unpack the dump on the virtual machine:
tar -xzf /tmp/db_dump.tar.gz -C /tmp
As a result, you will get a VM with a database dump that is ready to be restored to the Yandex StoreDoc cluster.
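The archive and unpack steps can be rehearsed locally before you move a large dump. The sketch below is an illustration under stated assumptions: it uses a throwaway directory created by mktemp and a placeholder file instead of real mongodump output. It archives with -C so that the archive stores the relative path db_dump/, then extracts with -C into an explicit directory, which gives a predictable <target>/db_dump path to pass to mongorestore.

```shell
#!/usr/bin/env bash
set -eu

# Scratch area standing in for the source server (src) and the VM (vm).
work="$(mktemp -d)"
mkdir -p "$work/src/db_dump/mydb" "$work/vm"
# Placeholder standing in for real dump files (hypothetical names).
echo "placeholder" > "$work/src/db_dump/mydb/orders.bson"

# Archive relative to the parent directory so the archive contains "db_dump/...".
tar -czf "$work/db_dump.tar.gz" -C "$work/src" db_dump

# List the archive contents to confirm the stored paths are relative.
tar -tzf "$work/db_dump.tar.gz"

# Extract into an explicit directory (it plays the role of /tmp on the VM).
tar -xzf "$work/db_dump.tar.gz" -C "$work/vm"

restored="$work/vm/db_dump/mydb/orders.bson"
ls -l "$restored"
```

If the listing shows paths that start with a home directory instead of db_dump/, the dump will be unpacked into a nested directory, and the path passed to mongorestore has to be adjusted accordingly.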
Recover data
Use the mongorestore utility to restore your DB dump.
-
If you restore a dump from the VM in Yandex Cloud:
mongorestore --host <DBMS_server_address> \
  --port <port> \
  --username <username> \
  --password "<password>" \
  -j <number_of_streams> \
  --authenticationDatabase <DB_name> \
  --nsInclude '*.*' /tmp/db_dump
-
If you restore from a dump stored on a server outside Yandex Cloud, SSL parameters must be explicitly set for mongorestore:
mongorestore --host <DBMS_server_address> \
  --port <port> \
  --ssl \
  --sslCAFile <path_to_certificate_file> \
  --username <username> \
  --password "<password>" \
  -j <number_of_streams> \
  --authenticationDatabase <DB_name> \
  --nsInclude '*.*' ~/db_dump
-
If you want to transfer specific collections, set the --nsInclude and --nsExclude flags, specifying the namespaces to include or exclude for the collections to restore.
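As an illustration of a selective restore, the sketch below assembles a mongorestore command that restores only those collections of one database whose names start with a given prefix, and skips one of them. The database name mydb, the collection names, and the build_restore_cmd helper are hypothetical. The connection placeholders are the same as in the commands above. The function only prints the command so that you can review it before running it.

```shell
#!/usr/bin/env bash
# Sketch: print a mongorestore command limited to selected namespaces.
# Namespace patterns have the form <database>.<collection>; "*" is a wildcard.
build_restore_cmd() {
  local dump_dir="$1"
  local -a cmd=(
    mongorestore
    --host "<DBMS_server_address>"
    --port "<port>"
    --username "<username>"
    --password "<password>"
    --authenticationDatabase "<DB_name>"
    --nsInclude 'mydb.orders_*'        # restore collections named mydb.orders_...
    --nsExclude 'mydb.orders_archive'  # ...except this one
    "$dump_dir"
  )
  # %q quotes each argument so the printed line can be pasted into a shell.
  printf '%q ' "${cmd[@]}"
  printf '\n'
}

build_restore_cmd /tmp/db_dump
```

Replace the placeholders with real connection values before running the printed command. Both flags can be repeated to list several patterns.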