Migrating data to Yandex StoreDoc
To migrate your data to Yandex StoreDoc, follow these steps: transfer the data, switch the legacy database to read-only mode, and move the workload to the target cluster in Yandex Cloud.
You can migrate data from a third-party source cluster to a Yandex StoreDoc target cluster using one of the following two methods:
- Migrating data via Yandex Data Transfer.
  This migration method allows you to:
  - Migrate your database with zero downtime.
  - Migrate from older to newer versions of MongoDB.
  - Avoid the need for an intermediate VM or for exposing your Yandex StoreDoc target cluster to the internet.
  To use this migration method, the source cluster must be accessible from the internet.
  For more information, see Problems addressed by Yandex Data Transfer.
- Migrating data using a database dump.
  A dump is a collection of files that allows you to restore a database to a specific state. To migrate data to a Yandex StoreDoc cluster, create a database dump using mongodump and restore it on the target cluster using mongorestore. To ensure dump integrity, switch the source cluster to read-only mode before creating the dump.
Required paid resources
When migrating data using Data Transfer, you pay for the following resources:
- Yandex StoreDoc cluster: computing resources allocated to hosts, storage and backup size (see Yandex StoreDoc pricing).
- Public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).
- Each transfer: use of computing resources and the number of transferred data rows (see Data Transfer pricing).
When migrating data using a database dump, you pay for the following resources:
- Yandex StoreDoc cluster: computing resources allocated to hosts, storage and backup size (see Yandex StoreDoc pricing).
- Public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).
- VM instance: use of computing resources, storage, public IP address, and OS (see Compute Cloud pricing).
Getting started
Create a Yandex StoreDoc target cluster whose computing capacity and storage size match the environment of the source database.
The database in the target cluster must have the same name as the source database.
Migrating data using Yandex Data Transfer
- Create a source endpoint with the following parameters:
  - Database type: MongoDB.
  - Endpoint parameters → Connection settings: Custom installation.
    Configure the source cluster connection settings.
  Note
  Time Series collections cannot be transferred, so exclude them in the endpoint settings.
- Create a target endpoint with the following parameters:
  - Database type: MongoDB.
  - Endpoint parameters → Connection settings: Yandex StoreDoc cluster.
    Specify the ID of the target cluster.
- Create a Snapshot and increment-type transfer and configure it to use the previously created endpoints.
  To speed up copying of large collections (over 1 GB), enable parallel copy in the transfer settings and specify two or more workers. Each collection will be split into the specified number of parts, which are copied concurrently.
  For parallel copy to work, the _id field must have the same data type in all documents of a collection. If the transfer detects a type mismatch, it will not partition the collection and will copy it in a single thread instead. If needed, remove documents with mismatched data types from the collection before starting the transfer.
  Note
  If a document with a different _id data type is added to a collection after the transfer starts, the transfer will move it at the replication stage, once parallel copying is complete. However, if the transfer is reactivated later, it will not be able to partition that collection, because the _id field type requirement is no longer met for all of its documents.
- Wait for the transfer status to change to Replicating.
- Switch the source cluster to read-only mode and move the workload to the target cluster.
- On the transfer monitoring page, wait for the Maximum data transfer delay metric to drop to zero. This means that all changes made in the source cluster after data copying was completed have been transferred to the target cluster.
- Deactivate the transfer and wait for its status to change to Stopped.
  For more information about transfer statuses, see Transfer lifecycle.
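Before creating the transfer, you may want to check the source for the two limitations described above: Time Series collections and mixed _id types. The following is a minimal sketch, not part of Data Transfer. It assumes the legacy mongo shell from the mongodb-org-shell package (installed as shown in the dump section below), and the helper names (timeseries_js, id_types_js, run_on_source) and SRC_* variables are hypothetical. It also assumes the source user authenticates against the admin database unless SRC_AUTH_DB is set.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a MongoDB source before creating a transfer.
# They only define functions; nothing connects until you call run_on_source.
# Requires the legacy `mongo` shell (mongodb-org-shell package).

# Prints JavaScript that lists Time Series collections in the current database.
# Such collections cannot be transferred and should be excluded in the endpoint.
timeseries_js() {
  printf '%s\n' 'db.getCollectionInfos({ type: "timeseries" }).forEach(function (c) { print(c.name); });'
}

# Prints JavaScript that counts documents per BSON type of _id in collection $1.
# More than one output line means parallel copy cannot partition the collection.
id_types_js() {
  printf 'db.getCollection("%s").aggregate([{ $group: { _id: { $type: "$_id" }, n: { $sum: 1 } } }]).forEach(function (r) { print(r._id + " " + r.n); });\n' "$1"
}

# Runs JavaScript $1 against the source database. Connection values come from
# SRC_HOST, SRC_PORT, SRC_USER, SRC_PASSWORD and SRC_DB. The authentication
# database defaults to admin (an assumption; set SRC_AUTH_DB to override).
run_on_source() {
  mongo --quiet \
    --host "$SRC_HOST" --port "$SRC_PORT" \
    --username "$SRC_USER" --password "$SRC_PASSWORD" \
    --authenticationDatabase "${SRC_AUTH_DB:-admin}" \
    --eval "$1" "$SRC_DB"
}

# Example (placeholder values):
#   export SRC_HOST=<host> SRC_PORT=<port> SRC_USER=<username> SRC_PASSWORD=<password> SRC_DB=<DB_name>
#   run_on_source "$(timeseries_js)"
#   run_on_source "$(id_types_js <collection_name>)"
```

Generating the JavaScript in separate functions keeps the queries easy to inspect (or paste into an interactive shell) before anything touches the source cluster.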
Migration via database dump
Procedure:
- Create a dump of the source database using mongodump.
- If necessary, create a VM in Compute Cloud to restore the database from the dump within the Yandex Cloud infrastructure.
- Restore the data from the dump to the cluster using mongorestore.
Create a dump
Use mongodump to create a database dump.
- Install mongodump and other MongoDB tools. Example for Ubuntu 20.04 LTS:
  wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | sudo apt-key add -
  echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list
  sudo apt update
  sudo apt install mongodb-org-shell mongodb-org-tools
- Before creating a dump, we recommend switching your database to read-only mode to avoid losing any data that might be written during the dump process.
- Create a database dump:
  mongodump --host <DBMS_server_address> \
    --port <port> \
    --username <username> \
    --password "<password>" \
    --db <DB_name> \
    --out ~/db_dump
  If you can use multiple CPU cores for the dump, specify the -j flag (--numParallelCollections, the number of collections dumped in parallel) with the number of available cores:
  mongodump --host <DBMS_server_address> \
    --port <port> \
    --username <username> \
    --password "<password>" \
    -j <number_of_cores> \
    --db <DB_name> \
    --out ~/db_dump
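- Archive the dump:
  tar -cvzf ~/db_dump.tar.gz -C ~ db_dump
  The -C ~ option stores paths relative to your home directory, so the archive unpacks into a db_dump directory. The archive is saved as ~/db_dump.tar.gz, the file copied to the VM in the upload step.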
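If you repeat the dump several times, for example while rehearsing the migration, the steps above can be wrapped in a small script. This is a minimal sketch under several assumptions: it uses nproc to pick the -j value, writes the dump to ~/db_dump and the archive to ~/db_dump.tar.gz (the path the scp step below expects), and additionally records a SHA-256 checksum in ~/db_dump.tar.gz.sha256, which the steps above do not require. The function names and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the dump and archive steps. With DRY_RUN=1 it
# prints the mongodump and tar commands instead of running them.

# Runs a command, or prints it shell-quoted when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%q ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Usage: dump_and_archive <host> <port> <user> <password> <db>
dump_and_archive() {
  local host="$1" port="$2" user="$3" password="$4" db="$5"
  local cores
  cores="$(nproc)"   # assumption: one parallel collection per available core

  run mongodump --host "$host" --port "$port" \
    --username "$user" --password "$password" \
    -j "$cores" --db "$db" --out "$HOME/db_dump" || return

  # -C "$HOME" stores entries as db_dump/..., so the archive unpacks into db_dump/.
  run tar -czf "$HOME/db_dump.tar.gz" -C "$HOME" db_dump || return

  # Record a checksum to compare against after copying the archive elsewhere.
  if [ "${DRY_RUN:-0}" != 1 ]; then
    (cd "$HOME" && sha256sum db_dump.tar.gz > db_dump.tar.gz.sha256)
  fi
}
```

Running DRY_RUN=1 dump_and_archive <DBMS_server_address> <port> <username> "<password>" <DB_name> first shows the exact commands without touching the database.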
Optionally, create a VM for dump upload
You will need an intermediate VM in Yandex Compute Cloud if:
- Your Yandex StoreDoc cluster is not accessible from the internet.
- Your hardware or your connection to the cluster in Yandex Cloud is unreliable.
To prepare your virtual machine for restoring the dump:
- In the management console, create a new VM from an Ubuntu 20.04 LTS image. The required amount of RAM and number of CPU cores depend on the volume of data transferred and the required transfer speed.
  The minimum configuration (1 core, 2 GB RAM, 10 GB disk space) should be sufficient for migrating a database of up to 1 GB. Larger databases require more RAM and disk space: the available disk space must be at least twice the database size.
  The VM must reside in the same network and availability zone as the Yandex StoreDoc cluster’s master host. The VM must also have an external IP address so that you can upload the dump from outside Yandex Cloud.
- Install the MongoDB client and additional database utilities:
  wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | sudo apt-key add -
  echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list
  sudo apt update
  sudo apt install mongodb-org-shell mongodb-org-tools
- Copy the database dump archive from your server to the VM using scp or a similar tool:
  scp ~/db_dump.tar.gz <VM_user_name>@<VM_public_address>:/tmp/db_dump.tar.gz
- Extract the dump into /tmp on the virtual machine:
  tar -xzf /tmp/db_dump.tar.gz -C /tmp
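On the VM, two quick checks can catch problems before you restore: whether the disk has room for the dump (the sizing guidance above asks for at least twice the database size) and whether the archive arrived intact. This is a minimal sketch, assuming GNU coreutils (df --output, sha256sum) as shipped with Ubuntu, and assuming you created a checksum file with sha256sum on the source server and copied it to /tmp next to the archive. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the intermediate VM.

# Succeeds if <avail_bytes> is at least twice <db_bytes>.
# Usage: has_room_for <db_bytes> <avail_bytes>
has_room_for() {
  [ "$2" -ge $(( $1 * 2 )) ]
}

# Prints the free space, in bytes, of the filesystem that holds <dir> (GNU df).
free_bytes() {
  df --output=avail -B1 "$1" | tail -n 1 | tr -d ' '
}

# Checks <archive> against <archive>.sha256, assumed to have been created on
# the source server with: sha256sum db_dump.tar.gz > db_dump.tar.gz.sha256
verify_archive() {
  (cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256")
}

# Example:
#   has_room_for <DB_size_in_bytes> "$(free_bytes /tmp)" || echo "not enough disk space"
#   verify_archive /tmp/db_dump.tar.gz
```

A failed checksum usually means the copy was interrupted; re-running scp is cheaper than debugging a partial restore.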
Now you have a VM with a database dump, ready to be restored to the Yandex StoreDoc cluster.
Restore the data
Restore your database from the dump via mongorestore.
- If you are restoring a dump from a VM located in Yandex Cloud:
  mongorestore --host <DBMS_server_address> \
    --port <port> \
    --username <username> \
    --password "<password>" \
    -j <number_of_streams> \
    --authenticationDatabase <DB_name> \
    --nsInclude '*.*' /tmp/db_dump
- If you are restoring a dump from a server outside Yandex Cloud, you must explicitly specify the SSL settings for mongorestore:
  mongorestore --host <DBMS_server_address> \
    --port <port> \
    --ssl \
    --sslCAFile <path_to_certificate_file> \
    --username <username> \
    --password "<password>" \
    -j <number_of_streams> \
    --authenticationDatabase <DB_name> \
    --nsInclude '*.*' ~/db_dump
- To restore only specific collections, use the --nsInclude and --nsExclude flags to specify which namespaces to include in or exclude from the restore. Namespaces have the form <DB_name>.<collection_name> and can contain * wildcards.
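Both restore variants and the namespace filters can be combined in one helper. This minimal sketch is hypothetical: the function name, its argument order, and the DRY_RUN switch are assumptions. It adds the --ssl and --sslCAFile flags shown above only when a CA file path is given, defaults to --nsInclude '*.*' when no filters are passed, and uses nproc for -j, which is only one reasonable choice of stream count.

```shell
#!/usr/bin/env bash
# Hypothetical mongorestore wrapper.
# Usage: restore_dump <host> <port> <user> <password> <db> <dump_dir> <ca_file|""> [filters...]
#   Pass "" as <ca_file> from a VM inside Yandex Cloud, or a CA path from outside.
#   Filters are passed through as-is, e.g. --nsInclude 'shop.orders'.
restore_dump() {
  local host="$1" port="$2" user="$3" password="$4" db="$5" dir="$6" ca="$7"
  shift 7 || return 2

  local -a cmd=(mongorestore --host "$host" --port "$port")
  if [ -n "$ca" ]; then
    cmd+=(--ssl --sslCAFile "$ca")   # needed when restoring from outside Yandex Cloud
  fi
  cmd+=(--username "$user" --password "$password"
        -j "$(nproc)" --authenticationDatabase "$db")

  if [ "$#" -eq 0 ]; then
    set -- --nsInclude '*.*'         # default: restore every namespace in the dump
  fi
  cmd+=("$@" "$dir")

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%q ' "${cmd[@]}"         # print the command, shell-quoted
    printf '\n'
  else
    "${cmd[@]}"
  fi
}
```

For example, from a VM inside Yandex Cloud: restore_dump <DBMS_server_address> <port> <username> "<password>" <DB_name> /tmp/db_dump '' --nsInclude '<DB_name>.<collection_name>'. Building the command as an array keeps quoted patterns such as '*.*' intact.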