Transferring data from a MongoDB source endpoint
Yandex Data Transfer enables you to migrate data from a MongoDB database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Prepare the MongoDB database for the transfer.
- Set up a source endpoint in Yandex Data Transfer.
- Set up one of the supported data targets.
- Create a transfer and start it.
- Perform the required operations with the database and see how the transfer is going.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data from MongoDB
- Migration: Moving data from one storage to another. Migration often means moving a database from an obsolete local installation to a managed cloud one.
- Uploading data to scalable Object Storage allows you to save on data storage and simplifies the exchange with contractors.
For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.
Preparing the source database
- Estimate the total number of databases for transfer and the total Managed Service for MongoDB workload. If the workload on the database exceeds 10,000 writes per second, create multiple endpoints and transfers. For more information, see Transferring data from a MongoDB source endpoint.
- Create a user with the readWrite role for each source database to be replicated. The readWrite role is required so that a transfer can write data to the __data_transfer.__dt_cluster_time service collection.
- Estimate the total number of databases for transfer and the total MongoDB workload. If the workload on the database exceeds 10,000 writes per second, create multiple endpoints and transfers. For more information, see Transferring data from a MongoDB source endpoint.
- Make sure the settings for the network hosting the cluster allow public connections from IP addresses used by Data Transfer.
- Make sure that the MongoDB version on the target is 4.0 or higher.
- Make sure the MongoDB cluster is configured so that it returns correctly resolving IP addresses or FQDNs (fully qualified domain names) in response to requests.
- Configure access to the source cluster from Yandex Cloud. To configure a source cluster for connections from the internet:
  - In the configuration file, change the net.bindIp setting from 127.0.0.1 to 0.0.0.0:

        # network interfaces
        net:
          port: 27017
          bindIp: 0.0.0.0

  - Restart the mongod service:

        sudo systemctl restart mongod.service
- If the source cluster does not use replication, enable it:
  - Add replication settings to the /etc/mongod.conf configuration file:

        replication:
          replSetName: <replica_set_name>

  - Restart the mongod service:

        sudo systemctl restart mongod.service

  - Connect to MongoDB and initialize the replica set with this command:

        rs.initiate({
            _id: "<replica_set_name>",
            members: [{
                _id: 0,
                host: "<IP_address_listening_to_MongoDB>:<port>"
            }]
        });
- Create a user with the readWrite role for all source databases to be replicated:

      use admin
      db.createUser({
          user: "<username>",
          pwd: "<password>",
          mechanisms: ["SCRAM-SHA-1"],
          roles: [
              { db: "<source_database_1_name>", role: "readWrite" },
              { db: "<source_database_2_name>", role: "readWrite" },
              ...
          ]
      });

  Once started, the transfer will connect to the source on behalf of this user. The readWrite role is required so that a transfer can write data to the __data_transfer.__dt_cluster_time service collection.

  Note
  For MongoDB 3.6 or higher, it is enough to assign the created user the read role for the databases to replicate.
- When using MongoDB 3.6 or higher, the user running the transfer must have read permission for the local.oplog.rs collection and read and write permissions for the __data_transfer.__dt_cluster_time collection. To assign a user the clusterAdmin role granting these privileges, connect to MongoDB and run the following commands:

      use admin;
      db.grantRolesToUser("<username>", ["clusterAdmin"]);

  To issue more granular privileges, you can instead assign the clusterMonitor role, which is required for read access to local.oplog.rs, and grant read and write access to the __data_transfer.__dt_cluster_time system collection, as shown in the sketch after this list.
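A minimal mongosh sketch of the granular alternative. The dtClusterTimeRW role name is arbitrary, and the exact action list is an assumption; adjust it to your security policy:

    use admin
    // clusterMonitor provides the read access to local.oplog.rs
    db.grantRolesToUser("<username>", ["clusterMonitor"]);
    // Custom role limited to the service collection the transfer writes to
    db.createRole({
        role: "dtClusterTimeRW",
        privileges: [{
            resource: { db: "__data_transfer", collection: "__dt_cluster_time" },
            actions: ["find", "insert", "update", "remove"]
        }],
        roles: []
    });
    db.grantRolesToUser("<username>", ["dtClusterTimeRW"]);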
Configuring the MongoDB source endpoint
Data Transfer supports transfers from MongoDB starting with version 3.6.
When creating or updating an endpoint, you can define:
- Yandex Managed Service for MongoDB cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
- Additional parameters.
Managed Service for MongoDB cluster
Warning
To create or edit an endpoint of a managed database, you need to have the managed-mongodb.viewer role or the viewer primitive role assigned for the folder where this managed database cluster resides.
Connecting to the database with the cluster ID specified in Yandex Cloud.
- Managed Service for MongoDB cluster: Specify the ID of the cluster to connect to.
- Authentication source: Specify the database name in the cluster.
- User: Specify the username that Data Transfer will use to connect to the database.
- Password: Enter the user's password to the database.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
  Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type: mongo-source.
- --cluster-id: ID of the cluster to connect to.
- --database: Database name.
- --user: Username that Data Transfer will use to connect to the database.
- --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.
- To set a user password to access the database, use one of the parameters:
  - --raw-password: Password as text.
  - --password-file: Path to the password file.
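Put together, the creation command might look as follows. This is only a sketch: the parameters come from the list above, while the --name flag and the exact invocation syntax are assumptions; verify them against yc datatransfer endpoint create mongo-source --help:

    # Hypothetical invocation: create a source endpoint for a managed cluster
    yc datatransfer endpoint create mongo-source \
      --name=mongodb-source \
      --cluster-id=<cluster_ID> \
      --database=<DB_name> \
      --user=<username> \
      --password-file=<path_to_password_file> \
      --security-group=<security_group_ID>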
- Endpoint type: mongo_source.
- connection.connection_options.mdb_cluster_id: ID of the cluster to connect to.
- subnet_id: ID of the subnet hosting the cluster. If not specified, the cluster must be accessible from the internet.
  If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
- security_groups: Security groups for network traffic.
  Security group rules apply to a transfer. They allow opening network access from the transfer VM to the cluster. For more information, see Networking in Yandex Data Transfer.
  Security groups and the subnet_id subnet, if the latter is specified, must belong to the same network as the cluster.
  Note
  In Terraform, it is not required to specify a network for security groups.
- auth_source: Name of the cluster database.
- connection.connection_options.user: Username that Data Transfer will use to connect to the database.
- connection.connection_options.password.raw: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
mongo_source {
security_groups = ["<list_of_security_group_IDs>"]
subnet_id = "<subnet_ID>"
connection {
connection_options {
mdb_cluster_id = "<cluster_ID>"
auth_source = "<DB_name>"
user = "<username>"
password {
raw = "<user_password>"
}
}
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation.
- securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.
- mdbClusterId: ID of the cluster to connect to.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Database user password (in text form).
Custom installation
The settings are given for the OnPremise use case when all fields are filled in manually.
- Hosts: Specify the IP addresses or FQDNs of the hosts to connect to.
- Replica set: Specify the name of the replica set.
- Port: Set the number of the port that Data Transfer will use for the connection.
- CA certificate: To encrypt transmitted data, upload the PEM certificate or add its contents as text.
- Subnet ID: Select or create a subnet in the desired availability zone.
  If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
- Authentication source: Specify the database name in the cluster.
- User: Specify the username that Data Transfer will use to connect to the database.
- Password: Enter the user's password to the database.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
  This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type: mongo-source.
- --host: IP address or FQDN of the master host to connect to.
- --port: Number of the port that Data Transfer will use for the connection.
- --ca-certificate: CA certificate used if the transmitted data needs to be encrypted, for example, to meet the PCI DSS requirements.
- --subnet-id: ID of the subnet the host resides in.
- --database: Database name.
- --user: Username that Data Transfer will use to connect to the database.
- --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.
- To set a user password to access the database, use one of the parameters:
  - --raw-password: Password as text.
  - --password-file: Path to the password file.
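As with the managed cluster, these parameters can be combined into a single creation command. A sketch only: the flags come from the list above, while the --name flag and the exact syntax are assumptions; verify them with yc datatransfer endpoint create mongo-source --help:

    # Hypothetical invocation: create a source endpoint for a custom installation
    yc datatransfer endpoint create mongo-source \
      --name=mongodb-source-onpremise \
      --host=<host_IP_or_FQDN> \
      --port=27017 \
      --ca-certificate=<path_to_PEM_certificate> \
      --subnet-id=<subnet_ID> \
      --database=<DB_name> \
      --user=<username> \
      --password-file=<path_to_password_file>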
- Endpoint type: mongo_source.
- on_premise.port: Port number that Data Transfer will use for connections.
- connection.connection_options.on_premise.tls_mode.enabled.ca_certificate: CA certificate used if the data being transferred must be encrypted to comply with the PCI DSS requirements.
- security_groups: Security groups for network traffic.
  Security group rules apply to a transfer. They allow opening network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.
  Security groups must belong to the same network as the subnet_id subnet, if the latter is specified.
  Note
  In Terraform, it is not required to specify a network for security groups.
- subnet_id: ID of the subnet hosting the cluster. If not specified, the cluster must be accessible from the internet.
  If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
- connection.connection_options.on_premise.replica_set: Name of the replica set.
- connection.connection_options.on_premise.hosts: IP addresses or FQDNs of the hosts to connect to.
- auth_source: Name of the cluster database.
- connection.connection_options.user: Username that Data Transfer will use to connect to the database.
- connection.connection_options.password.raw: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
mongo_source {
security_groups = ["<list_of_security_group_IDs>"]
subnet_id = "<subnet_ID>"
connection {
connection_options {
on_premise {
hosts = [ "list of replica set hosts" ]
port = "<port_for_connection>"
replica_set = "<replica_set_name>"
tls_mode {
enabled {
ca_certificate = "<certificate_in_PEM_format>"
}
}
}
auth_source = "<DB_name>"
user = "<username>"
password {
raw = "<user_password>"
}
}
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation.
- onPremise: Database connection parameters:
  - hosts: IP address or FQDN of the master host to connect to.
  - port: Number of the port that Data Transfer will use for the connection.
  - tlsMode: Parameters for encrypting transmitted data if encryption is required, for example, to meet the PCI DSS requirements.
  - subnetId: ID of the subnet the host resides in.
- securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Database user password (in text form).
Collection filter
- Included collections: Data is transferred only from the listed collections. All collections are transferred by default.
  When you add new collections while editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these collections will not get uploaded. To add a collection with its historical data, use the List of objects for transfer field in the transfer settings.
- Excluded collections: Data is transferred from all collections except the specified ones.

Included and excluded collection names must meet the ID naming rules in MongoDB. Escaping double quotes is not required.
- --include-collection: Transfer data only from the listed collections. Values are specified in <database_name>.<collection_name> format. All collections are transferred by default.
  When you add new collections while editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these collections will not get uploaded. To add a collection with its historical data, use the List of objects for transfer field in the transfer settings.
- --exclude-collection: Transfer data from all collections except the specified ones. Values are specified in <database_name>.<collection_name> format.
- --prefer-secondary: Set to true to read data from replicas (if there are any in the cluster) instead of the master host.
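As a sketch, these filter flags could be added to the creation command from the managed cluster example above. The sales.orders and sales.customers names are hypothetical, and passing --include-collection once per collection is an assumption; verify against the CLI help:

    # Hypothetical invocation: transfer only two collections, reading from replicas
    yc datatransfer endpoint create mongo-source \
      --name=mongodb-source-filtered \
      --cluster-id=<cluster_ID> \
      --database=<DB_name> \
      --user=<username> \
      --password-file=<path_to_password_file> \
      --include-collection=sales.orders \
      --include-collection=sales.customers \
      --prefer-secondary=true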
- collections: Transfer data only from the listed collections. All collections are transferred by default.
  When you add new collections while editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these collections will not get uploaded. To add a collection with its historical data, use the List of objects for transfer field in the transfer settings.
- excluded_collections: Data is transferred from all collections except the specified ones.
- secondary_preferred_mode: Set to true to read data from replicas (if there are any in the cluster) instead of the master host.
- collections: Transfer data only from the listed collections. For each collection, specify:
  - databaseName: Database name
  - collectionName: Collection name

  All collections are transferred by default.
  When you add new collections while editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these collections will not get uploaded. To add a collection with its historical data, use the List of objects for transfer field in the transfer settings.
- excludedCollections: Transfer data from all collections except the specified ones. For each collection, specify:
  - databaseName: Database name
  - collectionName: Collection name
- secondaryPreferredMode: Set to true to read data from replicas (if there are any in the cluster) instead of the master host.
If the source workload is high (over 10,000 write transactions per second), we recommend using these settings so that each endpoint covers no more than ten different databases. This will help avoid database connection errors while the transfer is ongoing.
Note
- If you use several endpoints, you need to create a separate transfer for each one.
- As transfers of timeseries collections are not supported, you should exclude such collections.
Configuring the data target
Configure one of the supported data targets.
For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.
After configuring the data source and target, create and start the transfer.
Operations with the database during transfer
- For transfers with the Copying status, you cannot perform any actions that reduce the source's operation log (oplog) time window. You should not add, delete, or reconfigure shards in any way during copying or perform any other actions resulting in a shorter oplog time window.
- For transfers with the Replicating status, you may encounter the key duplication problem when a sharded MongoDB cluster with a sharding index other than _id is the target. While a transfer is underway, we caution against creating clusters with sharding indexes other than _id on the target.
Troubleshooting data transfer issues
Known issues when using a MongoDB endpoint:
- Collection key size exceeds 5 MB.
- Collection object size exceeds 16 MB.
- No tables found.
- Error when transferring a sharded cluster.
- Error when transferring timeseries collections.
- Unable to recognize an external cluster IP address or FQDN.
- Error at the data copying stage.
- Source data cannot be sharded.

For more troubleshooting tips, see Troubleshooting.
Collection key size exceeds 5 MB
Error message:
Warn(replication): Usage of bulk objects in 'database <DB_name>'
breaks change event log, transfer is stopping.
Reason: (Location<item_number>) Tried to create string longer than 16MB.
If the collection key size exceeds 5 MB, transfers of the Replication type crash due to MongoDB internal limits.
Solution: Exclude any collections that exceed MongoDB limits from the transfer and reactivate it.
Collection object size exceeds 16 MB
Error message:
Warn(replication): Usage of bulk objects in 'collection '<DB_name>.<collection_name>''
breaks change event log, transfer is stopping.
Reason: (BSONObjectTooLarge) BSONObj size: <object_size> (<object_size_in_hex>) is invalid.
Size must be between 0 and 16793600(16MB).
If the collection object size exceeds 16 MB, transfers of the Replication type crash due to MongoDB internal limits.
Solution: Exclude any collections that exceed MongoDB limits from the transfer and reactivate it.
No tables found
Error message:
Unable to find any tables
No collections were extracted from the database. The user might be missing permissions for the database used in the transfer.
Solution: For the database to be transferred, grant the readWrite permission to the user the transfer uses to connect to the source, as in the sketch below.
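A minimal mongosh sketch of granting that permission (the user and database names are placeholders):

    use admin
    // Grant readWrite on the transferred database to the transfer user
    db.grantRolesToUser("<username>", [{ role: "readWrite", db: "<DB_name>" }]);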
Error when transferring a sharded cluster
Solution: In the Snapshot settings → Parallel snapshot settings transfer parameter, specify the number of workers equal to the number of collections being transferred.
Error when transferring timeseries collections
Error messages:
Unable to find any tables
Cannot execute mongo activate hook:
Failed in accordance with configuration:
some tables from include list are missing in the source database: [<collection_name>]
The service does not support transfers of Time Series collections.
Solution: Exclude any Time Series collections from the transfer and reactivate it.
Unable to recognize an external cluster IP address or FQDN
The transfer fails with the error message:
server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: <unresolved_FQDN>, Type: Unknown, Last error: connection() error occurred during connection handshake: dial tcp: lookup <unresolved_FQDN> on <IP address>: no such host }, ] }"
The transfer error is due to the MongoDB cluster configuration, for example, when unresolved internal names are used in shard descriptions.
Solution:
Make sure the MongoDB cluster is configured so that it returns correctly resolving IP addresses or FQDNs (fully qualified domain names) in response to requests.
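One way to verify this for a replica set is to check the host names it advertises; each value returned must resolve and be reachable from outside the cluster network (a minimal mongosh sketch):

    // List the host:port pairs the replica set advertises to clients
    rs.conf().members.map(function (m) { return m.host; });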
Error at the data copying stage
The Snapshot and increment type transfer terminates with the following error at the copying stage:
encountered non-recoverable resume token error. Sync cannot be resumed from this state and must be terminated and re-enabled to continue functioning: (ChangeStreamHistoryLost) Resume of change stream was not possible, as the resume point may no longer be in the oplog.
The ChangeStreamHistoryLost error occurs when the total copy time of the MongoDB source cluster data exceeds the operation log (oplog) time window size. You can check the current time window size in the management console: see the Oplog window graph on the cluster monitoring page.
For more information on the oplog, see the MongoDB documentation.
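For a custom installation, you can check the oplog time window directly in the MongoDB shell:

    // "log length start to end" in the output is the current oplog time window
    rs.printReplicationInfo();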
Solution:
- Increase the oplog size (10% of the cluster disk size by default). To increase the oplog size in a Managed Service for MongoDB source cluster, contact technical support. To change the oplog size in a custom installation, see the MongoDB documentation or the sketch after this list.
- Enable parallel data copying to speed up the copying stage.
- Limit the list of transferable objects in the transfer settings.
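For a custom installation, a minimal sketch of increasing the oplog size with the replSetResizeOplog command. The 16384 MB value is only an example, and the command must be run on each replica set member:

    use admin
    // Resize this member's oplog; the size is set in megabytes (16384 MB = 16 GB here)
    db.adminCommand({ replSetResizeOplog: 1, size: 16384 });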
Once that is done, activate the transfer again.
Source data cannot be sharded
The transfer from a MongoDB source fails with the following error message:
ERROR: Unable to Activate
error: "failed to execute mongo activate hook: Snapshot loading failed: unable to shard upload tables: unable to shard upload (main worker) tables: unable to shard tables for operation ID: unable to split table, err: cannot get delimiters: there are two or more types of objects in the sharding index"
The cannot get delimiters: there are two or more types of objects in the sharding index error means that the id field of the source collection contains values of different data types, which makes the source unsuitable for sharded upload.
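One way to check a collection for mixed key types is to group its documents by the BSON type of the _id field (a mongosh sketch; the database and collection names are placeholders, and it assumes the sharding index is _id):

    // More than one resulting group means the collection mixes _id types
    db.getSiblingDB("<DB_name>").getCollection("<collection_name>").aggregate([
        { $group: { _id: { $type: "$_id" }, count: { $sum: 1 } } }
    ]);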
Solution:
In the Snapshot settings → Parallel snapshot settings transfer settings, specify one worker and one stream to disable sharding.
Once that is done, activate the transfer again.