Transferring data from a MongoDB source endpoint

Written by
Yandex Cloud
Updated at April 24, 2025
  • Scenarios for transferring data from MongoDB
  • Preparing the source database
  • Configuring the MongoDB source endpoint
    • Managed Service for MongoDB cluster
    • Custom installation
    • Collection filter
  • Configuring the data target
  • Operations with the database during transfer
  • Troubleshooting data transfer issues
    • Collection key size exceeds 5 MB
    • Collection object size exceeds 16 MB
    • No table found
    • Error when transferring a sharded cluster
    • Error when transferring timeseries collections
    • Unable to recognize an external cluster IP address or FQDN
    • Error at data copying stage
    • Source data cannot be sharded

Yandex Data Transfer enables you to migrate data from a MongoDB database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:

  1. Explore possible data transfer scenarios.
  2. Prepare the MongoDB database for the transfer.
  3. Set up a source endpoint in Yandex Data Transfer.
  4. Set up one of the supported data targets.
  5. Create a transfer and start it.
  6. Perform the required operations with the database and see how the transfer is going.
  7. In case of any issues, use ready-made solutions to resolve them.

Scenarios for transferring data from MongoDB

  1. Migration: Moving data from one storage to another, most often from an obsolete on-premises database to a managed cloud one.

    • Migrating a MongoDB cluster.
    • Migrating a MongoDB cluster from 4.4 to 6.0.
  2. Uploading data to scalable Object Storage allows you to save on data storage and simplifies data exchange with contractors.

For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.

Preparing the source database

Managed Service for MongoDB
MongoDB
  1. Estimate the total number of databases for transfer and the total Managed Service for MongoDB workload. If the workload on the database exceeds 10,000 writes per second, create multiple endpoints and transfers. For more information, see Transferring data from a MongoDB source endpoint.

  2. Create a user with the readWrite role for each source database to replicate. The readWrite role is required to enable the transfer to write data to the __data_transfer.__dt_cluster_time service collection.
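
    You can create this user, e.g., through the yc CLI. The flag syntax below is an assumption, not a confirmed interface; verify it with yc managed-mongodb user create --help:

      # Hypothetical flag syntax; check: yc managed-mongodb user create --help
      yc managed-mongodb user create <username> \
        --cluster-id <cluster_ID> \
        --password <user_password> \
        --permissions database=<source_database_name>,role=readWrite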

  1. Estimate the total number of databases for transfer and the total MongoDB workload. If the workload on the database exceeds 10,000 writes per second, create multiple endpoints and transfers. For more information, see Transferring data from a MongoDB source endpoint.

  2. If you are not planning to use Cloud Interconnect or VPN to connect to an external cluster, make that cluster accessible from the internet from the IP addresses used by Data Transfer.

    For details on linking your network up with external resources, see this concept.

  3. Make sure the MongoDB version on the target is 4.0 or higher.

  4. Make sure the MongoDB cluster is configured so that it returns correctly resolving IP addresses or FQDNs (fully qualified domain names) in response to requests.
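
    To see which host names the cluster members advertise, you can list them in the mongo shell; each printed host must be resolvable from Yandex Cloud:

      rs.conf().members.forEach(function (m) { print(m.host); });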

  5. Configure access to the source cluster from Yandex Cloud. To configure a source cluster for connections from the internet:

    1. In the configuration file, change net.bindIp from 127.0.0.1 to 0.0.0.0:

      # network interfaces
      net:
        port: 27017
        bindIp: 0.0.0.0
      
    2. Restart mongod:

      sudo systemctl restart mongod.service
      
  6. If the source cluster does not use replication, enable it:

    1. Add the replication settings to the /etc/mongod.conf configuration file:

      replication:
        replSetName: <replica_set_name>
      
    2. Restart mongod:

      sudo systemctl restart mongod.service
      
    3. Connect to MongoDB and initialize the replica set with this command:

      rs.initiate({
          _id: "<replica_set_name>",
          members: [{
              _id: 0,
              host: "<IP_address_listened_by_MongoDB>:<port>"
          }]
      });
      
  7. Create a user with the readWrite role for all the source databases to replicate:

    use admin
    db.createUser({
        user: "<username>",
        pwd: "<password>",
        mechanisms: ["SCRAM-SHA-1"],
        roles: [
            {
                db: "<source_database_1_name>",
                role: "readWrite"
            },
            {
                db: "<source_database_2_name>",
                role: "readWrite"
            },
            ...
        ]
    });
    

    Once started, the transfer will connect to the source on behalf of this user. The readWrite role is required to enable the transfer to write data to the __data_transfer.__dt_cluster_time service collection.

    Note

    For MongoDB 3.6 or higher, you only need to assign the created user the read role for the databases to replicate.

  8. When using MongoDB 3.6 or higher, to run the transfer, the user must have the read permission for the local.oplog.rs collection and the read and write permissions for the __data_transfer.__dt_cluster_time collection. To assign a user the clusterAdmin role granting these permissions, connect to MongoDB and run the following commands:

    use admin;
    db.grantRolesToUser("<username>", ["clusterAdmin"]);
    

    To grant more granular permissions, you can assign the clusterMonitor role required for reading the local.oplog.rs collection and grant read and write access to the __data_transfer.__dt_cluster_time system collection.
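
    A minimal sketch of this more granular setup in the mongo shell (granting readWrite on the __data_transfer database covers the __dt_cluster_time collection):

    use admin;
    db.grantRolesToUser("<username>", ["clusterMonitor"]);
    db.grantRolesToUser("<username>", [{ role: "readWrite", db: "__data_transfer" }]);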

Configuring the MongoDB source endpoint

Data Transfer supports transfers from MongoDB starting with version 3.6.

When creating or updating an endpoint, you can define:

  • Yandex Managed Service for MongoDB cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
  • Additional parameters.

Managed Service for MongoDB cluster

Warning

To create or edit an endpoint of a managed database, you will need the managed-mongodb.viewer role or the primitive viewer role for the folder hosting the cluster of this managed database.

This option connects to the database using the cluster ID specified in Yandex Cloud.

Management console
CLI
Terraform
API
  • Managed Service for MongoDB cluster: Specify the ID of the cluster to connect to.

  • Authentication source: Specify the database name in the cluster.

  • User: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the user's password to the database.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

  • Endpoint type: mongo-source.
  • --cluster-id: ID of the cluster you need to connect to.

  • --database: Database name.

  • --user: Username that Data Transfer will use to connect to the database.

  • --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • To set a user password to access the database, use one of the parameters:

    • --raw-password: Password as text.

    • --password-file: The path to the password file.
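
Here is an example of creating such an endpoint (a sketch built from the flags above; verify the exact syntax with yc datatransfer endpoint create mongo-source --help):

yc datatransfer endpoint create mongo-source \
  --name <endpoint_name> \
  --cluster-id <cluster_ID> \
  --database <DB_name> \
  --user <username> \
  --password-file <path_to_password_file> \
  --security-group <security_group_ID>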

  • Endpoint type: mongo_source.
  • connection.connection_options.mdb_cluster_id: ID of the cluster to connect to.

  • subnet_id: ID of the subnet the cluster is in. The transfer will use this subnet to access the cluster. If the ID is not specified, the cluster must be accessible from the internet.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • security_groups: Security groups for network traffic.

    Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the cluster. For more information, see Networking in Yandex Data Transfer.

    Security groups and the subnet_id subnet, if the latter is specified, must belong to the same network as the cluster.

    Note

    In Terraform, it is not required to specify a network for security groups.

  • auth_source: Name of the cluster database.

  • connection.connection_options.user: Username that Data Transfer will use to connect to the database.

  • connection.connection_options.password.raw: Password in text form.

Here is the configuration file example:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    mongo_source {
      security_groups = ["<list_of_security_group_IDs>"]
      subnet_id       = "<subnet_ID>"
      connection {
        connection_options {
          mdb_cluster_id = "<cluster_ID>"
          auth_source    = "<DB_name>"
          user           = "<username>"
          password {
            raw = "<user_password>"
          }
        }
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

  • securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • mdbClusterId: ID of the cluster you need to connect to.

  • database: Database name.

  • user: Username that Data Transfer will use to connect to the database.

  • password.raw: Database user password (in text form).

Custom installation

The settings are given for the OnPremise use case when all fields are filled in manually.

Management console
CLI
Terraform
API
  • Hosts: Specify the IPs or FQDNs of the hosts to connect to.

  • Replica set: Specify the name of the replica set.

  • Port: Set the number of the port that Data Transfer will use for the connection.

  • CA certificate: To encrypt the transferred data, upload the PEM certificate or add its contents as text.

    Warning

    If no certificate is added, the transfer may fail with an error.

  • Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the cluster.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • Authentication source: Specify the database name in the cluster.

  • User: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the user's password to the database.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

  • Endpoint type: mongo-source.
  • --host: IP address or FQDN of the master host you want to connect to.

  • --port: Number of the port that Data Transfer will use for the connection.

  • --ca-certificate: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements.

    Warning

    If no certificate is added, the transfer may fail with an error.

  • --subnet-id: ID of the subnet the host is in. The transfer will use that subnet to access the host.

  • --database: Database name.

  • --user: Username that Data Transfer will use to connect to the database.

  • --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • To set a user password to access the database, use one of the parameters:

    • --raw-password: Password as text.

    • --password-file: The path to the password file.
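
Here is a similar sketch for a custom installation (flag syntax as listed above; --ca-certificate is assumed to take a path to the PEM file):

yc datatransfer endpoint create mongo-source \
  --name <endpoint_name> \
  --host <host_IP_or_FQDN> \
  --port <port> \
  --ca-certificate <path_to_PEM_certificate> \
  --database <DB_name> \
  --user <username> \
  --password-file <path_to_password_file>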

  • Endpoint type: mongo_source.
  • on_premise.port: Port number that Data Transfer will use for connections.

  • connection.connection_options.on_premise.tls_mode.enabled.ca_certificate: CA certificate if the data to transfer must be encrypted, e.g., to comply with the PCI DSS requirements.

    Warning

    If no certificate is added, the transfer may fail with an error.

  • security_groups: Security groups for network traffic.

    Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.

    Security groups must belong to the same network as the subnet_id subnet, if the latter is specified.

    Note

    In Terraform, it is not required to specify a network for security groups.

  • subnet_id: ID of the subnet the cluster is in. The transfer will use this subnet to access the cluster. If the ID is not specified, the cluster must be accessible from the internet.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • connection.connection_options.on_premise.replica_set: Specify the name of the replica set.

  • connection.connection_options.on_premise.hosts: Specify the IP addresses or FQDN of the hosts to connect to.

  • auth_source: Name of the cluster database.

  • connection.connection_options.user: Username that Data Transfer will use to connect to the database.

  • connection.connection_options.password.raw: Password in text form.

Here is the configuration file example:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    mongo_source {
      security_groups = ["<list_of_security_group_IDs>"]
      subnet_id       = "<subnet_ID>"
      connection {
        connection_options {
          on_premise {
            hosts       = ["<list_of_replica_set_hosts>"]
            port        = "<port_for_connection>"
            replica_set = "<replica_set_name>"
            tls_mode {
              enabled {
                ca_certificate = "<certificate_in_PEM_format>"
              }
            }
          }
          auth_source = "<DB_name>"
          user        = "<username>"
          password {
            raw = "<user_password>"
          }
        }
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

  • onPremise: Database connection parameters:
    • hosts: IP address or FQDN of the master host to connect to.

    • port: The number of the port that Data Transfer will use for the connection.

    • tlsMode: Parameters for encrypting the data to transfer, if required, e.g., for compliance with the PCI DSS requirements.

      • disabled: Disabled.
      • enabled: Enabled.
        • caCertificate: CA certificate.

          Warning

          If no certificate is added, the transfer may fail with an error.

    • subnetId: ID of the subnet the host is in. The transfer will use that subnet to access the host.

  • securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • database: Database name.

  • user: Username that Data Transfer will use to connect to the database.

  • password.raw: Database user password (in text form).

Collection filter

Management console
CLI
Terraform
API
  • Included collections: Data is only transferred from listed collections. All collections are transferred by default.

    When you add new collections while editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these collections will not get uploaded. To add a collection with its historical data, use the List of objects for transfer field in the transfer settings.

  • Excluded collections: Data is transferred from all collections except the specified ones.

Included and excluded collection names must meet the ID naming rules in MongoDB. Escaping double quotes is not required.

  • --include-collection: Transfer data only from the listed collections. The values are specified in <database_name>.<collection_name> format. All collections are transferred by default.

    When you add new collections while editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these collections will not get uploaded. To add a collection with its historical data, use the List of objects for transfer field in the transfer settings.

  • --exclude-collection: Transfer data from all collections except the specified ones. The values are specified in <database_name>.<collection_name> format.

  • --prefer-secondary: Set to true to use replicas (if there are any in the cluster) instead of the master host to read data.

  • collections: Transfer data only from the listed collections. All collections are transferred by default.

    When you add new collections while editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these collections will not get uploaded. To add a collection with its historical data, use the List of objects for transfer field in the transfer settings.

  • excluded_collections: Data is transferred from all collections except the specified ones.

  • secondary_preferred_mode: Set to true to use replicas (if there are any in the cluster) instead of the master host to read data.
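
Here is a Terraform fragment for these settings (a sketch; it assumes the collections and excluded_collections blocks mirror the databaseName and collectionName fields listed for the API below):

collections {
  database_name   = "<DB_name>"
  collection_name = "<collection_name>"
}
excluded_collections {
  database_name   = "<DB_name>"
  collection_name = "<collection_name>"
}
secondary_preferred_mode = true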

  • collections: Transfer data only from the listed collections. You need to specify the following for each collection:

    • databaseName: Database name
    • collectionName: Collection name

    All collections are transferred by default.

    When you add new collections while editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these collections will not get uploaded. To add a collection with its historical data, use the List of objects for transfer field in the transfer settings.

  • excludedCollections: Transfer data from all collections except the specified ones. You need to specify the following for each collection:

    • databaseName: Database name
    • collectionName: Collection name
  • secondaryPreferredMode: Set to true to use replicas (if there are any in the cluster) instead of the master host to read data.

If the source workload is high (over 10,000 write transactions per second), we recommend configuring these settings so that each endpoint covers no more than ten different databases. This will help avoid database connection errors while the transfer is ongoing.

Note

  • If you use several endpoints, you need to create a separate transfer for each one.
  • As transfers of timeseries collections are not supported, you should exclude such collections.

Configuring the data target

Configure one of the supported data targets:

  • Yandex Object Storage
  • MongoDB

For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.

After configuring the data source and target, create and start the transfer.

Operations with the database during transfer

  • While a transfer is in the Copying status, do not perform any actions that reduce the source's operation log (oplog) time window: do not add, delete, or reconfigure shards during copying, and avoid any other operations that shorten the oplog time window.

  • For transfers in the Replicating status, you may run into key duplication when the target is a sharded MongoDB cluster with a sharding index other than _id. While a transfer is underway, we advise against creating target clusters with sharding indexes other than _id.

Troubleshooting data transfer issues

Known issues when using a MongoDB endpoint:

  • Collection key size exceeds 5 MB.
  • Collection object size exceeds 16 MB.
  • No table found.
  • Error when transferring a sharded cluster.
  • Error when transferring timeseries collections.
  • Unable to recognize an external cluster IP address or FQDN.
  • Error at data copying stage.
  • Source data cannot be sharded.

For more troubleshooting tips, see Troubleshooting.

Collection key size exceeds 5 MB

Error message:

Warn(replication): Usage of bulk objects in 'database <DB_name>'
breaks change event log, transfer is stopping.
Reason: (Location<item_number>) Tried to create string longer than 16MB.

If the collection key size exceeds 5 MB, transfers of the Replication type crash due to MongoDB internal limits on the size of user objects.

Solution: exclude any collections that exceed MongoDB limits from the transfer and reactivate it.

Collection object size exceeds 16 MB

Error message:

Warn(replication): Usage of bulk objects in 'collection '<DB_name>.<collection_name>''
breaks change event log, transfer is stopping.
Reason: (BSONObjectTooLarge) BSONObj size: <object_size> (<object_size_in_hex>) is invalid.
Size must be between 0 and 16793600(16MB).

If the collection object size exceeds 16 MB, transfers of Replication type crash due to MongoDB internal limits on user object size.

Solution: exclude any collections that exceed MongoDB limits from the transfer and reactivate it.
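
To find the offending documents before excluding a whole collection, you can run an aggregation like this in the mongo shell (the collection name is a placeholder; the $bsonSize operator requires MongoDB 4.4 or higher):

db.<collection_name>.aggregate([
  // keep only documents whose BSON size exceeds roughly 16 MB
  { $match: { $expr: { $gt: [ { $bsonSize: "$$ROOT" }, 16000000 ] } } },
  // return only their identifiers
  { $project: { _id: 1 } }
]);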

No table found

Error message:

Unable to find any tables

No collections could be extracted from the database. The user might be missing permissions for the database used in the transfer.

Solution: grant the readWrite role for the database being transferred to the user that the transfer uses to connect to the source.
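
For example, in the mongo shell:

use admin;
db.grantRolesToUser("<username>", [{ role: "readWrite", db: "<database_name>" }]);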

Error when transferring a sharded cluster

Solution: In the Snapshot settings → Parallel snapshot settings transfer parameter, specify the number of workers equal to the number of collections being transferred.

Error when transferring timeseries collections

Error messages:

Unable to find any tables
Cannot execute mongo activate hook:
Failed in accordance with configuration:
some tables from include list are missing in the source database: [<collection_name>]

The service does not support transfers of Time Series collections.

Solution: exclude any Time Series collections from the transfer and reactivate it.

Unable to recognize an external cluster IP address or FQDN

The transfer fails with the error message:

server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: <unresolved_FQDN>, Type: Unknown, Last error: connection() error occurred during connection handshake: dial tcp: lookup <unresolved_FQDN> on <IP address>: no such host }, ] }"

This transfer error is caused by the MongoDB cluster configuration, e.g., when shard descriptions use internal names that cannot be resolved externally.

Solution:

Make sure the MongoDB cluster is configured so that it returns correctly resolving IP addresses or FQDNs (fully qualified domain names) in response to requests.

Error at data copying stage

The Snapshot and increment type transfer terminates with the following error at the copying stage:

encountered non-recoverable resume token error. Sync cannot be resumed from this state and must be terminated and re-enabled to continue functioning: (ChangeStreamHistoryLost) Resume of change stream was not possible, as the resume point may no longer be in the oplog.

The ChangeStreamHistoryLost error occurs when the total time of copying data from the MongoDB source cluster exceeds the operation log (oplog) time window. You can check the current time window size in the management console: see the Oplog window chart on the cluster monitoring page.

For more information on oplog, see the MongoDB documentation.

Solution:

  • Increase the oplog size (by default, 10% of the cluster disk size). To increase the oplog size in a Managed Service for MongoDB source cluster, contact technical support. To change the oplog size in a custom installation, see the MongoDB documentation.
  • Enable parallel data copying to speed up the copying stage.
  • Limit the list of transferable objects in the transfer settings.

Once that is done, activate the transfer again.
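
For a custom installation, you can check the current oplog size and time window in the mongo shell before reactivating the transfer:

rs.printReplicationInfo()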

Source data cannot be sharded

The transfer from a MongoDB source fails with the following error message:

ERROR: Unable to Activate
error: "failed to execute mongo activate hook: Snapshot loading failed: unable to shard upload tables: unable to shard upload (main worker) tables: unable to shard tables for operation ID: unable to split table, err: cannot get delimiters: there are two or more types of objects in the sharding index"

The cannot get delimiters: there are two or more types of objects in the sharding index error means that the _id field of the source collection contains values of different data types, making the source data unsuitable for sharding.
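
To confirm the type mix, you can group the collection by the BSON type of its _id field in the mongo shell (the collection name is a placeholder):

db.<collection_name>.aggregate([
  // count documents per BSON type of the _id field
  { $group: { _id: { $type: "$_id" }, count: { $sum: 1 } } }
]);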

Solution:

In the Snapshot settings → Parallel snapshot settings transfer settings, specify one worker and one stream to disable sharding.

Once that is done, activate the transfer again.
