Transferring data to a MongoDB target endpoint

Written by
Yandex Cloud
Updated at April 24, 2025
  • Scenarios for transferring data to MongoDB
  • Configuring the data source
  • Preparing the target database
  • Configuring the MongoDB target endpoint
    • Managed Service for MongoDB cluster
    • Custom installation
    • Additional settings
  • Operations with the database during transfer
  • Troubleshooting data transfer issues
    • Collection key size exceeds 5 MB
    • Collection object size exceeds 16 MB
    • No table found
    • Error when transferring a sharded cluster
    • Error when transferring timeseries collections
    • Unable to recognize an external cluster IP address or FQDN

Yandex Data Transfer enables you to migrate data to a MongoDB database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:

  1. Explore possible data transfer scenarios.
  2. Configure one of the supported data sources.
  3. Prepare the MongoDB database for the transfer.
  4. Configure the target endpoint in Yandex Data Transfer.
  5. Create a transfer and start it.
  6. Perform the required operations with the database and see how the transfer is going.
  7. In case of any issues, use ready-made solutions to resolve them.

Scenarios for transferring data to MongoDB

  1. Migration: Moving data from one storage system to another, typically from an obsolete on-premises database to a managed cloud one.

    • Migrating a MongoDB cluster.
    • Migrating a MongoDB cluster from 4.4 to 6.0.
  2. Data delivery: Delivering arbitrary data to target storage. The data is retrieved from a queue, deserialized, and then transformed into the target storage format.

    • Delivering data from Apache Kafka® to MongoDB.

For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.

Configuring the data source

Configure one of the supported data sources:

  • MongoDB
  • Airbyte®
  • YDS
  • Apache Kafka®

For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.

Preparing the target database

Managed Service for MongoDB
MongoDB
  1. Create a database.

  2. Create a user with the readWrite role for the new database.

  3. To shard the migrated collections in the Yandex Managed Service for MongoDB target cluster:

    1. Use this guide to create and configure empty sharded collections in the target database (a mongosh sketch follows this list).

      Data Transfer does not automatically shard the migrated collections. Sharding large collections may take a long time and slow down the transfer.

    2. If sharding uses any key other than _id (default), assign the mdbShardingManager role to the user.

    3. When creating a target endpoint, select DISABLED or TRUNCATE as your cleanup policy.

      Selecting the DROP policy will result in the service deleting all the data from the target database, including sharded collections, and replacing them with new unsharded ones when a transfer is activated.

    Learn more about sharding in the MongoDB documentation.
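
The empty sharded collections themselves can be created with standard mongosh commands. Below is a minimal sketch with hypothetical database, collection, and shard key names; run it while connected to the target cluster with a user that has the required roles (see the role note above):

// Allow sharding in the target database (all names are hypothetical placeholders)
sh.enableSharding("<target_database_name>");
// Create the empty collection that will receive the migrated data
db.getSiblingDB("<target_database_name>").createCollection("<collection_name>");
// Shard it by the desired key before activating the transfer
sh.shardCollection("<target_database_name>.<collection_name>", { <field_name>: "hashed" });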

  1. If you are not planning to use Cloud Interconnect or VPN to connect to an external cluster, make that cluster accessible from the internet for the IP addresses used by Data Transfer.

    For details on connecting your network to external resources, see this concept.

  2. Make sure the MongoDB version on the target is not lower than that on the source.

  3. Make sure the MongoDB cluster is configured so that it returns correctly resolving IP addresses or FQDNs (fully qualified domain names) in response to requests.

  4. Configure the target cluster to allow connections from the internet:

    1. In the configuration file, change net.bindIp from 127.0.0.1 to 0.0.0.0:

      # network interfaces
      net:
        port: 27017
        bindIp: 0.0.0.0
      
    2. Restart mongod:

      sudo systemctl restart mongod.service
      
  5. If the target cluster does not use replication, enable it:

    1. Add the replication settings to the /etc/mongod.conf configuration file:

      replication:
        replSetName: <replica_set_name>
      
    2. Restart mongod:

      sudo systemctl restart mongod.service
      
    3. Connect to MongoDB and initialize the replica set with this command:

      rs.initiate({
          _id: "<replica_set_name>",
          members: [{
              _id: 0,
              host: "<IP_address_listened_by_MongoDB>:<port>"
          }]
      });
      
  6. Connect to the cluster and create a target database:

    use <database_name>
    
  7. Create a user with the readWrite role for the target database:

    use admin;
    db.createUser({
        user: "<username>",
        pwd: "<password>",
        mechanisms: ["SCRAM-SHA-1"],
        roles: [
            {
                db: "<target_database_name>",
                role: "readWrite"
            }
        ]
    });
    

    Once started, the transfer will connect to the target on behalf of this user.

  8. To shard the migrated collections in the target cluster:

    1. Set up a database and populate it with empty collections with the same names as those in the source.

      Data Transfer does not automatically shard the migrated collections. Sharding large collections may take a long time and slow down the transfer.

    2. Enable target database sharding:

      sh.enableSharding("<target_database_name>")
      
    3. Shard every collection based on its namespace:

      sh.shardCollection("<target_database_name>.<collection_name>", { <field_name>: <1|"hashed">, ... });
      

      For the shardCollection() function description, see the MongoDB documentation. A filled-in example follows this list.

    4. To make sure that sharding is set up and enabled, get a list of available shards:

      sh.status()
      
    5. If sharding uses any key other than _id (default), assign the clusterManager system role to the user Data Transfer will use for connection to the target cluster:

      use admin;
      db.grantRolesToUser("<username>", ["clusterManager"]);
      
    6. When creating a target endpoint, select DISABLED or TRUNCATE as your cleanup policy.

      Selecting the DROP policy will result in the service deleting all the data from the target database, including sharded collections, and replacing them with new unsharded ones when a transfer is activated.

    Learn more about sharding in the MongoDB documentation.
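
As a concrete illustration of the sharding steps above, here is a hypothetical example that shards an orders collection in a sales database by a hashed customer_id key (all names are made up for illustration):

sh.enableSharding("sales");
sh.shardCollection("sales.orders", { customer_id: "hashed" });
sh.status();   // confirm that sales.orders now appears among the sharded collections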

Configuring the MongoDB target endpoint

Data Transfer supports transfers to MongoDB starting with version 3.6.

When creating or updating an endpoint, you can define:

  • Yandex Managed Service for MongoDB cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
  • Additional parameters.

Managed Service for MongoDB cluster

Warning

To create or edit an endpoint of a managed database, you will need the managed-mongodb.viewer role or the primitive viewer role for the folder where the cluster of this managed database resides.

Connecting to the database with the cluster ID specified in Yandex Cloud.

Management console
CLI
Terraform
API
  • Managed Service for MongoDB cluster: Specify the ID of the cluster to connect to.

  • Authentication source: Specify the database name in the cluster.

  • User: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the user's password to the database.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

  • Endpoint type: mongo-target.
  • --cluster-id: ID of the cluster you need to connect to.

  • --database: Database name.

  • --user: Username that Data Transfer will use to connect to the database.

  • --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • To set a user password to access the database, use one of the parameters:

    • --raw-password: Password as text.

    • --password-file: The path to the password file.
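
Putting these parameters together, creating the endpoint from the command line might look like the sketch below. The overall command form and the --name flag are assumptions; the endpoint-specific flags are the ones listed above:

yc datatransfer endpoint create mongo-target \
  --name mongo-target-managed \
  --cluster-id <cluster_ID> \
  --database <DB_name> \
  --user <username> \
  --raw-password <user_password> \
  --security-group <security_group_ID>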

  • Endpoint type: mongo_target.
  • connection.connection_options.mdb_cluster_id: ID of the cluster to connect to.

  • subnet_id: ID of the subnet the cluster is in. The transfer will use this subnet to access the cluster. If the ID is not specified, the cluster must be accessible from the internet.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • security_groups: Security groups for network traffic.

    Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the cluster. For more information, see Networking in Yandex Data Transfer.

    Security groups and the subnet_id subnet, if the latter is specified, must belong to the same network as the cluster.

    Note

    In Terraform, it is not required to specify a network for security groups.

  • auth_source: Name of the cluster database.

  • connection.connection_options.user: Username that Data Transfer will use to connect to the database.

  • connection.connection_options.password.raw: Password in text form.

Here is the configuration file example:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    mongo_target {
      security_groups = ["<list_of_security_group_IDs>"]
      subnet_id       = "<subnet_ID>"
      connection {
        connection_options {
          mdb_cluster_id = "<cluster_ID>"
          auth_source    = "<DB_name>"
          user           = "<username>"
          password {
            raw = "<user_password>"
          }
        }
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

  • securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • mdbClusterId: ID of the cluster you need to connect to.

  • database: Database name.

  • user: Username that Data Transfer will use to connect to the database.

  • password.raw: Database user password (in text form).

Custom installation

Connecting to the database with explicitly specified network addresses and ports.

Management console
CLI
Terraform
API
  • Hosts: Specify the IPs or FQDNs of the hosts to connect to.

  • Replica set: Specify the name of the replica set.

  • Port: Set the number of the port that Data Transfer will use for the connection.

  • CA certificate: To encrypt the transferred data, upload the PEM certificate or add its contents as text.

    Warning

    If no certificate is added, the transfer may fail with an error.

  • Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the cluster.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • Authentication source: Specify the database name in the cluster.

  • User: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the user's password to the database.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

  • Endpoint type: mongo-target.
  • --host: IP address or FQDN of the master host you want to connect to.

  • --port: Number of the port that Data Transfer will use for the connection.

  • --ca-certificate: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements.

    Warning

    If no certificate is added, the transfer may fail with an error.

  • --subnet-id: ID of the subnet the host is in. The transfer will use that subnet to access the host.

  • --database: Database name.

  • --user: Username that Data Transfer will use to connect to the database.

  • --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • To set a user password to access the database, use one of the parameters:

    • --raw-password: Password as text.

    • --password-file: The path to the password file.
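
Putting these parameters together for a custom installation, the command might look like the following sketch. The command form, the --name flag, and the file paths are assumptions; the flags themselves are the ones listed above:

yc datatransfer endpoint create mongo-target \
  --name mongo-target-onpremise \
  --host <host_IP_or_FQDN> \
  --port 27017 \
  --ca-certificate /path/to/CA.pem \
  --subnet-id <subnet_ID> \
  --database <DB_name> \
  --user <username> \
  --password-file /path/to/password.txt \
  --security-group <security_group_ID>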

  • Endpoint type: mongo_target.
  • on_premise.port: Port number that Data Transfer will use for connections.

  • connection.connection_options.on_premise.tls_mode.enabled.ca_certificate: CA certificate if the data to transfer must be encrypted, e.g., to comply with the PCI DSS requirements.

    Warning

    If no certificate is added, the transfer may fail with an error.

  • security_groups: Security groups for network traffic.

    Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.

    Security groups must belong to the same network as the subnet_id subnet, if the latter is specified.

    Note

    In Terraform, it is not required to specify a network for security groups.

  • subnet_id: ID of the subnet the cluster is in. The transfer will use this subnet to access the cluster. If the ID is not specified, the cluster must be accessible from the internet.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • connection.connection_options.on_premise.replica_set: Specify the name of the replica set.

  • connection.connection_options.on_premise.hosts: Specify the IP addresses or FQDN of the hosts to connect to.

  • auth_source: Name of the cluster database.

  • connection.connection_options.user: Username that Data Transfer will use to connect to the database.

  • connection.connection_options.password.raw: Password in text form.

Here is the configuration file example:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    mongo_target {
      security_groups = ["<list_of_security_group_IDs>"]
      subnet_id       = "<subnet_ID>"
      connection {
        connection_options {
          on_premise {
            hosts       = ["<list_of_replica_set_hosts>"]
            port        = "<port_for_connection>"
            replica_set = "<replica_set_name>"
            tls_mode {
              enabled {
                ca_certificate = "<certificate_in_PEM_format>"
              }
            }
          }
          auth_source = "<DB_name>"
          user        = "<username>"
          password {
            raw = "<user_password>"
          }
        }
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

  • onPremise: Database connection parameters:
    • hosts: IP address or FQDN of the master host to connect to.

    • port: The number of the port that Data Transfer will use for the connection.

    • tlsMode: Parameters for encrypting the data to transfer, if required, e.g., for compliance with the PCI DSS requirements.

      • disabled: Disabled.
      • enabled: Enabled.
        • caCertificate: CA certificate.

          Warning

          If no certificate is added, the transfer may fail with an error.

    • subnetId: ID of the subnet the host is in. The transfer will use that subnet to access the host.

  • securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • database: Database name.

  • user: Username that Data Transfer will use to connect to the database.

  • password.raw: Database user password (in text form).

Additional settings

Management console
CLI
Terraform
API
  • Database: Enter the database name. If you do not specify any name, the source database name will be used.

  • Cleanup policy: Select a way to clean up data in the target database before the transfer:

    • Don't cleanup: Select this option if you are only going to do replication without copying data.

    • Drop: Completely delete the collections included in the transfer (used by default).

      Use this option so that the latest collection version is always transferred to the target database from the source whenever the transfer is activated.

    • Truncate: Delete only the data from the collections included in the transfer but keep the collections.

      Use this option if the structure of collections in the target database differs from the one that would have been transferred from the source during the transfer.

  • --target-database: Specify the database name if you want to create collections in a database that is different from the source database.

  • database: Specify the database name if you want to create collections in a database that is different from the source database.

  • cleanup_policy: Select a way to clean up data in the target database before the transfer:

    • DISABLED: Do not clean up (default).

      Select this option if only replication without copying data is performed.

    • DROP: Completely delete the collections included in the transfer.

      Use this option so that the latest collection version is always transferred to the target database from the source whenever the transfer is activated.

    • TRUNCATE: Delete only the data from the collections included in the transfer but keep the collections.

      Use this option if the structure of collections in the target database differs from the one that would have been transferred from the source during the transfer.

  • database: Specify the database name if you want to create collections in a database that is different from the source database.

  • cleanupPolicy: Select a way to clean up data in the target database before the transfer:

    • DISABLED: Do not clean up (default).

      Select this option if only replication without copying data is performed.

    • DROP: Completely delete the collections included in the transfer.

      Use this option so that the latest collection version is always transferred to the target database from the source whenever the transfer is activated.

    • TRUNCATE: Delete only the data from the collections included in the transfer but keep the collections.

      Use this option if the structure of collections in the target database differs from the one that would have been transferred from the source during the transfer.

Warning

By default, Data Transfer transfers collections without sharding. If you are transferring data to a sharded target cluster and want your collections to be sharded:

  1. Prepare the target cluster to shard the collections.
  2. Select DISABLED or TRUNCATE for cleanup policy.

Selecting the DROP policy will cause the service to delete all the data, including sharded collections, from the target database and replace them with new unsharded ones when activating a transfer.

After configuring the data source and target, create and start the transfer.

Operations with the database during transfer

  • For transfers with the Copying status, do not perform any actions that reduce the source's operation log (oplog) time window: do not add, delete, or reconfigure shards during copying, or perform any other operations that would shorten the oplog time window.

  • For transfers with the Replicating status, you may run into key duplication if the target is a sharded MongoDB cluster whose sharding index is different from _id. While a transfer is underway, we advise against creating clusters with sharding indexes other than _id on the target.

Troubleshooting data transfer issues

Known issues when using a MongoDB endpoint:

  • Collection key size exceeds 5 MB.
  • Collection object size exceeds 16 MB.
  • No table found.
  • Error when transferring a sharded cluster.
  • Error when transferring timeseries collections.
  • Unable to recognize an external cluster IP address or FQDN.

For more troubleshooting tips, see Troubleshooting.

Collection key size exceeds 5 MB

Error message:

Warn(replication): Usage of bulk objects in 'database <DB_name>'
breaks change event log, transfer is stopping.
Reason: (Location<item_number>) Tried to create string longer than 16MB.

If the collection key size exceeds 5 MB, transfers of the Replication type crash due to MongoDB internal limits on the size of user objects.

Solution: exclude any collections that exceed MongoDB limits from the transfer and reactivate it.

Collection object size exceeds 16 MB

Error message:

Warn(replication): Usage of bulk objects in 'collection '<DB_name>.<collection_name>''
breaks change event log, transfer is stopping.
Reason: (BSONObjectTooLarge) BSONObj size: <object_size> (<object_size_in_hex>) is invalid.
Size must be between 0 and 16793600(16MB).

If the collection object size exceeds 16 MB, transfers of Replication type crash due to MongoDB internal limits on user object size.

Solution: exclude any collections that exceed MongoDB limits from the transfer and reactivate it.

No table found

Error message:

Unable to find any tables

No collections could be extracted from the database. The user might be missing permissions for the database used in the transfer.

Solution: grant the user that the transfer uses to connect to the source the readWrite role for the database being transferred.
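
For a self-managed source, granting the missing role could look like this minimal mongosh sketch (the user and database names are hypothetical placeholders):

use admin;
db.grantRolesToUser("<transfer_username>", [{ role: "readWrite", db: "<transferred_DB_name>" }]);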

Error when transferring a sharded cluster

Solution: In the Snapshot settings → Parallel snapshot settings transfer parameter, set the number of workers equal to the number of collections being transferred.

Error when transferring timeseries collections

Error messages:

Unable to find any tables
Cannot execute mongo activate hook:
Failed in accordance with configuration:
some tables from include list are missing in the source database: [<collection_name>]

The service does not support transfers of Time Series collections.

Solution: exclude any Time Series collections from the transfer and reactivate it.

Unable to recognize an external cluster IP address or FQDN

The transfer fails with the error message:

server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: <unresolved_FQDN>, Type: Unknown, Last error: connection() error occurred during connection handshake: dial tcp: lookup <unresolved_FQDN> on <IP address>: no such host }, ] }"

The transfer error is caused by the MongoDB cluster configuration, for example, when shard descriptions use internal names that cannot be resolved.

Solution:

Make sure the MongoDB cluster is configured so that it returns correctly resolving IP addresses or FQDNs (fully qualified domain names) in response to requests.
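
For a self-managed replica set, you can check which host names the cluster advertises and, if needed, replace an internal-only name with an externally resolvable one. The sketch below is an assumption about how such a fix might look; adapt the member index and address to your topology:

cfg = rs.conf();
cfg.members.forEach(function (m) { print(m._id, m.host); });   // look for names that do not resolve from outside
// Replace an internal-only host name with a resolvable FQDN and reapply the configuration:
cfg.members[0].host = "<externally_resolvable_FQDN>:27017";
rs.reconfig(cfg);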
