Transferring data to a OpenSearch target endpoint

Written by

Updated at October 21, 2025

Scenarios for transferring data to OpenSearch
Configuring the data source
Preparing the target database
Configuring the OpenSearch target endpoint
Troubleshooting data transfer issues

Yandex Data Transfer enables you to migrate data to a OpenSearch database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:

Explore possible data transfer scenarios.
Configure one of the supported data sources.
Prepare the OpenSearch database for the transfer.
Configure the target endpoint in Yandex Data Transfer.
Create a transfer and start it.
Perform required operations with the database and control the transfer.
In case of any issues, use ready-made solutions to resolve them.

Scenarios for transferring data to OpenSearch

Data delivery is a process of delivering arbitrary data to target storage. It includes data retrieval from a queue and its deserialization with subsequent transformation to target storage format.
- Delivering data from Apache Kafka® to OpenSearch.
Migration: Moving data from one storage to another. Migration often means migrating a database from obsolete local databases to managed cloud ones.
- Migrating an OpenSearch cluster.
- Migration with change of storage from PostgreSQL to OpenSearch.

For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.

Configuring the data source

Configure one of the supported data sources:

For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.

Note

There is a data type restriction: if the sorurce sends an ip record (IP address), it will be saved as a text record in the target.

Preparing the target database

Managed Service for OpenSearch

OpenSearch

Make sure the number of columns in the source does not exceed the maximum number of fields in OpenSearch indexes. The maximum number of fields is provided in the index.mapping.total_fields.limit parameter. Its default value is 1,000.

Warning

Exceeding the limit will result in the Limit of total fields [1000] has been exceeded error and the transfer will be stopped.

To increase the parameter value, set up a template that makes the maximum number of fields in new indexes equal to the specified value.
Sample template setup request
```
curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT "https://<URL_of_OpenSearch_host_with_DATA_role>:9200/_template/index_defaults" \
--data '
    {
        "index_patterns": "cdc*",
        "settings": {
            "index": {
                "mapping": {
                    "total_fields": {
                        "limit": "2000"
                    }
                }
            }
        }
    }'
```
With this template setup, all new indexes with the cdc* mask will be able to contain up to 2,000 fields.
You can also set up templates using the OpenSearch Dashboards interface.

To check the current index.mapping.total_fields.limit parameter value, execute the following request:
```
curl \
    --user <OpenSearch_username>:<password> \
    --header 'Content-Type: application/json' \
    --request GET 'https://<URL_of_OpenSearch_host_with_DATA_role>:9200/<index name>/_settings/*total_fields.limit?include_defaults=true'
```

By default, when transferring data to a single index, only one host is used. To distribute the load across hosts when transferring large amounts of data, set up a template to split new indexes into shards in advance.

Sample template setup request

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<URL_of_OpenSearch_host_with_DATA_role>:9200/_template/index_defaults' \
--data '
    {
        "index_patterns": "cdc*",
        "settings" : {
            "index" : {
                "number_of_shards" : 15,
                "number_of_replicas" : 1
            }
        }
    }'

With this template setup, all new indexes with the cdc* mask will be split into 15 shards.

You can also set up templates using the OpenSearch Dashboards interface.

To enhance data security and availability, set up a policy that will create a new index if at least one of the following conditions is met (recommended values):

Index is over 50 GB in size.
Index is over 30 days old.

You can create and enable a policy using requests. For more information about policies, see the OpenSearch documentation.

Example of a policy creation request

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_plugins/_ism/policies/rollover_policy' \
--data '
    {
        "policy": {
            "description": "Example rollover policy",
            "default_state": "rollover",
            "schema_version": 1,
            "states": [
                {
                    "name": "rollover",
                    "actions": [
                        {
                            "rollover": {
                                "min_index_age": "30d",
                                "min_primary_shard_size": "50gb"
                            }
                        }
                    ],
                    "transitions": []
                }
            ],
            "ism_template": {
                "index_patterns": ["log*"],
                "priority": 100
            }
        }
    }'

Example of a request to assign an alias to a policy

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_index_template/ism_rollover' \
--data '
    {
        "index_patterns": ["log*"],
        "template": {
            "settings": {
                "plugins.index_state_management.rollover_alias": "log"
            }
        }
    }'

Example of a request to create an index with a policy alias

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/log-000001' \
--data '
    {
        "aliases": {
            "log": {
                "is_write_index": true
            }
        }
    }'

Example of a request to check if a policy is attached to an index

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request GET 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_plugins/_ism/explain/log-000001?pretty'

If not planning to use Cloud Interconnect or VPN for connections to an external cluster, make such cluster accessible from the Internet from IP addresses used by Data Transfer.

For details on linking your network up with external resources, see this concept.

Make sure the number of columns in the source does not exceed the maximum number of fields in OpenSearch indexes. The maximum number of fields is provided in the index.mapping.total_fields.limit parameter. Its default value is 1,000.

Warning

Exceeding the limit will result in the Limit of total fields [1000] has been exceeded error and the transfer will be stopped.

To increase the parameter value, set up a template that makes the maximum number of fields in new indexes equal to the specified value.
Sample template setup request
```
curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT "https://<URL_of_OpenSearch_host_with_DATA_role>:9200/_template/index_defaults" \
--data '
    {
        "index_patterns": "cdc*",
        "settings": {
            "index": {
                "mapping": {
                    "total_fields": {
                        "limit": "2000"
                    }
                }
            }
        }
    }'
```
With this template setup, all new indexes with the cdc* mask will be able to contain up to 2,000 fields.
You can also set up templates using the OpenSearch Dashboards interface.

To check the current index.mapping.total_fields.limit parameter value, execute the following request:
```
curl \
    --user <OpenSearch_username>:<password> \
    --header 'Content-Type: application/json' \
    --request GET 'https://<URL_of_OpenSearch_host_with_DATA_role>:9200/<index name>/_settings/*total_fields.limit?include_defaults=true'
```

Sample template setup request

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<URL_of_OpenSearch_host_with_DATA_role>:9200/_template/index_defaults' \
--data '
    {
        "index_patterns": "cdc*",
        "settings" : {
            "index" : {
                "number_of_shards" : 15,
                "number_of_replicas" : 1
            }
        }
    }'

With this template setup, all new indexes with the cdc* mask will be split into 15 shards.

You can also set up templates using the OpenSearch Dashboards interface.

To enhance data security and availability, set up a policy that will create a new index if at least one of the following conditions is met (recommended values):

Index is over 50 GB in size.
Index is over 30 days old.

You can create and enable a policy using requests. For more information about policies, see the OpenSearch documentation.

Example of a policy creation request

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_plugins/_ism/policies/rollover_policy' \
--data '
    {
        "policy": {
            "description": "Example rollover policy",
            "default_state": "rollover",
            "schema_version": 1,
            "states": [
                {
                    "name": "rollover",
                    "actions": [
                        {
                            "rollover": {
                                "min_index_age": "30d",
                                "min_primary_shard_size": "50gb"
                            }
                        }
                    ],
                    "transitions": []
                }
            ],
            "ism_template": {
                "index_patterns": ["log*"],
                "priority": 100
            }
        }
    }'

Example of a request to assign an alias to a policy

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_index_template/ism_rollover' \
--data '
    {
        "index_patterns": ["log*"],
        "template": {
            "settings": {
                "plugins.index_state_management.rollover_alias": "log"
            }
        }
    }'

Example of a request to create an index with a policy alias

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/log-000001' \
--data '
    {
        "aliases": {
            "log": {
                "is_write_index": true
            }
        }
    }'

Example of a request to check if a policy is attached to an index

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request GET 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_plugins/_ism/explain/log-000001?pretty'

Configuring the OpenSearch target endpoint

When creating or updating an endpoint, you can define:

Yandex Managed Service for OpenSearch cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
Additional parameters.

Managed Service for OpenSearch cluster

Warning

To create or edit an endpoint of a managed database, you will need the managed-opensearch.viewer role or the primitive viewer role for the folder the cluster of this managed database resides in.

Connection with the cluster specified in Yandex Cloud.

Management console

Connection type: Select a cluster connection option:
- Self-managed: Allows you to specify connection settings manually.
  
  Select Managed Service for OpenSearch cluster as the installation type and configure these settings:
  - Managed Service for OpenSearch cluster: Select the cluster to connect to.
  - User: Specify the username Data Transfer will use to connect to the cluster.
  - Password: Enter the user password to the cluster.
- Connection Manager: Allows connecting to the cluster via Yandex Connection Manager:
  - Select the folder with the Managed Service for OpenSearch cluster.
  - Select Managed DB cluster as the installation type and configure these settings:
    - Cluster for Managed DB: Select the cluster to connect to.
    - Connection: Select or create a connection in Connection Manager.
  Warning
  
  To use a connection from Connection Manager, the user must have access permissions for this connection of connection-manager.user or higher.
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

Custom installation

Connection to nodes with explicitly specified network addresses and ports.

Management console

Connection type: Select a database connection option:
- Self-managed: Allows you to specify connection settings manually.
  
  Select Custom installation as the installation type and configure these settings:
  - Data nodes: Click to add a new data node. For each node, specify:
    - Host: IP address or FQDN of the host with the DATA role you need to connect to.
    - Port: Port number Data Transfer will use to connect to the host with the DATA role.
  - SSL: Select this option if a secure SSL connection is used.
  - CA certificate: Upload the certificate file or add its contents as text if you need to encrypt the data to transfer, e.g., for compliance with the PCI DSS requirements.
    
    Warning
    
    If no certificate is added, the transfer may fail with an error.
  - Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the database.
    
    If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.
    
    If you do not specify a subnet, you may get an error when activating the transfer.
  - User: Specify the username Data Transfer will use to connect to the database.
  - Password: Enter the user password for access to the database.
- Connection Manager: Allows connecting to the database using Yandex Connection Manager:
  - Select the folder where the Connection Manager connection was created.
  - Select Custom installation as the installation type and configure these settings:
    - Connection: Select or create a connection in Connection Manager.
    - Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the database.
      
      If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.
      
      If you do not specify a subnet, you may get an error when activating the transfer.
  Warning
  
  To use a connection from Connection Manager, the user must have access permissions for this connection of connection-manager.user or higher.
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

Thus, you will be able to apply the specified security group rules to the VMs and DBs in the selected network without changing the settings of these VMs and DBs. For more information, see Networking in Yandex Data Transfer.

Additional settings

Management console

Cleanup policy: Select a way to clean up data in the target database before the transfer:
- Don't cleanup: Select this option if you are only going to do replication without copying data.
- Drop: Completely delete the tables included in the transfer (default).
  
  Use this option to always transfer the latest version of the table schema to the target database from the source whenever the transfer is activated.
Sanitize documents keys: Use this option to automatically replace keys that are not valid for OpenSearch in the target fields.

The autocorrect rules are as follows:
- Empty keys or keys consisting of spaces and periods will be replaced with underscores: "", " ", "." → "_".
- Leading and trailing periods will be removed: "somekey.", ".somekey" → "somekey".
- If there are two periods in a row or there is nothing but spaces between them, the entire fragment will be replaced with a period: " some . . key" → " some . key".
Here is an example of how the autocorrect works: ". s o m e ..incorrect....key. . . " → " s o m e .incorrect.key".

After configuring the data source and target, create and start the transfer.

Troubleshooting data transfer issues

For more troubleshooting tips, see Troubleshooting.

Transfer failure

Error messages:

object field starting or ending with a [.] makes object resolution ambiguous <field_description>

Index -1 out of bounds for length 0

The transfer is aborted because the keys in the documents being transferred are not valid for the OpenSearch target. Invalid keys are empty keys and keys that:

Consist of spaces.
Consist of periods.
Have a period at the beginning or end.
Have two or more periods in a row.
Include periods separated by spaces.

Solution:

In the target endpoint additional settings, enable Sanitize documents keys and reactivate the transfer.

Document duplication on the target

When repeatedly transferring data, documents get duplicated on the target.

All documents transferred from the same source table end up under the same index named <schemaName.tableName> on the target. In which case the target automatically generates document IDs (_id) by default. As a result, identical documents get different IDs and get duplicated.

There is no duplication if the primary keys are specified in the source table or endpoint conversion rules. Document IDs are then generated at the transfer stage using the primary key values.

Generation is performed as follows:

If the key value contains a period (.), it is escaped with \: some.key --> some\.key.
All the primary key values are converted into a string: <some_key1>.<some_key2>.<...>.
The resulting string is converted by the url.QueryEscape function.
If the resulting string does not exceed 512 characters in length, it is used as the _id. If longer than 512 characters, it is hashed with SHA-1, and the resulting hash is used as the _id.

As a result, documents with the same primary keys will receive the same ID when the data is transferred again, and the document transferred last will overwrite the existing one.

Solution:

Set the primary key for one or more columns in the source table or in the endpoint conversion rules.
Run the transfer.

Exceeding the maximum number of fields limit

Error message:

Limit of total fields [<limit_value>] has been exceeded

The transfer will be interrupted if the number of columns in the source database exceeds the maximum number of fields in the target database OpenSearch indexes.

Solution: Increase the maximum field number in the target database using the index.mapping.total_fields.limit parameter.

Transfer failure with the mapper_parsing_exception error

Error message:

mapper_parsing_exception failed to parse field [details.tags] of type [text]

The transfer is aborted due to incompatible data types at source and target.

Solution: Move the data to a new OpenSearch index with the details field type changed to flat_object.

Deactivate the transfer.

Create a new index in OpenSearch:

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/<new_index_name>/_settings' \
--data '{"index.mapping.total_fields.limit": 2000}'

Change the details field type:

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/<new_index_name>/_mapping' \
--data '
    {
        "properties": {
            "details": {
                "type": "flat_object"
            }
        }
    }'

Move the data from the source index to the new one:

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request POST 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_reindex' \
--data '
    {
    "source":{
        "index":"<source_index_name>"
    },
    "dest":{
        "index":"<new_index_name>"
    }
    }'

Delete the source index:

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request DELETE 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/<source_index_name>'

Assign an alias to the new index:

curl \
--user <OpenSearch_username>:<password> \
--header 'Content-Type: application/json' \
--request POST 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_aliases' \
--data '
    {
    "actions": [
        {
        "add": {
            "index": "<new_alias_name>",
            "alias": "<source_alias_name>"
        }
        }
    ]
    }'

SSL is required error

This error occurs when connecting to a Managed Service for OpenSearch cluster as a custom installation via a OpenSearch host's FQDN if SSL is not enabled in the endpoint settings. By default, Managed Service for OpenSearch clusters require SSL encryption for connections via host FQDNs.

This error may also occur if you are connecting to a custom OpenSearch installation that requires SSL.

Solution:

Enable SSL in the endpoint settings.

For MDB clusters and other sources that use certificates issued by public CAs, you do not usually need to upload a CA certificate.

If your source uses a self-signed certificate, upload your CA certificate to the relevant field in the endpoint settings.

No tables found

Error message:

Unable to find any tables

This error may occur if the source has no available indexes or the specified user has no permissions for the indexes.

Solution:

Check if there is an index. Make sure that the index name was specified correctly and that the source really has the index you want to transfer.
Make sure the user has the required permissions to use the index.

Transferring data to a OpenSearch target endpoint

Scenarios for transferring data to OpenSearchScenarios for transferring data to OpenSearch

Configuring the data sourceConfiguring the data source

Preparing the target databasePreparing the target database

Configuring the OpenSearch target endpointConfiguring the OpenSearch target endpoint

Managed Service for OpenSearch clusterManaged Service for OpenSearch cluster

Custom installationCustom installation

Additional settingsAdditional settings

Troubleshooting data transfer issuesTroubleshooting data transfer issues

Transfer failureTransfer failure

Document duplication on the targetDocument duplication on the target

Exceeding the maximum number of fields limitExceeding the maximum number of fields limit

Transfer failure with the mapper_parsing_exception errorTransfer failure with the mapper_parsing_exception error

SSL is required errorSSL is required error

No tables foundNo tables found

Was the article helpful?

Scenarios for transferring data to OpenSearch

Configuring the data source

Preparing the target database

Configuring the OpenSearch target endpoint

Managed Service for OpenSearch cluster

Custom installation

Additional settings

Troubleshooting data transfer issues

Transfer failure

Document duplication on the target

Exceeding the maximum number of fields limit

Transfer failure with the mapper_parsing_exception error

SSL is required error

No tables found