Migrating data from Elasticsearch
Note
Yandex Managed Service for Elasticsearch is unavailable as of April 11, 2024.
There are three mechanisms to migrate data from a source Elasticsearch cluster to a target Yandex Managed Service for OpenSearch cluster:
- Yandex Data Transfer. This method is good for any Elasticsearch cluster. For an example of this kind of migration, see Migrating data to OpenSearch using Yandex Data Transfer.
- Snapshots. This method is good for Elasticsearch clusters of version 7.11 or lower. For more information about snapshots, see the OpenSearch documentation.
- Remote reindexing. You can use this mechanism to move your existing indexes, aliases, or data streams. This method is good for all Elasticsearch clusters of version 7.
Migration using snapshots
To migrate data from a source cluster in Elasticsearch to a target cluster in Managed Service for OpenSearch using snapshots:
- Create a snapshot in the source cluster.
- Restore the snapshot in the target cluster.
- Complete your migration.
If you no longer need the resources you are using, delete them.
Getting started
Prepare the infrastructure
Manually:
- Create an Object Storage bucket with restricted access. This bucket will be used as a snapshot repository.
- Create a service account and assign it the storage.editor role. The service account is required to access the bucket from the source and target clusters.
- If you are transferring data from a third-party Elasticsearch cluster, create a static access key for this service account.
  Warning
  Save the key ID and secret key. You will need them in the next steps.
- Create a target Managed Service for OpenSearch cluster in the desired configuration with the following settings:
  - Plugin: repository-s3.
  - Public access to a group of hosts with the DATA role.
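If you prefer the command line to the management console, the service account steps above can be sketched with the Yandex Cloud CLI (yc). This is a minimal sketch, not part of the original procedure: it assumes yc is installed and initialized, the helper function names and the service account name es-snapshot-sa are hypothetical, and only the yc subcommands and flags themselves are real. Creating the bucket is left to the console.

```bash
# Sketch: helper names and the default account name are hypothetical.
SA_NAME="${SA_NAME:-es-snapshot-sa}"

# Create the service account and grant it storage.editor in the folder.
# $1: folder ID.
create_snapshot_sa() {
  folder_id="$1"
  yc iam service-account create --name "$SA_NAME" || return 1
  # Take the first top-level "id" field from the JSON description.
  sa_id=$(yc iam service-account get --name "$SA_NAME" --format json \
    | sed -n 's/^ *"id": *"\([^"]*\)".*/\1/p' | head -n 1)
  if [ -z "$sa_id" ]; then
    echo "service account ID not found" >&2
    return 1
  fi
  yc resource-manager folder add-access-binding "$folder_id" \
    --role storage.editor \
    --subject "serviceAccount:$sa_id"
}

# Create a static access key; the output contains the key ID and the secret.
create_static_key() {
  yc iam access-key create --service-account-name "$SA_NAME"
}
```

Usage: run create_snapshot_sa with your folder ID, then create_static_key, and save the key ID and secret it prints.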
Using Terraform:
- If you do not have Terraform yet, install it.
- Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
- Configure and initialize a provider. There is no need to create a provider configuration file manually; you can download it.
- Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
- Download the es-mos-migration-snapshot.tf configuration file to the same working directory. The file describes:
  - Network.
  - Subnet.
  - Security group and rules required to connect to a Managed Service for OpenSearch cluster.
  - Service account to work with the Object Storage bucket.
  - Object Storage bucket.
  - Managed Service for OpenSearch target cluster.
- In the es-mos-migration-snapshot.tf file, specify these variables:
  - folder_id: Cloud folder ID, same as in the provider settings.
  - bucket_name: Bucket name consistent with the naming conventions.
  - os_admin_password: OpenSearch admin password.
  - os_version: OpenSearch version.
- Make sure the Terraform configuration files are correct using this command:
  terraform validate
  If there are any errors in the configuration files, Terraform will point them out.
- Create the required infrastructure:
  - Run this command to view the planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run this command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
  All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
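The Terraform steps above can also be run with a saved plan file, so that terraform apply applies exactly the plan you reviewed. This is a minimal sketch: the function names and the tfplan file name are arbitrary choices, not part of the downloaded configuration.

```bash
# Sketch: function names and the plan file name are arbitrary.
PLAN_FILE="${PLAN_FILE:-tfplan}"

# Initialize, validate, and write the plan to a file for review.
plan_infra() {
  terraform init -input=false &&
    terraform validate &&
    terraform plan -input=false -out="$PLAN_FILE"
}

# Apply exactly the saved plan. Terraform does not ask for confirmation
# when applying a saved plan, so review the plan_infra output first.
apply_infra() {
  if [ ! -f "$PLAN_FILE" ]; then
    echo "no saved plan: run plan_infra first" >&2
    return 1
  fi
  terraform apply "$PLAN_FILE"
}
```

Splitting planning and applying into two calls keeps the review step explicit: nothing is created until you run apply_infra.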
Complete the configuration and check access to resources
- Set up the bucket ACL:
  - In the Select a user drop-down list, specify the created service account.
  - Select the READ and WRITE permissions for the selected service account.
  - Click Add.
  - Click Save.
- Set up the Elasticsearch source cluster:
  - Install the repository-s3 plugin on all cluster hosts.
  - For the repository-s3 plugin to work, restart the Elasticsearch and Kibana services on all cluster hosts.
  - Make sure the Elasticsearch source cluster can access the internet.
- Install an SSL certificate.
- Make sure you can connect to the target Managed Service for OpenSearch cluster using the OpenSearch API and Dashboards.
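Installing the plugin and restarting the services on a source host can be sketched as follows. This is a minimal sketch that assumes a DEB or RPM installation managed by systemd, with units named elasticsearch and kibana; the function names are hypothetical. Run it on every host of the source cluster.

```bash
# Sketch: function names are hypothetical; systemd unit names are assumed.
# ES_PATH is the Elasticsearch home directory (see the $ES_PATH note below).
install_s3_plugin() {
  if [ -z "$ES_PATH" ]; then
    echo "set ES_PATH to the Elasticsearch home directory" >&2
    return 1
  fi
  # --batch skips the interactive permission confirmation.
  "$ES_PATH/bin/elasticsearch-plugin" install --batch repository-s3
}

# Restart the services so that the plugin is loaded.
# Drop the kibana line on hosts that do not run Kibana.
restart_es_services() {
  sudo systemctl restart elasticsearch &&
    sudo systemctl restart kibana
}
```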
Create a snapshot on the source cluster
- Connect the bucket as a snapshot repository on the source cluster:
  - Add the static access key information to the Elasticsearch keystore.
    Note
    Run this procedure on all hosts of the source cluster.
    Add the following:
    - Key ID:
      $ES_PATH/bin/elasticsearch-keystore add s3.client.default.access_key
    - Secret key:
      $ES_PATH/bin/elasticsearch-keystore add s3.client.default.secret_key
    Note
    The path to Elasticsearch ($ES_PATH) depends on the selected installation method. To find the path to your Elasticsearch installation, see the installation documentation (for example, for DEB or RPM).
  - Reload the secure settings from the keystore:
    curl --request POST "https://<IP_or_FQDN_of_source_cluster_DATA_host>:9200/_nodes/reload_secure_settings"
  - Register the repository:
    curl --request PUT \
      "https://<IP_or_FQDN_of_source_cluster_DATA_host>:9200/_snapshot/<repository_name>" \
      --header 'Content-Type: application/json' \
      --data '{
        "type": "s3",
        "settings": {
          "bucket": "<bucket_name>",
          "endpoint": "storage.yandexcloud.net"
        }
      }'
    For more information about connecting the repository, see the plugin documentation.
    Alert
    If a bucket is registered in an Elasticsearch cluster as a snapshot repository, do not edit the bucket contents manually, as this will disrupt the Elasticsearch snapshot mechanism.
- Start creating a snapshot in the repository you registered in the previous step. You can create a snapshot of the entire cluster or of some of the data. For more information, see the Elasticsearch documentation.
  Example of creating a snapshot named snapshot_1 for the entire cluster:
  curl --request PUT \
    "https://<IP_or_FQDN_of_source_cluster_DATA_host>:9200/_snapshot/<repository_name>/snapshot_1?wait_for_completion=false&pretty"
  Creating a snapshot may take a long time. Track the progress of the operation using Elasticsearch tools, such as:
  curl --request GET \
    "https://<IP_or_FQDN_of_source_cluster_DATA_host>:9200/_snapshot/<repository_name>/snapshot_1/_status?pretty"
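The source-side steps above can be combined into a small script. This is a minimal sketch, not a definitive procedure: the function names and the SRC_HOST, REPO, and BUCKET variables are hypothetical; ACCESS_KEY_ID and SECRET_KEY are assumed to hold the static key you saved earlier; and the requests, like the commands above, assume the source cluster needs no extra authentication options. The keystore part uses the --stdin and --force options of elasticsearch-keystore add to avoid interactive prompts and must run on every source host. Unlike the whole-cluster example, start_snapshot covers only the indexes you list and leaves out the global cluster state.

```bash
# Sketch: all names below are placeholders; replace them with your values.
SRC_HOST="${SRC_HOST:-<IP_or_FQDN_of_source_cluster_DATA_host>}"
REPO="${REPO:-<repository_name>}"
BUCKET="${BUCKET:-<bucket_name>}"

# Run on every source host: store the static key in the keystore.
add_s3_keys() {
  ks="${ES_KEYSTORE:-$ES_PATH/bin/elasticsearch-keystore}"
  printf '%s' "$ACCESS_KEY_ID" | "$ks" add --stdin --force s3.client.default.access_key &&
    printf '%s' "$SECRET_KEY" | "$ks" add --stdin --force s3.client.default.secret_key
}

# Run once: reload secure settings, register the repository,
# and ask all nodes to verify that they can access it.
register_repo() {
  base="https://$SRC_HOST:9200"
  curl --request POST "$base/_nodes/reload_secure_settings" &&
    curl --request PUT "$base/_snapshot/$REPO" \
      --header 'Content-Type: application/json' \
      --data "{\"type\": \"s3\", \"settings\": {\"bucket\": \"$BUCKET\", \"endpoint\": \"storage.yandexcloud.net\"}}" &&
    curl --request POST "$base/_snapshot/$REPO/_verify?pretty"
}

# Start snapshot_1 for a comma-separated index list ($1) only.
start_snapshot() {
  curl --request PUT \
    "https://$SRC_HOST:9200/_snapshot/$REPO/snapshot_1?wait_for_completion=false" \
    --header 'Content-Type: application/json' \
    --data "{\"indices\": \"$1\", \"include_global_state\": false}"
}

# Poll until the snapshot is no longer IN_PROGRESS;
# succeed only if the final state is SUCCESS.
wait_for_snapshot() {
  while :; do
    state=$(curl --silent "https://$SRC_HOST:9200/_snapshot/$REPO/snapshot_1" \
      | grep -o '"state" *: *"[A-Z_]*"' | head -n 1 \
      | sed 's/.*"\([A-Z_]*\)"$/\1/')
    echo "snapshot state: ${state:-unknown}"
    [ "$state" = "IN_PROGRESS" ] || break
    sleep "${POLL_SECONDS:-30}"
  done
  [ "$state" = "SUCCESS" ]
}
```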
Restore a snapshot on the target cluster
- Configure access to the bucket with snapshots for the target cluster. Use the service account you created earlier.
- Attach the Object Storage bucket to the target cluster. This bucket will be used as a read-only snapshot repository:
  curl --request PUT \
    "https://admin:<admin_user_password>@<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_snapshot/<repository_name>" \
    --cacert ~/.opensearch/root.crt \
    --header 'Content-Type: application/json' \
    --data '{
      "type": "s3",
      "settings": {
        "bucket": "<bucket_name>",
        "readonly": "true",
        "endpoint": "storage.yandexcloud.net"
      }
    }'
- Select how to restore an index on the target cluster.
  With the default settings, an attempt to restore an index fails if the cluster already has an open index with the same name. Even Managed Service for OpenSearch clusters without user data contain open system indexes (such as .apm-custom-link or .kibana_*), which may interfere with the restore operation. To avoid this, use one of the following methods:
  - Migrate only your custom indexes. The existing system indexes are not migrated; the import process only affects the user-created indexes on the source cluster.
  - Use the rename_pattern and rename_replacement parameters. Indexes will be renamed as they are restored. For more information, see the OpenSearch documentation.
  Example of restoring the entire snapshot:
  curl --request POST \
    "https://admin:<admin_user_password>@<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_snapshot/<repository_name>/snapshot_1/_restore" \
    --cacert ~/.opensearch/root.crt
- Start restoring data from the snapshot on the target cluster.
  Example of restoring a snapshot with a list of the custom indexes to restore on the target cluster:
  curl --request POST \
    "https://admin:<admin_user_password>@<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_snapshot/<repository_name>/snapshot_1/_restore?wait_for_completion=false&pretty" \
    --cacert ~/.opensearch/root.crt \
    --header 'Content-Type: application/json' \
    --data '{
      "indices": "<list_of_indexes>"
    }'
  Where indices is a comma-separated list of indexes to restore, e.g., my_index*, my_index_2.*.
  Restoring a snapshot may take a long time. To check the restore status, run this command:
  curl --request GET \
    "https://admin:<admin_user_password>@<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_snapshot/<repository_name>/snapshot_1/_status?pretty" \
    --cacert ~/.opensearch/root.crt
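Two variations of the restore step can be sketched as follows: restoring selected indexes under new names with rename_pattern and rename_replacement, and watching per-shard restore progress with the _cat/recovery API, where shards restored from a snapshot are reported with the snapshot recovery type. This is a minimal sketch; the function names, the TGT_HOST, REPO, and ADMIN_PASSWORD variables, and the restored_ prefix are hypothetical.

```bash
# Sketch: names and the "restored_" prefix are hypothetical placeholders.
TGT_HOST="${TGT_HOST:-<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net}"
REPO="${REPO:-<repository_name>}"

# Restore a comma-separated index list ($1) from snapshot_1, prefixing each
# restored index name with "restored_" so it cannot clash with open indexes.
restore_renamed() {
  curl --request POST \
    "https://admin:$ADMIN_PASSWORD@$TGT_HOST:9200/_snapshot/$REPO/snapshot_1/_restore?wait_for_completion=false&pretty" \
    --cacert ~/.opensearch/root.crt \
    --header 'Content-Type: application/json' \
    --data "{\"indices\": \"$1\", \"rename_pattern\": \"(.+)\", \"rename_replacement\": \"restored_\$1\"}"
}

# Show active shard recoveries that come from a snapshot (plus the header).
restore_progress() {
  curl --silent --cacert ~/.opensearch/root.crt \
    "https://admin:$ADMIN_PASSWORD@$TGT_HOST:9200/_cat/recovery?active_only=true&v&h=index,shard,type,stage,bytes_percent" \
    | awk 'NR == 1 || $3 == "snapshot"'
}
```

In the request body, $1 inside rename_replacement is a regular expression group reference resolved by OpenSearch, not a shell variable, which is why it is escaped.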
Complete your migration
- Make sure all the indexes you need have been transferred to the target Managed Service for OpenSearch cluster, and the number of documents in them is the same as in the source cluster:
  Bash:
  Run this command:
  curl \
    --user <username_in_target_cluster>:<user_password_in_target_cluster> \
    --cacert ~/.opensearch/root.crt \
    --request GET 'https://<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_cat/indices?v'
  The list should contain the indexes transferred from Elasticsearch with the number of documents specified in the docs.count column.
  OpenSearch Dashboards:
  - Connect to the target cluster using OpenSearch Dashboards.
  - Select the Global tenant.
  - Open the control panel.
  - Under OpenSearch Plugins, select Index Management.
  - Go to Indexes.
  The list should contain the indexes transferred from Elasticsearch with the number of documents specified in the Total documents column.
- If necessary, disconnect the snapshot repository from the source and target clusters.
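Disconnecting the repository from both clusters can be sketched with the delete-repository API of the snapshot module. This only removes the repository registration; the snapshots stay in the bucket until you delete them. The function name and the SRC_HOST, TGT_HOST, REPO, and ADMIN_PASSWORD variables are hypothetical placeholders, as in the commands above.

```bash
# Sketch: names are hypothetical placeholders.
SRC_HOST="${SRC_HOST:-<IP_or_FQDN_of_source_cluster_DATA_host>}"
TGT_HOST="${TGT_HOST:-<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net}"
REPO="${REPO:-<repository_name>}"

# Unregister the snapshot repository on the source and target clusters.
# Snapshot files in the bucket are not deleted by these requests.
unregister_repos() {
  curl --request DELETE "https://$SRC_HOST:9200/_snapshot/$REPO" &&
    curl --request DELETE \
      "https://admin:$ADMIN_PASSWORD@$TGT_HOST:9200/_snapshot/$REPO" \
      --cacert ~/.opensearch/root.crt
}
```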
Delete the resources you created
Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:
Manually:
- Delete the service account.
- Delete snapshots from the bucket and then delete the entire bucket.
- Delete the Managed Service for OpenSearch cluster.
Using Terraform:
- Delete all objects from the bucket.
- In the terminal window, go to the directory containing the infrastructure plan.
  Warning
  Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
- Delete resources:
  - Run this command:
    terraform destroy
  - Confirm deleting the resources and wait for the operation to complete.
  All the resources described in the Terraform manifests will be deleted.
Migration using reindexing
To migrate data from a source Elasticsearch cluster to a target Managed Service for OpenSearch cluster through reindexing:
- Configure the target cluster.
- Start reindexing.
- Check the result.
If you no longer need the resources you created, delete them.
Getting started
- Prepare the infrastructure:
  Manually:
  Create a Managed Service for OpenSearch target cluster in the desired configuration with public access to a group of hosts with the DATA role.
  Using Terraform:
  - If you do not have Terraform yet, install it.
  - Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
  - Configure and initialize a provider. There is no need to create a provider configuration file manually; you can download it.
  - Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
  - Download the es-mos-migration-reindex.tf configuration file to the same working directory. The file describes:
    - Network.
    - Subnet.
    - Security group and rules required to connect to a Managed Service for OpenSearch cluster.
    - Managed Service for OpenSearch target cluster.
  - In the es-mos-migration-reindex.tf file, specify these variables:
    - os_admin_password: OpenSearch admin password.
    - os_version: OpenSearch version.
  - Make sure the Terraform configuration files are correct using this command:
    terraform validate
    If there are any errors in the configuration files, Terraform will point them out.
  - Create the required infrastructure:
    - Run this command to view the planned changes:
      terraform plan
      If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
    - If you are happy with the planned changes, apply them:
      - Run this command:
        terraform apply
      - Confirm the update of resources.
      - Wait for the operation to complete.
    All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
- Install an SSL certificate:
  Linux (Bash)/macOS (Zsh):
  mkdir -p ~/.opensearch && \
  wget "https://storage.yandexcloud.net/cloud-certs/CA.pem" \
    --output-document ~/.opensearch/root.crt && \
  chmod 0600 ~/.opensearch/root.crt
  The certificate will be saved to the ~/.opensearch/root.crt file.
  Windows (PowerShell):
  mkdir $HOME\.opensearch; curl.exe --output $HOME\.opensearch\root.crt https://storage.yandexcloud.net/cloud-certs/CA.pem
  The certificate will be saved to the $HOME\.opensearch\root.crt file.
  Corporate policies and antivirus software can block the download of certificates. For more information, see FAQ.
- Make sure you can connect to the target Managed Service for OpenSearch cluster using the OpenSearch API and Dashboards.
- Make sure the Elasticsearch source cluster can access the internet.
- Create a user with the monitoring_user and viewer roles in the source cluster.
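On the source cluster, the reindexing user can be created with the Elasticsearch security API. This is a minimal sketch that assumes security is enabled on the source cluster and that ES_ADMIN holds the credentials (user:password) of a user allowed to manage users; the function name and the reindex_reader user name in the usage note are hypothetical.

```bash
# Sketch: function and variable names are hypothetical placeholders.
SRC_HOST="${SRC_HOST:-<IP_or_FQDN_of_source_cluster_DATA_host>}"

# Create (or update) a read-only user for remote reindexing.
# $1: user name, $2: password. No JSON escaping is done,
# so avoid quotes and backslashes in the values.
create_source_reader() {
  curl --user "$ES_ADMIN" \
    --request POST "https://$SRC_HOST:9200/_security/user/$1" \
    --header 'Content-Type: application/json' \
    --data "{\"password\": \"$2\", \"roles\": [\"monitoring_user\", \"viewer\"]}"
}
```

Usage: create_source_reader reindex_reader '<password>'. Use the same name and password later as username_in_source_cluster and user_password_in_source_cluster.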
Configure the target cluster
- Create a role with the create_index and write privileges for all indexes (*).
- Create a user and assign this role to them.
Tip
In Managed Service for OpenSearch clusters, you can run reindexing as the admin user with the superuser role; however, it is more secure to create separate users with limited privileges for each job. For more information, see Managing OpenSearch users.
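On the target cluster, the role and the user can also be created with the OpenSearch Security plugin REST API instead of OpenSearch Dashboards. This is a minimal sketch that assumes the admin user is allowed to call the _plugins/_security API in your cluster; the function names and the reindex_writer and reindexer names are hypothetical. The role is mapped to the user through the rolesmapping endpoint.

```bash
# Sketch: names are hypothetical; assumes admin may use the Security API.
TGT_HOST="${TGT_HOST:-<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net}"
SEC_API="_plugins/_security/api"

# PUT a JSON body ($2) to a Security API path ($1) as admin.
sec_put() {
  curl --user "admin:$ADMIN_PASSWORD" --cacert ~/.opensearch/root.crt \
    --request PUT "https://$TGT_HOST:9200/$SEC_API/$1" \
    --header 'Content-Type: application/json' \
    --data "$2"
}

# Create the role, the user, and the mapping between them.
# $1: password for the new user (no JSON escaping is done).
create_reindex_user() {
  sec_put roles/reindex_writer \
    '{"index_permissions": [{"index_patterns": ["*"], "allowed_actions": ["create_index", "write"]}]}' &&
    sec_put internalusers/reindexer "{\"password\": \"$1\"}" &&
    sec_put rolesmapping/reindex_writer '{"users": ["reindexer"]}'
}
```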
Start reindexing
- Retrieve the list of hosts in the target cluster.
- To start reindexing, send a request to a host with the DATA role in the target cluster:
  curl --user <username_in_target_cluster>:<user_password_in_target_cluster> \
    --cacert ~/.opensearch/root.crt \
    --request POST \
    "https://<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_reindex?wait_for_completion=false&pretty" \
    --header 'Content-Type: application/json' \
    --data '{
      "source": {
        "remote": {
          "host": "https://<IP_address_or_FQDN_of_host_with_DATA_role_in_source_cluster>:9200",
          "username": "<username_in_source_cluster>",
          "password": "<user_password_in_source_cluster>"
        },
        "index": "<name_of_index_alias_or_data_stream_in_source_cluster>"
      },
      "dest": {
        "index": "<name_of_index_alias_or_data_stream_in_target_cluster>"
      }
    }'
  Result:
  {
    "task" : "<ID_of_reindexing_job>"
  }
  To transfer several indexes, use a for loop:
  for index in <names_of_indexes_aliases_or_data_streams_separated_by_spaces>; do
    curl --user <username_in_target_cluster>:<user_password_in_target_cluster> \
      --cacert ~/.opensearch/root.crt \
      --request POST \
      "https://<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_reindex?wait_for_completion=false&pretty" \
      --header 'Content-Type: application/json' \
      --data '{
        "source": {
          "remote": {
            "host": "https://<IP_address_or_FQDN_of_host_with_DATA_role_in_source_cluster>:9200",
            "username": "<username_in_source_cluster>",
            "password": "<user_password_in_source_cluster>"
          },
          "index": "'"$index"'"
        },
        "dest": {
          "index": "'"$index"'"
        }
      }'
  done
  Result:
  { "task" : "<ID_of_reindexing_job_1>" }
  { "task" : "<ID_of_reindexing_job_2>" }
  ...
  To learn more about reindexing parameters, see the OpenSearch documentation.
  Reindexing may take a long time. To check the operation status, run this command:
  curl --user <username_in_target_cluster>:<user_password_in_target_cluster> \
    --cacert ~/.opensearch/root.crt \
    --request GET \
    "https://<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_tasks/<ID_of_reindexing_job>"
- To cancel reindexing, run this command:
  curl --user <username_in_target_cluster>:<user_password_in_target_cluster> \
    --cacert ~/.opensearch/root.crt \
    --request POST \
    "https://<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_tasks/<ID_of_reindexing_job>/_cancel"
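Instead of re-running the status request by hand, you can poll the tasks API until the task reports "completed": true. This is a minimal sketch; the function name, the TGT_USER variable (user:password), and the default 30-second interval are assumptions. Once the task completes, check the failures array in the printed response, since a completed task may still report failed documents.

```bash
# Sketch: function and variable names are hypothetical placeholders.
TGT_HOST="${TGT_HOST:-<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net}"

# $1: task ID returned by _reindex. Prints the final task response.
wait_for_task() {
  while :; do
    resp=$(curl --silent --user "$TGT_USER" --cacert ~/.opensearch/root.crt \
      "https://$TGT_HOST:9200/_tasks/$1")
    # Strip spaces and newlines so compact and pretty JSON both match.
    compact=$(printf '%s' "$resp" | tr -d ' \n')
    case "$compact" in
      *'"completed":true'*) printf '%s\n' "$resp"; return 0 ;;
      *'"completed":false'*) sleep "${POLL_SECONDS:-30}" ;;
      *) echo "unexpected response: $resp" >&2; return 1 ;;
    esac
  done
}
```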
Check the result
Make sure all the indexes you need have been transferred to the target Managed Service for OpenSearch cluster, and the number of documents in them is the same as in the source cluster:
Bash:
Run this command:
curl \
--user <username_in_target_cluster>:<user_password_in_target_cluster> \
--cacert ~/.opensearch/root.crt \
--request GET 'https://<ID_of_OpenSearch_host_with_DATA_role>.mdb.yandexcloud.net:9200/_cat/indices?v'
The list should contain the indexes transferred from Elasticsearch with the number of documents specified in the docs.count column.
OpenSearch Dashboards:
- Connect to the target cluster using OpenSearch Dashboards.
- Select the Global tenant.
- Open the control panel.
- Under OpenSearch Plugins, select Index Management.
- Go to Indexes.
The list should contain the indexes transferred from Elasticsearch with the number of documents specified in the Total documents column.
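The document-count check can be scripted with the _cat/count API, which both Elasticsearch and OpenSearch provide. This is a minimal sketch: the function names and the SRC_URL, TGT_URL, SRC_CACERT, and TGT_CACERT variables are hypothetical (the URLs can include credentials, for example https://user:password@host:9200), and the comparison is only meaningful if nothing writes to the source indexes during the migration.

```bash
# Sketch: names are hypothetical; SRC_CACERT may stay empty.
TGT_CACERT="${TGT_CACERT:-$HOME/.opensearch/root.crt}"

# Print the document count of index $2 at base URL $1, using CA file $3 if set.
doc_count() {
  if [ -n "$3" ]; then
    curl --silent --cacert "$3" "$1/_cat/count/$2?h=count"
  else
    curl --silent "$1/_cat/count/$2?h=count"
  fi | tr -d '[:space:]'
}

# Compare counts for every index passed as an argument;
# return 1 if any index is missing or differs.
compare_counts() {
  rc=0
  for index in "$@"; do
    src=$(doc_count "$SRC_URL" "$index" "$SRC_CACERT")
    tgt=$(doc_count "$TGT_URL" "$index" "$TGT_CACERT")
    if [ -n "$src" ] && [ "$src" = "$tgt" ]; then
      echo "OK   $index: $src"
    else
      echo "DIFF $index: source=$src target=$tgt"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: compare_counts my_index my_index_2, with one argument per index name.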
Delete the resources you created
Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:
- Delete the objects from the bucket.
- Delete the resources depending on how they were created:
  Manually:
  - Delete the Managed Service for OpenSearch cluster.
  Using Terraform:
  - In the terminal window, go to the directory containing the infrastructure plan.
    Warning
    Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
  - Delete resources:
    - Run this command:
      terraform destroy
    - Confirm deleting the resources and wait for the operation to complete.
    All the resources described in the Terraform manifests will be deleted.
- If you reserved public static IPs for cluster access, release and delete them.