Managing transfer process
You can:
- Get a list of transfers.
- Get detailed information about a transfer.
- Create a transfer.
- Update a transfer.
- Activate a transfer.
- Deactivate a transfer.
- Delete a transfer.
For more information about transfer states, possible operations on transfers, and existing limits, see Transfer types and lifecycles.
To move a transfer and endpoints to a different availability zone, follow this guide.
Getting a list of transfers
- Go to the folder page and select Yandex Data Transfer.
- In the left-hand panel, select Transfers.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get a list of transfers in a folder, run the following command:
yc datatransfer transfer list
Use the list API method.
Getting detailed information about a transfer
- Go to the folder page and select Yandex Data Transfer.
- In the left-hand panel, select Transfers.
- Click the required transfer name.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get information about a transfer, run the following command:
yc datatransfer transfer get <transfer_ID>
You can get the transfer ID with a list of transfers in the folder.
Use the get API method and provide the transfer ID value in the transferId request parameter.
To find out the transfer ID, get a list of transfers in the folder.
Creating a transfer
- Go to the folder page and select Yandex Data Transfer.
- In the left-hand panel, select Transfers.
- Click Create transfer.
- Select the source endpoint or create a new one.
- Select the target endpoint or create a new one. Make sure the subnet specified for the target endpoint belongs to the same availability zone as the subnet for the source endpoint.
- Specify the transfer parameters:
  - Name.
  - (Optional) Description.
  - Transfer type:
    - Snapshot: Creates a full copy of the data without receiving further updates from the source.
      - Periodic snapshot: Enable this option to create a full copy of the data at regular intervals.
        - Period: Select a copy interval from the list. The transfer will run regularly at the specified interval. The first run starts once the settings are saved. If you run the transfer manually, its next scheduled run will take place after the specified interval.
        - Cron expression: Specify the copy schedule in cron format. The time is in the UTC time zone.
        - Wait for transaction completion time, in seconds: Specify the delay for completing current transactions.
      - Incremental tables: Specify the tables whose data is copied incrementally, i.e., from where the previous copy stopped. Set values for the Schema, Table, Key column, and Initial value (optional) fields. For more information, see Regular incremental copy.
        Note
        This is more efficient than copying entire tables but less efficient than using transfers of the Snapshot and increment type. This setting is available for PostgreSQL, ClickHouse®, and Airbyte® sources.
      - Snapshot settings → Parallel snapshot settings: Specify the number of workers and threads per worker required for parallel copy processes.
        For more information on setting up workers and threads, see the recommendations for parallel copying.
    - Replication: Allows you to receive data updates from the source and apply them to the target (without creating a full copy of the source data).
      - Replication settings → Parallel replication settings: Specify the number of workers required for parallel replication processes. This setting is available for YDB, Apache Kafka®, and YDS sources. If multiple replication processes are run, they will share the partitions of the topic under replication.
        Note
        For YDB, we recommend setting the number of workers to a value not exceeding the total number of table partitions; otherwise, some resources will be idle. If a custom changefeed is not specified, the transfer will create one as soon as it is activated, with the number of partitions equal to the number of YDB table tablets as of the last activation.
    - Snapshot and increment: Creates a full copy of the source data and keeps it up-to-date.
      - Snapshot settings → Parallel snapshot settings: Specify the number of workers and threads per worker required for parallel copy processes.
        For more information on setting up workers and threads, see the recommendations for parallel copying.
  - For billable source-target pairs at the GA stage, you can configure the amount of computing resources per VM in the Runtime environment settings section. Select one of the three suggested configurations:
    - 2 vCPUs and 4 GB RAM. This is the default configuration.
    - 4 vCPUs and 8 GB RAM.
    - 8 vCPUs and 16 GB RAM.
    The VM resource configuration determines the performance of the data transfer workers. A separate VM is allocated for each worker. For vCPU and RAM pricing policy, calculation examples, and cost optimization recommendations, see Pricing policy.
  - (Optional) List of objects for transfer: Specify the full path to each object to transfer. Only objects from this list will be transferred. If you have listed included tables or collections in the source endpoint settings, only objects that are on both lists will be transferred. If you specify objects not listed among the included tables or collections in the source endpoint settings, transfer activation will end with the $table not found in source error. This setting is not available for sources such as Apache Kafka® and YDS.
    Enter the full name of the object. Depending on the source type, use the appropriate naming convention:
    - ClickHouse®: <database_name>.<table_path>
    - Greenplum®: <schema_name>.<table_path>
    - MongoDB: <database_name>.<collection_path>
    - MySQL®: <database_name>.<table_path>
    - PostgreSQL: <schema_name>.<table_path>
    - YDB: table path
    - Oracle: <schema_name>.<table_path>
    If the specified object is on the excluded table or collection list in the source endpoint settings, or the object name was entered incorrectly, the transfer will end with an error. A running Replication or Snapshot and increment transfer will terminate immediately; an inactive one will terminate as soon as it is activated.
  - (Optional) Data transformation: Data transformation rules. This setting only appears when the source and target are of different types.
- Click Create.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To create a transfer:
- View a description of the CLI create transfer command:
  yc datatransfer transfer create --help
- Specify the transfer parameters in the create command:
  yc datatransfer transfer create <transfer_name> \
    --source-id=<source_endpoint_ID> \
    --target-id=<target_endpoint_ID> \
    --type=<transfer_type>
  Where:
  - --source-id: Source endpoint ID.
  - --target-id: Target endpoint ID.
  - --type: Transfer type:
    - snapshot-only: Copy.
    - increment-only: Replicate.
    - snapshot-and-increment: Copy and replicate.
Note
The transfer name must be unique within the folder. It may contain Latin letters, numbers, and hyphens. The name may be up to 63 characters long.
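The character and length rules from the note can be checked locally before calling the CLI. The function below is a minimal sketch: is_valid_transfer_name is a hypothetical helper, not part of yc, and it checks only the rules stated above, plus the assumption that an empty name is invalid. Uniqueness within the folder, and any further restrictions the service may apply, can only be verified by the service itself.

```shell
# Hypothetical helper (not part of the yc CLI): succeeds if the name uses only
# Latin letters, digits, and hyphens and is 1 to 63 characters long.
is_valid_transfer_name() (
  # The function body runs in a subshell with the C locale,
  # so the character ranges below match ASCII only.
  LC_ALL=C
  name=$1
  # Reject empty names (an assumption) and names longer than 63 characters.
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 63 ] || return 1
  # Reject any character outside A-Z, a-z, 0-9, and the hyphen.
  case $name in
    *[!A-Za-z0-9-]*) return 1 ;;
  esac
  return 0
)
```

The function returns a zero exit status for an acceptable name, so it can guard the create command, for example: is_valid_transfer_name "<transfer_name>" && yc datatransfer transfer create "<transfer_name>" followed by the endpoint and type parameters.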
With Terraform
Terraform is distributed under the Business Source License.
For more information about the provider resources, see the Terraform provider documentation.
If you don't have Terraform, install it and configure the Yandex Cloud provider.
To create a transfer:
- Create a configuration file with a description of your transfer.
  Here is an example of the configuration file structure:
  resource "yandex_datatransfer_transfer" "<transfer_name_in_Terraform>" {
    folder_id   = "<folder_ID>"
    name        = "<transfer_name>"
    description = "<transfer_description>"
    source_id   = "<source_endpoint_ID>"
    target_id   = "<target_endpoint_ID>"
    type        = "<transfer_type>"
  }
  The available transfer types include:
  - SNAPSHOT_ONLY: Snapshot
  - INCREMENT_ONLY: Replication
  - SNAPSHOT_AND_INCREMENT: Snapshot and increment
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
When created, INCREMENT_ONLY and SNAPSHOT_AND_INCREMENT transfers are activated and run automatically.
If you want to activate a SNAPSHOT_ONLY transfer once it is created, add the provisioner "local-exec" section with the following transfer activation command to the configuration file:
provisioner "local-exec" {
  command = "yc --profile <profile> datatransfer transfer activate ${yandex_datatransfer_transfer.<transfer_Terraform_resource_name>.id}"
}
In this case, copying will only take place once at the time of transfer creation.
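The CLI and Terraform spell the same three transfer types differently (snapshot-only versus SNAPSHOT_ONLY, and so on). When a script drives both tools, a small conversion can keep the two in sync. This is only a sketch: cli_type_to_terraform is a hypothetical helper name, and it covers just the three values listed in this document.

```shell
# Hypothetical helper: convert a CLI transfer type (e.g. snapshot-only)
# to the Terraform spelling (e.g. SNAPSHOT_ONLY).
cli_type_to_terraform() {
  case $1 in
    snapshot-only|increment-only|snapshot-and-increment)
      # Uppercase the letters and turn hyphens into underscores.
      printf '%s\n' "$1" | tr 'a-z-' 'A-Z_'
      ;;
    *)
      # Anything else is not one of the three documented types.
      echo "unknown transfer type: $1" >&2
      return 1
      ;;
  esac
}
```

Rejecting unknown values instead of converting them blindly means a typo in the type surfaces in the script rather than in a failed Terraform run.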
Use the create API method and include the following information in the request:
- ID of the folder where the transfer should be placed, in the folderId parameter.
- Transfer name in the name parameter.
- Source endpoint ID in the sourceId parameter.
- Target endpoint ID in the targetId parameter.
- Transfer type in the type parameter.
Updating a transfer
- Go to the folder page and select Yandex Data Transfer.
- In the left-hand panel, select Transfers.
- Select a transfer and click Edit in the top panel.
- Edit the transfer parameters:
  - Name.
  - Description.
  - For the Snapshot transfer type:
    - Periodic snapshot: Enable this option to create a full copy of the data at regular intervals.
      - Period: Select a copy interval from the list. The transfer will run regularly at the specified interval. The first run starts once the settings are saved. If you run the transfer manually, its next scheduled run will take place after the specified interval.
      - Cron expression: Specify the copy schedule in cron format. The time is in the UTC time zone.
      - Wait for transaction completion time, in seconds: Specify the delay for completing current transactions.
    - Incremental tables: Specify the tables whose data is copied incrementally, i.e., from where the previous copy stopped. Set values for the Schema, Table, Key column, and Initial value (optional) fields. For more information, see Regular incremental copy.
      Note
      This is more efficient than copying entire tables but less efficient than using transfers of the Snapshot and increment type. This setting is available for PostgreSQL, ClickHouse®, and Airbyte® sources.
    - Snapshot settings → Parallel snapshot settings: Specify the number of workers and threads per worker required for parallel copy processes.
      For more information on setting up workers and threads, see the recommendations for parallel copying.
  - For the Replication transfer type:
    - Replication settings → Parallel replication settings: Specify the number of workers required for parallel replication processes. This setting is available for YDB, Apache Kafka®, and YDS sources. If multiple replication processes are run, they will share the partitions of the topic under replication.
      Note
      For YDB, we recommend setting the number of workers to a value not exceeding the total number of table partitions; otherwise, some resources will be idle. If a custom changefeed is not specified, the transfer will create one as soon as it is activated, with the number of partitions equal to the number of YDB table tablets as of the last activation.
  - For the Snapshot and increment transfer type:
    - Snapshot settings → Parallel snapshot settings: Specify the number of workers and threads per worker required for parallel copy processes.
      For more information on setting up workers and threads, see the recommendations for parallel copying.
  - For billable source-target pairs at the GA stage, you can edit the amount of computing resources per VM in the Runtime environment settings section. Select one of the three suggested configurations:
    - 2 vCPUs and 4 GB RAM. This is the default configuration.
    - 4 vCPUs and 8 GB RAM.
    - 8 vCPUs and 16 GB RAM.
    The VM resource configuration determines the performance of the data transfer workers. A separate VM is allocated for each worker. For vCPU and RAM pricing policy, calculation examples, and cost optimization recommendations, see Pricing policy.
  - List of objects for transfer: Specify the full path to each object to transfer. Only objects from this list will be transferred. If you have listed included tables or collections in the source endpoint settings, only objects that are on both lists will be transferred. If you specify objects not listed among the included tables or collections in the source endpoint settings, transfer activation will end with the $table not found in source error. This setting is not available for sources such as Apache Kafka® and YDS.
    Adding new objects to Snapshot and increment or Replication transfers in the Replicating status will result in uploading data history for these objects or tables. If a table is large, uploading the history may take a long time. You cannot edit the list of objects for transfers in the Copying status.
    Enter the full name of the object. Depending on the source type, use the appropriate naming convention:
    - ClickHouse®: <database_name>.<table_path>
    - Greenplum®: <schema_name>.<table_path>
    - MongoDB: <database_name>.<collection_path>
    - MySQL®: <database_name>.<table_path>
    - PostgreSQL: <schema_name>.<table_path>
    - YDB: table path
    - Oracle: <schema_name>.<table_path>
    If the specified object is on the excluded table or collection list in the source endpoint settings, or the object name was entered incorrectly, the transfer will end with an error. A running Replication or Snapshot and increment transfer will terminate immediately; an inactive one will terminate as soon as it is activated.
  - (Optional) Data transformation: Data transformation rules. This setting only appears when the source and target are of different types.
- Click Save.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To update the transfer settings:
- View a description of the update transfer CLI command:
  yc datatransfer transfer update --help
- Run the following command with a list of settings to update:
  yc datatransfer transfer update <transfer_ID> \
    --name=<transfer_name> \
    --description=<transfer_description>
  You can get the transfer ID with a list of transfers in the folder.
- Open the current Terraform configuration file with the transfer description.
  For information on how to create this file, see Creating a transfer.
- Edit the values in the name and description fields (transfer name and description).
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
Use the update API method and include the following in the request:
- Transfer ID in the transferId parameter. To find out the ID, get a list of transfers in the folder.
- Transfer name in the name parameter.
- Transfer description in the description parameter.
- List of transfer configuration fields to update in the updateMask parameter.
Warning
The API method will assign default values to all the parameters of the object you are modifying unless you explicitly provide them in your request. To avoid this, list the settings you want to change in the updateMask parameter as a single comma-separated string.
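As an illustration of the comma-separated format, the sketch below joins field names into one updateMask string. The helper name build_update_mask is hypothetical, and the field names in the usage line are simply the name and description parameters listed above; how the resulting string is placed in the request depends on the client you use.

```shell
# Hypothetical helper: join the given field names into a single
# comma-separated string suitable for the updateMask parameter.
build_update_mask() (
  # The body runs in a subshell, so changing IFS does not affect the caller.
  IFS=,
  # "$*" joins all arguments using the first character of IFS.
  printf '%s\n' "$*"
)
```

For example, build_update_mask name description produces name,description, which limits the update to those two settings.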
When you update a transfer, the new settings are applied immediately. Editing the settings of a Snapshot and increment or Replication transfer in the Replicating status will restart the transfer.
Activating a transfer
- Go to the folder page and select Yandex Data Transfer.
- In the left-hand panel, select Transfers.
- Click the icon next to the name of the transfer you need and select Activate.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To activate a transfer, run this command:
yc datatransfer transfer activate <transfer_ID>
You can get the transfer ID with a list of transfers in the folder.
Use the activate API method and provide the transfer ID in the transferId request parameter.
To find out the transfer ID, get a list of transfers in the folder.
Note
The operation is available in the Yandex Cloud mobile app.
Deactivating a transfer
During transfer deactivation:
- The replication slot on the source is disabled.
- Temporary data transfer logs are deleted.
- The target is brought into the aligned state:
  - The data schema objects of the source are transferred for the final stage.
  - Indexes are created.
To deactivate a transfer:
- Switch the source to read-only.
- Go to the folder page and select Yandex Data Transfer.
- In the left-hand panel, select Transfers.
- Click the icon next to the name of the transfer you need and select Deactivate.
- Wait for the transfer status to change to Stopped.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To deactivate a transfer, run this command:
yc datatransfer transfer deactivate <transfer_ID>
You can get the transfer ID with a list of transfers in the folder.
Use the deactivate API method and provide the transfer ID in the transferId request parameter.
To find out the transfer ID, get a list of transfers in the folder.
Warning
Do not interrupt the deactivation of the transfer! If the process fails, the performance of the source and target is not guaranteed.
For more information, see Transfer types and lifecycles.
Note
The operation is available in the Yandex Cloud mobile app.
Deleting a transfer
- Go to the folder page and select Yandex Data Transfer.
- In the left-hand panel, select Transfers.
- If the transfer you need is active, deactivate it.
- Click the icon next to the name of the transfer you need and select Delete.
- Click Delete.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To delete a transfer, run this command:
yc datatransfer transfer delete <transfer_ID>
You can get the transfer ID with a list of transfers in the folder.
To delete a transfer created using Terraform:
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about creating this file, see Creating a transfer.
- Delete the transfer description.
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources: type the word yes, then press Enter.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
Use the delete API method and provide the transfer ID in the transferId request parameter.
To find out the transfer ID, get a list of transfers in the folder.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.
ClickHouse® is a registered trademark of ClickHouse, Inc.