Transferring data to a Managed Service for YDB target endpoint
Yandex Data Transfer enables you to migrate data to a Managed Service for YDB database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Prepare the Managed Service for YDB database for the transfer.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform the required operations with the database and monitor the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data to Managed Service for YDB
- Migration: Moving data from one storage to another. This often means transferring a database to the cloud, from outdated on-premises databases to managed cloud ones.
- Data delivery: Delivering arbitrary data to target storage. This includes retrieving data from a queue, deserializing it, and transforming it into the target storage format.
- Uploading data to data marts: Transferring prepared data to storage for subsequent visualization.
For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.
Configuring the data source
Configure one of the supported data sources:
For a complete list of supported sources and targets in Yandex Data Transfer, see Available Transfers.
Preparing the target database
- Create a service account with the ydb.editor role.
- For a database running in Dedicated mode, create and configure a security group in the network hosting the database.
Configuring the Managed Service for YDB target endpoint
When creating or updating an endpoint, you can define:
- Connection settings for the Yandex Managed Service for YDB database. These parameters are required.
- Additional parameters.
Yandex Managed Service for YDB cluster
Warning
To create or edit an endpoint of a managed database, you need the ydb.viewer role or the primitive viewer role for the folder where the managed database cluster resides.
Connecting to the database with the cluster ID specified in Yandex Cloud.
- Database: Select a Managed Service for YDB database from the list.
- Service account ID: Select or create a service account with the ydb.editor role that Data Transfer will use to connect to the database.
- Security groups: Select the cloud network to host the endpoint and the security groups for network traffic.
  This lets you apply the specified security group rules to the VMs and clusters in the selected network without changing their settings. For more information, see Networking in Yandex Data Transfer.
- Cleanup policy: Select how to clean up data in the target database before the transfer:
  - Drop: Fully delete the tables included in the transfer (default). Use this option to ensure that the latest version of the table schema is always transferred from the source to the target database whenever the transfer is activated.
  - Don't cleanup: Do not clean up. Select this option if the transfer only replicates changes without copying existing data.
Additional settings
- Number of shards: Specify the required number of shards, N. If this setting is specified, a _shard_col column is added to the tables. Its values are calculated as the remainder of H/N (that is, H mod N), where H is the result of the hash function at the current time and N is the number of shards specified by this setting.
- Compression for default column group: Set the COMPRESSION setting for the default column group (FAMILY default).
- Sub directory for tables: Specify the subdirectory to place tables in.
- Table rotation:
  - Size unit: Hour, day, or month.
  - Table size: The length of each table's time interval, in the selected units.
    When a time interval equal to the selected unit ends, the oldest table in the database is deleted and a new one is created.
  - Number of tables: The required number of tables in the target database.
  - Partition by column: Split (partition) a table by the values of the specified column. The column must have a time type.
    For more information about table partitioning, see the Yandex Managed Service for YDB documentation.
    If this setting is used, the specified number of data tables for different intervals is created in the target database. Each table is named automatically after the date and time its interval starts. Source rows are distributed across the corresponding target tables according to the values in the specified column.
- Renaming tables: Fill in this setting if you need to rename source tables when transferring data to the target database.
- Create OLAP tables: Select this option to create column-oriented (OLAP) tables. By default, row-oriented (OLTP) tables are used.
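To make the Number of shards and Table rotation settings more concrete, here is a minimal Python sketch of the arithmetic they describe. It rests on several assumptions: the hash function (stand_in_hash), the alignment of multi-hour or multi-day intervals to the Unix epoch, and the table naming scheme (table_name) are hypothetical stand-ins, not Data Transfer's actual implementation. The sketch computes _shard_col as H mod N, finds the start of the interval that contains a row's time value, groups rows by target table, and selects the oldest tables that exceed the Number of tables limit.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timezone


# --- Number of shards -------------------------------------------------------
def shard_col(h: int, n: int) -> int:
    """Value of the _shard_col column: the remainder of H / N."""
    if n < 1:
        raise ValueError("the number of shards must be at least 1")
    return h % n


def stand_in_hash(ts: datetime) -> int:
    # ASSUMPTION: placeholder for the hash function Data Transfer really uses,
    # which is not documented here; hashes the ISO timestamp with SHA-256.
    return int.from_bytes(hashlib.sha256(ts.isoformat().encode()).digest()[:8], "big")


# --- Table rotation ---------------------------------------------------------
def interval_start(ts: datetime, unit: str, size: int) -> datetime:
    """Start of the rotation interval that contains ts (ts must be UTC-aware).

    ASSUMPTION: intervals of several hours or days are aligned to the Unix epoch.
    """
    if unit in ("hour", "day"):
        step = size * (3600 if unit == "hour" else 86400)
        return datetime.fromtimestamp(ts.timestamp() // step * step, tz=timezone.utc)
    if unit == "month":
        months = ts.year * 12 + ts.month - 1
        first = months // size * size
        return datetime(first // 12, first % 12 + 1, 1, tzinfo=timezone.utc)
    raise ValueError(f"unknown size unit: {unit}")


def table_name(base: str, start: datetime) -> str:
    # HYPOTHETICAL naming scheme; real table names are chosen by Data Transfer.
    return f"{base}_{start:%Y%m%d%H}"


def route_rows(rows, column, base, unit, size):
    """Group source rows by the target table whose interval covers `column`."""
    routed = defaultdict(list)
    for row in rows:
        routed[table_name(base, interval_start(row[column], unit, size))].append(row)
    return dict(routed)


def tables_to_drop(starts: list[datetime], number_of_tables: int) -> list[datetime]:
    """Interval starts of all tables older than the newest `number_of_tables`."""
    if number_of_tables < 1:
        raise ValueError("number_of_tables must be at least 1")
    return sorted(starts)[:-number_of_tables]


if __name__ == "__main__":
    utc = timezone.utc
    rows = [
        {"id": 1, "created_at": datetime(2024, 3, 1, 10, 15, tzinfo=utc)},
        {"id": 2, "created_at": datetime(2024, 3, 1, 23, 59, tzinfo=utc)},
        {"id": 3, "created_at": datetime(2024, 3, 2, 0, 1, tzinfo=utc)},
    ]
    print(sorted(route_rows(rows, "created_at", "events", "day", 1)))
    # prints ['events_2024030100', 'events_2024030200']
    print(shard_col(stand_in_hash(datetime.now(utc)), 4))  # a value in 0..3
```

With a daily unit, rows 1 and 2 land in the same table and row 3 starts a new one. If rotation keeps two tables and a third interval begins, tables_to_drop returns the oldest interval start.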
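The Compression for default column group and Create OLAP tables settings correspond to YDB table options in YQL: FAMILY default (COMPRESSION = ...) for a row-oriented table, and WITH (STORE = COLUMN) for a column-oriented one. The Python sketch below renders illustrative DDL for both cases. The helper render_create_table is hypothetical, and the statement it produces is only an approximation of what such a table might look like; it is not the DDL Data Transfer actually issues.

```python
from typing import Dict, List, Optional


def render_create_table(
    name: str,
    columns: Dict[str, str],
    primary_key: List[str],
    olap: bool = False,
    compression: Optional[str] = None,
) -> str:
    """Render illustrative YQL DDL for a target table (HYPOTHETICAL helper)."""
    pk = ", ".join(f"`{c}`" for c in primary_key)
    items = []
    for col, yql_type in columns.items():
        # Column-oriented tables require NOT NULL primary key columns.
        not_null = " NOT NULL" if olap and col in primary_key else ""
        items.append(f"`{col}` {yql_type}{not_null}")
    items.append(f"PRIMARY KEY ({pk})")
    if compression and not olap:
        # "Compression for default column group" maps to FAMILY default.
        items.append(f'FAMILY default (COMPRESSION = "{compression}")')
    ddl = f"CREATE TABLE `{name}` (\n    " + ",\n    ".join(items) + "\n)"
    if olap:
        # "Create OLAP tables": a column-oriented table partitioned by hash.
        ddl += f"\nPARTITION BY HASH({pk})\nWITH (STORE = COLUMN)"
    return ddl + ";"


if __name__ == "__main__":
    cols = {"id": "Uint64", "payload": "Utf8"}
    print(render_create_table("events", cols, ["id"], compression="lz4"))
    print(render_create_table("events", cols, ["id"], olap=True))
```

For example, the first call yields a row-table statement ending with FAMILY default (COMPRESSION = "lz4"). To keep the example short, the sketch ignores compression for column-oriented tables; treat that as a simplification, not a statement about YDB.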
After configuring the data source and target, create and start the transfer.
Troubleshooting data transfer issues
Known issues when using a Managed Service for YDB endpoint:
Transfer failure
A Replication or Snapshot and increment transfer is interrupted with an error.
Error message:
/Ydb.PersQueue.V1.PersQueueService/AddReadRule failed: OVERLOADED
Transfers are aborted because of the cloud quota on schema transactions.
Solution:
- Increase the Number of schema transactions per minute property in the Managed Service for YDB quotas for the cloud with the required database and reactivate the transfer.
See a full list of recommendations in the Troubleshooting section.