Transferring data to a YTsaurus target endpoint
Yandex Data Transfer enables you to migrate data to YTsaurus and implement various data transfer, processing, and transformation scenarios.
There are two types of target endpoints available for YTsaurus:
- YTSaurus Dynamic: Writing data to dynamic tables.
- YTSaurus Static: Writing data to static tables.
To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform the required operations with the database and monitor the transfer's progress.
Scenarios for transferring data to YTsaurus using Yandex Data Transfer
You can implement scenarios for loading data from tables into Yandex Cloud managed databases for storage in the cloud, processing, and loading into data marts for visualization.
For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.
Configuring the data source
Configure one of the supported data sources:
- Apache Kafka® (transfer is only possible to a YTSaurus Dynamic target)
- ClickHouse®
- Greenplum®
- MongoDB
- MySQL®
- PostgreSQL
- Yandex Object Storage
- Oracle
- Managed Service for YDB
For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.
Preparing the target database
- To transfer to static tables, grant permissions to create and write to tables (the write permission for the directory containing the tables). If data is cleaned up before the transfer, also grant the remove permission. The transfer also requires the use permission for the YTsaurus account that owns the directory where the tables will be created.
- To transfer to dynamic tables, grant the mount permission in addition to the write, remove, and use permissions.
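The permission rules above can be summarized as a small lookup. The following is a minimal Python sketch under stated assumptions. It is not part of any Yandex Cloud or YTsaurus SDK: the function names, the "directory"/"account" labels, and the subject name robot-data-transfer are hypothetical. The entries follow the general shape of YTsaurus access control entries (action, subjects, permissions, inheritance_mode). Note that use is granted on the account's ACL, not on the directory's.

```python
from typing import Dict, List


def required_permissions(dynamic: bool, cleanup_before_transfer: bool) -> Dict[str, List[str]]:
    """Collect the permissions listed above for one target endpoint.

    "directory" is the target directory; "account" is the YTsaurus account
    that owns it. Both keys are illustrative labels, not API names.
    """
    directory = ["write"]  # create and write tables
    if dynamic or cleanup_before_transfer:
        directory.append("remove")  # needed for cleanup; always required for dynamic tables
    if dynamic:
        directory.append("mount")  # dynamic tables must be mounted
    return {"directory": directory, "account": ["use"]}


def make_ace(subject: str, permissions: List[str]) -> dict:
    """Build an allow entry shaped like an element of a YTsaurus @acl attribute."""
    return {
        "action": "allow",
        "subjects": [subject],
        "permissions": permissions,
        "inheritance_mode": "object_and_descendants",
    }


if __name__ == "__main__":
    # "robot-data-transfer" is a hypothetical subject name.
    perms = required_permissions(dynamic=True, cleanup_before_transfer=True)
    print(make_ace("robot-data-transfer", perms["directory"]))
    print(make_ace("robot-data-transfer", perms["account"]))
```

How the entries are applied to the cluster depends on your tooling. This sketch only assembles them.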
Configuring the YTsaurus target endpoint
When creating or updating an endpoint, you can define:
- Settings for connecting to a Yandex Managed Service for YTsaurus cluster. These are required parameters.
- Additional settings.
Supported data delivery schemas and limits
| Data delivery schemas | Support level | Constraints |
|---|---|---|
| Delivering data to static tables | Supported | Without data transformation (sharding, rotation, or splitting tables into subtables) |
| Parallel copy of data to static tables | Under development | |
| Delivering data to dynamic tables through static ones | Supported | All intermediate operations on table parts are non-transactional and visible to users. Disabled and Drop cleanup policy limits apply (see note 1). |
| Parallel copy of data to dynamic tables through static ones | Supported | All intermediate operations on table parts are non-transactional and visible to users. Disabled and Drop cleanup policy limits apply (see note 1). |
1 Cleanup policy limits:
- With the Disabled cleanup policy, there is no guarantee that new data will take precedence over old data in existing tables if keys overlap.
- With the Drop policy, the old tables are cleaned up before parts of the new tables are added to them. This is a known limitation that is planned to be fixed.
Warning
Dynamic tables require a primary key in the data. YTsaurus dynamic tables store data in key:value format, so both the key columns and their associated value columns must be specified. If a table has no non-key columns, a __dummy stub non-key column is created. If there is no primary key, the transfer fails with an error.
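The rules in the warning can be checked before starting a transfer. The following is a minimal Python sketch, assuming the schema is described as a list of YTsaurus-style column dictionaries where key columns carry a sort_order field. The function name is hypothetical. The sketch does not create the __dummy column, which is done by the transfer itself; it only reports whether that column would be needed.

```python
from typing import Dict, List


def needs_dummy_column(schema: List[Dict[str, str]]) -> bool:
    """Check a dynamic table schema against the rules in the warning above.

    Raises ValueError if there is no primary key (the transfer would fail).
    Returns True if there are no non-key columns, meaning a __dummy stub
    column would be created.
    """
    # Key columns are the ones with a sort order, e.g. "ascending".
    key_columns = [c["name"] for c in schema if c.get("sort_order")]
    if not key_columns:
        raise ValueError("dynamic tables require a primary key")
    value_columns = [c["name"] for c in schema if not c.get("sort_order")]
    return not value_columns


if __name__ == "__main__":
    keys_only = [{"name": "id", "type": "int64", "sort_order": "ascending"}]
    print(needs_dummy_column(keys_only))  # True: only key columns present
```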
Managed Service for YTsaurus cluster
- Service account ID: Select or create a service account with the managed-ytsaurus.editor role that Data Transfer will use to connect to the cluster.
- Cluster ID: Select the cluster to connect to.
- Security groups: Select the following:
  - Cloud network for hosting the endpoint.
  - Security groups for network traffic.
  Security group rules apply to the transfer. They open network access from the transfer VM to the cluster. Learn more in Networking in Yandex Data Transfer.
- Path: Path to the directory where the transferred data will be written.
- Cleanup policy: Select how to clean up data in the target database before the transfer:
  - Drop: Fully delete the tables included in the transfer (default). Use this option to ensure that the latest version of the table schema is always transferred from the source to the target database whenever the transfer is activated.
  - Disabled: Do not clean up. Select this option if only replication without copying data is performed.
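The connection settings and cleanup policies above can be modeled as a small settings object. The following is a minimal Python sketch, not the Data Transfer API. The class and field names are hypothetical. The sketch also assumes that Path is given as an absolute Cypress path starting with "//", as YTsaurus paths usually are.

```python
from dataclasses import dataclass
from typing import Iterable, List

CLEANUP_POLICIES = ("Drop", "Disabled")


@dataclass
class YtTargetSettings:
    # Illustrative field names only; not the Data Transfer API schema.
    service_account_id: str
    cluster_id: str
    path: str
    cleanup_policy: str = "Drop"  # Drop is the default policy

    def validate(self) -> None:
        if not self.service_account_id or not self.cluster_id:
            raise ValueError("service account ID and cluster ID are required")
        # Assumption: the path is an absolute Cypress path such as //home/...
        if not self.path.startswith("//"):
            raise ValueError(f"not an absolute Cypress path: {self.path!r}")
        if self.cleanup_policy not in CLEANUP_POLICIES:
            raise ValueError(f"unknown cleanup policy: {self.cleanup_policy!r}")

    def tables_to_drop(self, transfer_tables: Iterable[str]) -> List[str]:
        # Drop deletes only the tables included in the transfer;
        # Disabled leaves the target untouched.
        if self.cleanup_policy == "Drop":
            return sorted(transfer_tables)
        return []


if __name__ == "__main__":
    settings = YtTargetSettings("sa-id", "cluster-id", "//home/transfer")  # hypothetical values
    settings.validate()
    print(settings.tables_to_drop({"orders", "customers"}))
```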
Advanced settings
Settings for the YTSaurus Dynamic database type
- Table settings:
  - Medium: Select the type of storage for your data:
    - HDD (primary_medium=default): Multiple HDD disks in a cluster.
    - SSD (primary_medium=ssd_blobs): Multiple SSD disks in a cluster.
    - SSD for logs (primary_medium=ssd_journals): Multiple SSD disks to store dynamic table logs.
    - RAM (primary_medium=in_memory): Dedicated space in cluster node RAM.
  - Chunk format: Select the format for storing data in chunks:
    - Columnar (optimize_for=scan): Optimized for scanning.
    - Line-by-line (optimize_for=lookup): Row-based, optimized for key lookups.
  - Atomic transactions: Enable this option to make transactions for tables fully atomic (sets the atomicity=full attribute).
  - TTL: Specify how long table data is stored. After this period, the data is permanently deleted.
  - Custom attributes: Add custom attributes, in YSON format, to the tables being created. To add a new attribute, click + Attribute and enter its name and value.
- Write settings:
  - Disable schema migration: Select this option to prevent changes to the target schema when the source schema is modified. By default, when the source schema changes, the transfer updates the target schema accordingly: it creates new tables, adds new columns, and adds new enumerated values and enumerated types. Deletions, such as dropping tables or columns, are not applied by default.
  - Discard large values: Enable this option to ignore non-critical data that exceeds the limits. If a column value exceeds the YTsaurus size limits, it is replaced with BigStringValueStub.
  - Copy with static table: Select this option to perform copy operations via temporary static tables. With the Drop cleanup policy, existing data in the target is deleted once copying is completed. With any other policy, new and existing data are merged.
  - YT computing pool: Specify the computing pool for operations on tables.
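The table settings above map console choices to YTsaurus table attributes. The following is a minimal Python sketch of that mapping, using only the attribute values listed above. The function name is hypothetical. TTL is left out because its attribute mapping is not described here. The sketch also assumes that custom attributes override built-in ones on a name clash; this is an illustrative choice, not documented behavior.

```python
from typing import Dict, Optional

# Console choices mapped to the attribute values listed above.
MEDIUM = {
    "HDD": "default",
    "SSD": "ssd_blobs",
    "SSD for logs": "ssd_journals",
    "RAM": "in_memory",
}
CHUNK_FORMAT = {"Columnar": "scan", "Line-by-line": "lookup"}


def table_attributes(medium: str, chunk_format: str, atomic: bool,
                     custom: Optional[Dict[str, object]] = None) -> Dict[str, object]:
    """Return the attributes a dynamic table would be created with (sketch)."""
    attrs: Dict[str, object] = {
        "primary_medium": MEDIUM[medium],  # KeyError for an unknown medium
        "optimize_for": CHUNK_FORMAT[chunk_format],
    }
    if atomic:
        attrs["atomicity"] = "full"
    # Assumption: custom attributes are merged last and win on a name clash.
    attrs.update(custom or {})
    return attrs


if __name__ == "__main__":
    print(table_attributes("SSD", "Columnar", atomic=True, custom={"owner_team": "analytics"}))
```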
Settings for the YTSaurus Static database type
- Table settings:
  - Chunk format: Select the format for storing data in chunks:
    - Columnar (optimize_for=scan): Optimized for scanning.
    - Line-by-line (optimize_for=lookup): Row-based, optimized for key lookups.
  - Sort static tables: Enable this option to sort table records by key.
  - Custom attributes: Add custom attributes, in YSON format, to the tables being created. To add a new attribute, click + Attribute and enter its name and value.
- Write settings:
  - Discard large values: Enable this option to ignore non-critical data that exceeds the limits. If a column value exceeds the YTsaurus size limits, it is replaced with BigStringValueStub.
  - YT computing pool: Specify the computing pool for operations on tables.
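The Discard large values option can be illustrated with a simple size check. The following is a minimal Python sketch under stated assumptions. The function name and the limit_bytes parameter are hypothetical, since the actual YTsaurus size limits are not given here. The string STUB only stands in for BigStringValueStub; the real stub representation written by the transfer is not described here.

```python
from typing import Dict

# Placeholder for BigStringValueStub; the real stub representation may differ.
STUB = "BigStringValueStub"


def discard_large_values(row: Dict[str, object], limit_bytes: int) -> Dict[str, object]:
    """Replace string or binary values larger than limit_bytes with a stub."""
    out: Dict[str, object] = {}
    for name, value in row.items():
        if isinstance(value, str):
            size = len(value.encode("utf-8"))  # measure encoded size, not characters
        elif isinstance(value, (bytes, bytearray)):
            size = len(value)
        else:
            out[name] = value  # non-string values are passed through unchanged
            continue
        out[name] = STUB if size > limit_bytes else value
    return out


if __name__ == "__main__":
    print(discard_large_values({"id": 1, "payload": "x" * 100}, limit_bytes=16))
```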
After configuring the data source and target, create and start the transfer.