Transferring data to a Yandex Object Storage target endpoint
Yandex Data Transfer enables you to migrate data to Yandex Object Storage and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform required operations with the storage and control the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data to Yandex Object Storage
- Data delivery is the process of delivering arbitrary data to a target storage. It includes retrieving data from a queue, deserializing it, and transforming it to the target storage format.
- Uploading data to scalable Object Storage lets you save on data storage costs and simplifies data exchange with contractors.
For a detailed description of possible Yandex Data Transfer data transfer scenarios, see Tutorials.
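To illustrate the data delivery scenario, here is a minimal Python sketch (not part of Data Transfer itself; the queue contents and function names are hypothetical) of the retrieve-deserialize-transform sequence described above:

```python
import json

# Hypothetical queue contents: raw JSON-encoded messages.
queue = [b'{"id": 1, "tags": ["a", "b"]}', b'{"id": 2, "tags": []}']

def deserialize(raw: bytes) -> dict:
    """Deserialize a raw queue message into a Python dict."""
    return json.loads(raw)

def to_csv_row(event: dict) -> str:
    """Transform an event into the target storage format (CSV here)."""
    return f'{event["id"]},"{";".join(event["tags"])}"'

rows = [to_csv_row(deserialize(m)) for m in queue]
# rows == ['1,"a;b"', '2,""']
```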
Configuring the data source
Configure one of the supported data sources:
- PostgreSQL
- MySQL
- MongoDB
- Apache Kafka®
- Airbyte®
- YDS
- Oracle
- Managed Service for YDB
- Elasticsearch
- OpenSearch
For a complete list of supported sources and targets in Yandex Data Transfer, see Available Transfers.
Configuring the Object Storage target endpoint
When creating or updating an endpoint, you can configure access to a Yandex Object Storage bucket.
- Bucket: Name of the bucket to upload the source data to.
- Folder name: Object folder name. Supports the date-based data layout pattern, e.g., `2006/01/02/<folder_name>`.
- Service account: Service account with the `storage.uploader` role that will be used to access Yandex Object Storage.
- Serialization format: Format in which the data will be written to the bucket: `JSON`, `CSV`, `PARQUET`, or `Raw data`. For more information, see Serialization at data delivery to Object Storage.
- Convert complex data to strings: Converts complex values to strings for the `JSON` output format.
- Encoding format: Compression of the output data (`Gzip` or `Uncompressed`).
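To make the folder layout and serialization settings concrete, here is a hedged Python sketch (an illustration only, not the Data Transfer implementation; the function names are hypothetical). The `2006/01/02/<folder_name>` pattern places files into year/month/day folders, which corresponds to `%Y/%m/%d` in Python's `strftime`:

```python
import gzip
import json
from datetime import datetime, timezone

def object_key(folder_name: str, now: datetime) -> str:
    # Date layout: 2006/01/02/<folder_name> -> year/month/day folders.
    return now.strftime("%Y/%m/%d") + f"/{folder_name}"

def serialize_json(records, convert_complex_to_strings=False) -> bytes:
    """JSON-lines serialization; optionally stringify nested values,
    mimicking the "Convert complex data to strings" setting."""
    lines = []
    for rec in records:
        if convert_complex_to_strings:
            rec = {k: json.dumps(v) if isinstance(v, (dict, list)) else v
                   for k, v in rec.items()}
        lines.append(json.dumps(rec))
    return "\n".join(lines).encode()

payload = serialize_json([{"id": 1, "meta": {"a": 1}}],
                         convert_complex_to_strings=True)
compressed = gzip.compress(payload)  # "Gzip" encoding format
key = object_key("events", datetime(2024, 5, 17, tzinfo=timezone.utc))
# key == "2024/05/17/events"
```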
Advanced settings
- Buffer size: Size of the files the data will be split into.
- Flush interval: Time after which a file is written regardless of its size.
- Time zone: Time zone used to distribute files by time. It only affects how files are distributed to folders in the bucket, not the data within the files.
- Row time column name: Name of the column that specifies the logical time for the data. Defaults to the system recording time. When data is written to the target, the time is converted to UTC; this behavior cannot be changed.
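The interplay of the buffer size and flush interval settings can be sketched as follows (a minimal illustration under the assumption that a file is written whenever either limit is reached; the class and parameter names are hypothetical):

```python
import time

class BufferedWriter:
    """Flush a file when the buffer reaches the size limit or when the
    flush interval elapses, whichever comes first."""

    def __init__(self, buffer_size: int, flush_interval: float, flush):
        self.buffer_size = buffer_size        # "Buffer size" setting
        self.flush_interval = flush_interval  # "Flush interval" setting
        self.flush = flush                    # callback that writes one file
        self.buf = bytearray()
        self.last_flush = time.monotonic()

    def write(self, chunk: bytes):
        self.buf += chunk
        now = time.monotonic()
        if (len(self.buf) >= self.buffer_size
                or now - self.last_flush >= self.flush_interval):
            self.flush(bytes(self.buf))
            self.buf.clear()
            self.last_flush = now

files = []
w = BufferedWriter(buffer_size=10, flush_interval=3600.0, flush=files.append)
w.write(b"abc")      # buffered: below the size limit, interval not elapsed
w.write(b"defghij")  # 10 bytes total -> size limit reached, file written
# files == [b"abcdefghij"]
```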
After configuring the data source and target, create and start the transfer.