Transferring data to a Yandex Object Storage target endpoint
Yandex Data Transfer enables you to migrate data to Yandex Object Storage and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform the required operations with the storage and monitor the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data to Yandex Object Storage
- Data delivery is the process of delivering arbitrary data to a target storage. It includes retrieving data from a queue, deserializing it, and then transforming it into the target storage format.
- Uploading data to scalable Object Storage saves on data storage costs and simplifies the exchange of data with contractors.
For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.
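As a rough illustration of the delivery flow described above (queue, deserialization, transformation to the target format), the sketch below deserializes JSON messages taken from an in-memory stand-in for a queue and reshapes them into CSV rows. All names are hypothetical and are not tied to Data Transfer internals.

```python
import csv
import io
import json

# Stand-in for messages pulled from a queue (e.g., Apache Kafka or YDS).
raw_messages = [
    b'{"user_id": 1, "event": "click", "ts": "2024-01-02T03:04:05Z"}',
    b'{"user_id": 2, "event": "view", "ts": "2024-01-02T03:04:06Z"}',
]

# Deserialize, then transform into the target storage format (CSV here).
rows = [json.loads(m) for m in raw_messages]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["user_id", "event", "ts"])
writer.writeheader()
writer.writerows(rows)

print(buf.getvalue())  # the payload that would be uploaded to Object Storage
```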
Configuring the data source
Configure one of the supported data sources:
- PostgreSQL
- MySQL®
- MongoDB
- Apache Kafka®
- Airbyte®
- YDS
- Oracle
- Managed Service for YDB
- Elasticsearch
- OpenSearch
For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.
Warning
Object Storage only supports inserting new data; it does not support updating it. If data is updated in the source, do not use that source to supply data to Object Storage: the transfer will fail with an error.
Configuring the Object Storage target endpoint
When creating or updating an endpoint, you can define:
- Configuration settings for a Yandex Object Storage bucket or a custom S3-compatible storage.
- Additional parameters.
Bucket configurations
For a Yandex Object Storage bucket:
- Bucket: Name of the bucket to upload the source data to.
- Service account: Service account with the storage.uploader role that will be used to access Yandex Object Storage.

For a custom S3-compatible storage (see the connectivity sketch after this list):
- Bucket: Bucket name.
- AWS Access Key ID and AWS Secret Access Key: ID and contents of the AWS key used to access a private bucket.
- (Optional) Endpoint: Endpoint for an Amazon S3-compatible service. Leave this field empty to use Amazon.
- Region: Region to send requests to.
- Use SSL: Select this option if the remote server uses a secure SSL/TLS connection.
- Verify SSL certificate: Allow self-signed certificates.
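The same connection parameters can be sanity-checked outside of Data Transfer. Below is a minimal sketch using the boto3 library; the bucket name, key placeholders, and the storage.yandexcloud.net endpoint are illustrative assumptions, so substitute your own values.

```python
import boto3

# Client configured with the same parameters as the endpoint settings:
# endpoint, key pair, region, and SSL behavior.
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.yandexcloud.net",  # assumption: leave unset for Amazon
    aws_access_key_id="<access-key-id>",
    aws_secret_access_key="<secret-access-key>",
    region_name="ru-central1",  # assumption: your storage region
    use_ssl=True,
    verify=True,  # set to False only if the server uses a self-signed certificate
)

# Check that the bucket is reachable and the key is allowed to write to it.
s3.put_object(Bucket="<bucket-name>", Key="connectivity-check.txt", Body=b"ok")
head = s3.head_object(Bucket="<bucket-name>", Key="connectivity-check.txt")
print(head["ContentLength"])
```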
Additional settings
- Serialization format: Format in which the data will be written to the bucket: JSON, CSV, PARQUET, or Raw data. For more information, see Serialization at data delivery to Object Storage.
- Convert complex data to strings: Convert complex values to strings for the JSON output format.
- Encoding format: Compression of the output data (Gzip or Uncompressed).
- Buffer size: Size of the files the data will be split into.
- Flush interval: Time after which a file is written regardless of its size.
- Folder name: Object folder name. It supports a date-based data layout pattern, for example: 2006/01/02/<folder_name>. See the layout sketch after this list.
- Time zone: Time zone used to distribute the files. It only affects how files are distributed to folders in the bucket; it does not affect the data within the files.
- Row time column name: Name of the column that specifies the logical time of the data. The default value is the system recording time. When data is written to the target, the time is converted to UTC. This behavior cannot be changed.
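To make the folder layout and time zone behavior concrete, here is a minimal sketch of how a writer could derive a date-based object key and serialize a batch as gzipped JSON lines. The folder name, time zone value, and helper functions are illustrative assumptions, not Data Transfer internals.

```python
import gzip
import json
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

FOLDER_NAME = "events"                 # assumption: the "Folder name" setting
BUCKET_TZ = ZoneInfo("Europe/Moscow")  # assumption: the "Time zone" setting

def object_key(row_time_utc: datetime) -> str:
    # The time zone affects only which folder a file lands in;
    # the timestamps inside the file stay in UTC.
    local = row_time_utc.astimezone(BUCKET_TZ)
    return f"{local:%Y/%m/%d}/{FOLDER_NAME}"  # e.g., 2006/01/02/events

def serialize_batch(rows: list[dict]) -> bytes:
    # JSON serialization format with Gzip encoding; default=str mimics
    # "Convert complex data to strings" for non-JSON-native values.
    lines = "\n".join(json.dumps(row, default=str) for row in rows)
    return gzip.compress(lines.encode("utf-8"))

now = datetime.now(timezone.utc)
print(object_key(now))
print(len(serialize_batch([{"id": 1, "payload": {"a": [1, 2]}}])))
```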
After configuring the data source and target, create and start the transfer.
Troubleshooting data transfer issues
For more troubleshooting tips, see Troubleshooting.
Source data update error
Error message:
Push failed: kind: update not supported
Object Storage only supports inserting new data; it does not support updating it. If data is updated at the source, the transfer will fail with the above error.
Solution: Use sources supporting data insertion only or select a target other than Object Storage.