© 2025 Direct Cursus Technology L.L.C.

Transferring data from an S3 source endpoint

Written by
Yandex Cloud
Updated at January 21, 2025

Yandex Data Transfer enables you to migrate data from S3 storage to Yandex Cloud managed databases and implement various data processing and transformation scenarios. To implement a transfer:

  1. Explore possible data transfer scenarios.
  2. Prepare the S3 database for the transfer.
  3. Set up a source endpoint in Yandex Data Transfer.
  4. Set up one of the supported data targets.
  5. Create a transfer and start it.
  6. In case of any issues, use ready-made solutions to resolve them.

Scenarios for transferring data from S3

You can migrate and deliver data from Amazon Simple Storage Service (S3) to managed databases, where it can be stored in the cloud, processed, and loaded into data marts for visualization.

For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.

Preparing the S3 database

If you are using a private bucket as a source, grant the read and list permissions to the account you will use for connection.

For more information, see the Airbyte® documentation.
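
As an illustration of the read and list permissions mentioned above, the following sketch builds an AWS bucket policy document in Python. It assumes the source is an Amazon S3 bucket and that `s3:ListBucket` and `s3:GetObject` are the actions that cover listing and reading. The bucket name, the account ARN, and the `build_read_list_policy` helper are hypothetical placeholders, not values from this guide.

```python
import json


def build_read_list_policy(bucket: str, principal_arn: str) -> dict:
    """Return a bucket policy that lets one principal list the bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowRead",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Hypothetical bucket and account used for the connection.
policy = build_read_list_policy(
    "example-bucket", "arn:aws:iam::123456789012:user/transfer-reader"
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be reviewed and adapted before it is attached to the bucket. Other S3-compatible providers may use a different permission model.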

Settings

When creating or updating an endpoint, configure access to S3-compatible storage.

Management console
  • Dataset: Specify the name of an auxiliary table that will be used for the connection.

  • Path Pattern: Enter the path pattern. If the bucket contains nothing but files, use the ** value.

  • Schema: Specify the JSON schema in {"<column>": "<data_type>"} format. Use the {} value for automatic schema detection based on files.

  • Format: Select the format matching your files: CSV, Parquet, Avro, or JSON Lines.

    • CSV: Specify the settings of CSV files:

      • Delimiter: Delimiter character.
      • Quote char: Character used to enclose values that contain reserved characters.
      • Escape char: Character used to escape special characters.
      • Encoding: Encoding.
      • Double quote: Enable this option to treat two consecutive quote characters inside a quoted value as a single quote character.
      • Newlines in values: Enable the option if your text data values might include newline characters.
      • Block size: Size of a data chunk used to read data from files, in bytes.
      • Additional reader options: CSV ConvertOptions to override, specified as a JSON string.
      • Advanced options: CSV ReadOptions to override, specified as a JSON string.
    • Parquet: Specify the settings of Parquet files:

      • Buffer size: Size of the buffer used to deserialize specific parts of columns.
      • Columns: Columns for reading data. Leave this field empty to read all the columns.
      • Batch size: Maximum number of records in a batch.
    • JSON Lines: Specify the settings for JSON Lines:

      • Allow newlines in values: Enable this option to allow newlines in JSON values. This may affect the transfer speed.
      • Unexpected field behavior: Specify how to handle JSON fields outside the explicit_schema (if the field values are set). For more information, see the PyArrow documentation.
      • Block size: Specify the size, in bytes, of the block read from each file and processed in memory at a time. If the value is too large, an out-of-memory error may occur during the transfer.
  • S3: Amazon Web Services: Specify the S3 provider's settings:

    • Bucket: Bucket name.
    • Access Key ID and Secret Access Key: ID and contents of the AWS key used to access a private bucket.
    • (Optional) Path prefix: Prefix of the folders and files to process. Use it to narrow the search if the bucket contains many folders and files you do not need to transfer.
    • (Optional) Endpoint: Endpoint of an S3-compatible service other than Amazon S3. Leave this field empty to use the Amazon service.
    • Use SSL: Enable to connect to custom servers over HTTPS. It is ignored when using the Amazon service.
    • Verify SSL certificate: Enable to verify the server's SSL certificate. Disable this setting if you use self-signed certificates. It is ignored when using the Amazon service.
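
The CSV settings listed above (Delimiter, Quote char, Escape char, Double quote, Newlines in values) can be easier to reason about with a small experiment. The sketch below uses Python's standard `csv` module only to illustrate what these options mean. The endpoint has its own reader, so treat this as an approximation of the semantics, not as the service's implementation. The `parse` helper and the sample strings are made up for the example.

```python
import csv
import io


def parse(text: str, **options) -> list:
    """Parse CSV text with the given dialect options and return a list of rows."""
    return list(csv.reader(io.StringIO(text, newline=""), **options))


# Delimiter and quote char: the quoted value keeps the embedded delimiter.
rows_delim = parse('1;"a;b"\n', delimiter=";", quotechar='"')

# Double quote enabled: two quotes inside a quoted value mean one literal quote.
rows_double = parse('1,"say ""hi"""\n', doublequote=True)

# Double quote disabled: the escape char marks a literal quote instead.
rows_escape = parse('1,"say \\"hi\\""\n', doublequote=False, escapechar="\\")

# Newlines in values: a quoted value may span several physical lines.
rows_newline = parse('1,"line one\nline two"\n')

for rows in (rows_delim, rows_double, rows_escape, rows_newline):
    print(rows)
```

If the source files contain quoted values with embedded quotes or line breaks, checking a sample this way can help pick matching values for the endpoint settings before starting a transfer.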

For more information about the settings, see the Airbyte® documentation.
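
Several of the settings above (Schema, Additional reader options, Advanced options) take a JSON string. The sketch below shows one way to produce such strings in Python so that quoting and booleans are serialized correctly. The column names, the data type names (`integer`, `string`, `number`), and the chosen options (`strings_can_be_null`, `null_values`, `column_names`) are illustrative assumptions based on Airbyte and PyArrow conventions. Check them against the documentation referenced above before use.

```python
import json


def to_setting(value: dict) -> str:
    """Serialize a Python dict into a JSON string for a text field."""
    return json.dumps(value)


# Explicit schema in {"<column>": "<data_type>"} format (hypothetical columns).
schema = to_setting({"id": "integer", "name": "string", "price": "number"})

# Empty object: ask for automatic schema detection based on files.
auto_schema = to_setting({})

# Example ConvertOptions overrides for "Additional reader options".
additional_reader_options = to_setting(
    {"strings_can_be_null": True, "null_values": ["NA", "NULL"]}
)

# Example ReadOptions overrides for "Advanced options".
advanced_options = to_setting({"column_names": ["id", "name", "price"]})

for name, value in [
    ("Schema", schema),
    ("Schema (auto)", auto_schema),
    ("Additional reader options", additional_reader_options),
    ("Advanced options", advanced_options),
]:
    print(f"{name}: {value}")
```

Generating the strings with `json.dumps` avoids common manual mistakes such as single quotes or Python-style `True` in place of JSON `true`.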

Airbyte® is a registered trademark of Airbyte, Inc. in the United States and/or other countries.

Configuring the data target

Configure one of the supported data targets:

  • MySQL®
  • MongoDB
  • ClickHouse®
  • Greenplum®
  • Yandex Managed Service for YDB
  • Apache Kafka®
  • Yandex Data Streams (YDS)
  • PostgreSQL

For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.

Make sure that the network hosting the target cluster is configured to allow connections from the internet. To enable internet access, set up routing.

After configuring the data source and target, create and start the transfer.
