Resource relationships in Data Transfer
Yandex Data Transfer helps you transfer data between DBMSs, object storages, and message brokers. This way you can reduce the migration period and minimize downtime when switching to a new database.
Yandex Data Transfer is configurable via Yandex Cloud standard interfaces.
The service is suitable for creating a permanent replica of the database. The transfer of the database schema from the source to the target is automated.
Endpoint
An endpoint is a configuration used to connect to a service: a data source or a target. In addition to connection settings, an endpoint may contain information about which data will be involved in the transfer and how that data should be processed during the transfer.
The following can be the data source or target:
| Service | Source | Target |
|---|---|---|
| Apache Kafka® topic: your own or as part of Managed Service for Apache Kafka® | ✓ | ✓ |
| AWS CloudTrail message stream | ✓ | |
| Your own BigQuery database | ✓ | |
| ClickHouse® database: your own or as part of Managed Service for ClickHouse® | ✓ | ✓ |
| Your own Elasticsearch database | ✓ | ✓ |
| Greenplum® database: your own or as part of Managed Service for Greenplum® | ✓ | ✓ |
| MongoDB database: your own or as part of Managed Service for MongoDB | ✓ | ✓ |
| MySQL® database: your own or as part of Managed Service for MySQL® | ✓ | ✓ |
| Your own Oracle database | ✓ | |
| PostgreSQL database: your own or as part of Managed Service for PostgreSQL | ✓ | ✓ |
| OpenSearch database: your own or as part of Managed Service for OpenSearch | ✓ | ✓ |
| S3-compatible bucket | ✓ | |
| Yandex Data Streams data stream | ✓ | ✓ |
| YDB database: as part of Managed Service for YDB | ✓ | ✓ |
| Yandex Object Storage bucket | | ✓ |
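As an illustration of the concept above, an endpoint bundles connection settings with the scope of data to transfer. The sketch below models this in Python; all field names are hypothetical and do not reflect the actual Data Transfer API.

```python
from dataclasses import dataclass, field

# Hypothetical model of an endpoint: connection settings plus the data
# involved in the transfer. The names are illustrative only, not the
# real Data Transfer API.
@dataclass
class Endpoint:
    name: str
    role: str                  # "source" or "target"
    host: str
    port: int
    database: str
    include_tables: list = field(default_factory=list)  # data to transfer

pg_source = Endpoint(
    name="pg-source",
    role="source",
    host="pg.example.internal",
    port=5432,
    database="shop",
    include_tables=["public.orders", "public.customers"],
)
print(pg_source.role, pg_source.include_tables)
```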
Transfer
A transfer is the process of transmitting data between the source and target services. A transfer must reside in the same folder as the endpoints it uses.
If subnets are specified for endpoints, these subnets must be hosted in the same availability zone. Otherwise, activating the transfer with such endpoints will result in an error.
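The two constraints above (shared folder, matching availability zones for endpoint subnets) can be sketched as a pre-activation check. This is a minimal illustration with hypothetical field names, not the service's actual validation logic.

```python
# Illustrative pre-activation check for a transfer: both endpoints must
# be in the same folder, and if subnets are specified, those subnets
# must be in the same availability zone. Field names are hypothetical.
def can_activate(source: dict, target: dict) -> tuple[bool, str]:
    if source["folder_id"] != target["folder_id"]:
        return False, "endpoints must be in the same folder"
    src_zone = source.get("subnet_zone")
    tgt_zone = target.get("subnet_zone")
    if src_zone and tgt_zone and src_zone != tgt_zone:
        return False, "endpoint subnets must be in the same availability zone"
    return True, "ok"

ok, reason = can_activate(
    {"folder_id": "f1", "subnet_zone": "ru-central1-a"},
    {"folder_id": "f1", "subnet_zone": "ru-central1-b"},
)
print(ok, reason)  # False endpoint subnets must be in the same availability zone
```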
Worker
A worker is a utility process that starts a data transfer. A separate VM is allocated for each worker. You can specify which computing resources to allocate to this virtual machine:
- 2 vCPUs and 4 GB RAM. This is the default configuration.
- 4 vCPUs and 8 GB RAM.
- 8 vCPUs and 16 GB RAM.
During parallel copying or parallel replication (for the YDS, YDB, and Apache Kafka® sources), you select the number of workers to run simultaneously.
vCPU count and RAM size impact the cost of Data Transfer resources. To optimize usage and data transfer costs, we recommend using workers efficiently by reducing their number and increasing the load on each worker. You can also change the worker configuration in the transfer settings for billable source-target pairs at the GA stage.
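To make the resource trade-off above concrete, the snippet below totals the vCPUs and RAM allocated for a given number of parallel workers. The preset names are ours, not the service's; only the vCPU/RAM figures come from the list above.

```python
# Worker presets from the list above as (vCPU, GB RAM).
# The preset names are illustrative, not the service's own.
PRESETS = {"small": (2, 4), "medium": (4, 8), "large": (8, 16)}

def total_resources(preset: str, workers: int) -> tuple[int, int]:
    """Total vCPUs and RAM (GB) allocated for `workers` parallel workers."""
    vcpu, ram = PRESETS[preset]
    return vcpu * workers, ram * workers

# Fewer, larger workers allocate the same total resources as many small ones:
print(total_resources("large", 2))   # (16, 32)
print(total_resources("small", 8))   # (16, 32)
```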
Transfer types
The following types of transfers are available:
- Snapshot: Transfers a snapshot of the source to the target. Besides the one-time snapshot transfer, the Regular and Regular incremental copy types are available.
- Replication: Continuously receives changes from the source and applies them to the target. Initial data synchronization is not performed.
- Snapshot and increment: Transfers the current state of the source to the target and keeps it up-to-date.
For more information about the differences between transfer types, see Transfer types and lifecycles.
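The three transfer types above can be sketched as follows. The source and target here are simple stand-ins (a list of rows and a change feed), not real connectors.

```python
# Illustrative sketch of the three transfer types. `source_rows` stands
# in for the source's current state, `change_feed` for its change stream.
def snapshot(source_rows, target):
    target.extend(source_rows)        # one-time copy of the current state

def replicate(change_feed, target):
    for change in change_feed:        # no initial sync: changes only
        target.append(change)

def snapshot_and_increment(source_rows, change_feed, target):
    snapshot(source_rows, target)     # copy the current state...
    replicate(change_feed, target)    # ...then keep it up to date

target = []
snapshot_and_increment([1, 2], [3], target)
print(target)  # [1, 2, 3]
```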
Compatibility of sources and targets
Possible source and target combinations:
| Source ↓ \ Target → | PostgreSQL | MySQL® | MongoDB | ClickHouse® | Greenplum® | YDB | Object Storage | Apache Kafka® | Data Streams | Elasticsearch | OpenSearch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PostgreSQL | CR | CR | - | CR | CR | CR | C | CR | CR | C | C |
| MySQL® | CR | CR | - | CR | CR | CR | C | CR | CR | - | - |
| Oracle | CR | - | - | CR | CR | - | - | - | - | - | - |
| MongoDB | - | - | CR | - | - | - | C | - | - | - | - |
| ClickHouse® | - | - | - | C | - | - | - | - | - | - | - |
| Greenplum® | C | - | - | C | C | - | - | - | - | - | - |
| YDB | - | - | - | CR | - | - | C | CR | CR | - | - |
| Object Storage | CR | CR | - | CR | CR | CR | - | - | - | - | - |
| Metrica | - | - | - | R | - | - | - | - | - | - | - |
| Yandex Data Streams | R | R | R | R | R | R | R | R | R | R | R |
| Apache Kafka® | R | R | R | R | R | R | R | R | R | R | R |
| Airbyte® | C | C | C | C | C | C | - | C | C | - | - |
| Elasticsearch | C | - | - | C | C | C | C | C | C | C | C |
| OpenSearch | C | - | - | C | C | C | C | C | C | C | C |
C: Copy
R: Replicate
CR: Copy and replicate
The remaining transfers are at the Preview stage; you can activate them by submitting a request to technical support.
Airbyte® endpoints
You can use Airbyte® endpoints as data sources in Data Transfer.
Airbyte® is already built into Data Transfer, so you do not have to create a separate VM and deploy Airbyte®.
ClickHouse® is a registered trademark of ClickHouse, Inc.