Transferring data to a Yandex Data Streams target endpoint
Yandex Data Transfer enables you to migrate data to a Yandex Data Streams queue and implement various data processing and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform required operations with the database and control the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data to Yandex Data Streams
- Data change capture means tracking changes to a database and delivering those changes to consumers. It is used for applications that are sensitive to real-time data changes.
- Data delivery is the process of delivering arbitrary data to target storage. It includes retrieving data from a queue, deserializing it, and transforming it to the target storage format.
- Migration means moving data from one storage to another. It often means moving a database from obsolete local databases to managed cloud ones.
For a detailed description of possible data transfer scenarios in Yandex Data Transfer, see Tutorials.
Configuring the data source
Configure one of the supported data sources:
- PostgreSQL.
- MySQL®.
- Apache Kafka®.
- Managed Service for YDB.
- Airbyte®.
- Yandex Data Streams.
- Yandex Object Storage.
- Elasticsearch.
- OpenSearch.
For a complete list of supported sources and targets in Yandex Data Transfer, see Available Transfers.
Configuring the Yandex Data Streams target endpoint
When creating or updating an endpoint, you can define:
- Stream connection settings in Yandex Data Streams and serialization settings. These are required parameters.
- Additional settings.
Basic settings
- Database: Select a Yandex Managed Service for YDB database registered in Yandex Data Streams as a target.
- Stream: Specify the name of the data stream associated with the database.
- Service account: Select or create a service account with the yds.editor role that Data Transfer will use to connect to the data target.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
This lets you apply the specified security group rules to the VMs and clusters in the selected network without changing their settings. For more information, see Networking in Yandex Data Transfer.
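The basic settings above can be collected in a small structure and checked before you fill in the endpoint form. This is a minimal sketch: the class and field names (`YdsTargetSettings`, `database`, `stream`, and so on) are hypothetical and are not part of any Data Transfer API. Only the `yds.editor` role name comes from the settings described above.

```python
from dataclasses import dataclass
from typing import List

REQUIRED_ROLE = "yds.editor"  # role named in the Service account setting


@dataclass
class YdsTargetSettings:
    """Hypothetical local checklist for the basic endpoint settings."""
    database: str                     # YDB database registered in Data Streams
    stream: str                       # name of the data stream in that database
    service_account_roles: List[str]  # roles granted to the chosen service account
    security_groups: List[str]        # security groups for network traffic

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means nothing is missing."""
        found = []
        if not self.database.strip():
            found.append("database is not set")
        if not self.stream.strip():
            found.append("stream is not set")
        if REQUIRED_ROLE not in self.service_account_roles:
            found.append(f"service account lacks the {REQUIRED_ROLE} role")
        return found
```

Calling `problems()` on a filled-in instance returns an empty list when the database, the stream, and the role are all present.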
Advanced settings
- Save transaction order: Do not split an event stream into independent queues by table.
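The effect of this setting can be illustrated with a toy router. This is only a sketch of the idea, not how Data Transfer is implemented: the `route_events` function, the event tuples, and the queue keys are hypothetical. With the option enabled, all events stay in one queue in their original order. With it disabled, events are split into independent per-table queues, so ordering is kept only within each table.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (sequence number, table name, operation)


def route_events(events: List[Event], save_transaction_order: bool) -> Dict[str, List[Event]]:
    """Distribute events across queues.

    save_transaction_order=True  -> a single queue, global order preserved.
    save_transaction_order=False -> one queue per table, order preserved per table only.
    """
    queues: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        _, table, _ = event
        key = "all" if save_transaction_order else table
        queues[key].append(event)
    return dict(queues)


events = [(1, "orders", "insert"), (2, "payments", "insert"), (3, "orders", "update")]
ordered = route_events(events, save_transaction_order=True)
split = route_events(events, save_transaction_order=False)
```

In the split case a consumer of the `payments` queue can no longer tell whether event 2 happened before or after event 3, which is why the option matters for consumers that depend on cross-table ordering.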
Serializing settings
- Serializing settings: Select the serialization type (Auto or Debezium).
- Debezium serializer settings: Specify the Debezium serialization parameters.
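For orientation, a Debezium-style change event wraps the row data in an envelope. The sketch below builds a simplified envelope with the commonly documented `before`, `after`, `source`, `op`, and `ts_ms` fields. It is an illustration only: the `make_change_event` helper is hypothetical, real messages carry more metadata, and the exact output depends on the Debezium serializer settings you choose.

```python
import json
from typing import Any, Dict, Optional

# Debezium operation codes: c = create, u = update, d = delete, r = snapshot read.
OPS = {"c", "u", "d", "r"}


def make_change_event(
    op: str,
    table: str,
    before: Optional[Dict[str, Any]],
    after: Optional[Dict[str, Any]],
    ts_ms: int,
) -> str:
    """Serialize one row change as a simplified Debezium-like JSON envelope."""
    if op not in OPS:
        raise ValueError(f"unknown operation code: {op!r}")
    payload = {
        "before": before,            # row state before the change (None for inserts)
        "after": after,              # row state after the change (None for deletes)
        "source": {"table": table},  # real events include much more source metadata
        "op": op,
        "ts_ms": ts_ms,              # processing time in milliseconds since the epoch
    }
    return json.dumps({"payload": payload}, sort_keys=True)
```

A consumer reading the stream would parse each message with `json.loads` and branch on `payload["op"]` to apply inserts, updates, and deletes.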
After configuring the data source and target, create and start the transfer.
Troubleshooting data transfer issues
See a full list of recommendations in the Troubleshooting section.
Transfer failure
A transfer of the Replication or Snapshot and increment type is interrupted with an error.
Error message:
/Ydb.PersQueue.V1.PersQueueService/AddReadRule failed: OVERLOADED
Transfers are aborted because the cloud quota has been reached.
Solution:
- In the Managed Service for YDB quotas for the cloud that hosts the required database, increase the Number of schema transactions per minute value, then reactivate the transfer.
Cloud Functions redirects
In rare cases, the following error may occur during transfers from Data Streams or Apache Kafka®:
redirect to SOME_URL is requested but no redirects are allowed.
Possible cause:
A Cloud Functions function is configured on the source, and it returns a redirect to another URL instead of data.
Solution:
Such redirects are not allowed for security reasons. Avoid using redirects to Cloud Functions during transfers.
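To check by hand whether a function's HTTP endpoint answers with a redirect, you can send it a request and inspect the status code. The sketch below is a hypothetical helper, not part of Data Transfer: `redirect_target` and `probe` are illustrative names, and the URL and request body are whatever your function expects. Python's `http.client` does not follow redirects, so a 3xx status with a `Location` header is visible directly.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def redirect_target(status: int, location: Optional[str]) -> Optional[str]:
    """Return the redirect URL if the response is a redirect, otherwise None."""
    if 300 <= status < 400 and location:
        return location
    return None


def probe(url: str, body: bytes = b"{}") -> Optional[str]:
    """POST a small body to url and report where it redirects, if anywhere."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("POST", path, body=body, headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        return redirect_target(response.status, response.getheader("Location"))
    finally:
        conn.close()
```

If `probe` returns a URL rather than `None`, the function is redirecting and needs to be changed to return the data itself.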