Transferring data to an Apache Kafka® target endpoint
Yandex Data Transfer enables you to migrate data to an Apache Kafka® queue and implement various data processing and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform required operations with the database and control the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data to Apache Kafka®
- Migration: Moving data from one repository to another. It often involves transferring a database to the cloud, from outdated local databases to managed cloud ones. Mirroring data across queues is a separate migration task.
- Data change capture: Tracking changes to a database and delivering those changes to consumers. It is used for applications that are sensitive to real-time data changes.
For a detailed description of possible Yandex Data Transfer data transfer scenarios, see Tutorials.
Configuring the data source
Configure one of the supported data sources:
For a complete list of supported sources and targets in Yandex Data Transfer, see Available Transfers.
Configuring the Apache Kafka® target endpoint
When creating or editing an endpoint, you can define:
- Connection settings for a Yandex Managed Service for Apache Kafka® cluster or a custom installation, including installations based on Yandex Compute Cloud VMs, as well as serialization settings. These are required parameters.
- Apache Kafka® topic settings.
Managed Service for Apache Kafka® cluster
Warning
To create or edit an endpoint of a managed database, you need the managed-kafka.viewer role or the viewer primitive role assigned for the folder where this managed database cluster resides.
Connecting to the database with the cluster ID specified in Yandex Cloud.
- Managed Service for Apache Kafka® cluster: Select the cluster to connect to.
- Authentication: Select the connection type (SASL or No authentication).
  If you select SASL:
  - Username: Specify the name of the account Data Transfer will use to connect to the topic.
  - Password: Enter the account password.
- Security groups: Select the cloud network to host the endpoint and the security groups for network traffic.
  This lets you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of those VMs and clusters. For more information, see Networking in Yandex Data Transfer.
Custom installation
Connecting to the database with explicitly specified network addresses.
- Broker URLs: Specify the IP addresses or FQDNs of the broker hosts.
  If the Apache Kafka® port number differs from the standard one, specify it after the host name with a colon:
  <broker_host_IP_address_or_FQDN>:<port_number>
- SSL: Use encryption to protect the connection.
- PEM Certificate: If transmitted data must be encrypted, for example, to meet PCI DSS requirements, upload the certificate file or add its contents as text.
- Endpoint network interface: Select or create a subnet in the desired availability zone.
  If this field is set for both endpoints, both subnets must be hosted in the same availability zone.
- Authentication: Select the connection type (SASL or No authentication).
  If you select SASL:
  - Username: Specify the name of the account Data Transfer will use to connect to the topic.
  - Password: Enter the account password.
  - Mechanism: Select the hashing mechanism (SHA 256 or SHA 512).
- Security groups: Select the cloud network to host the endpoint and the security groups for network traffic.
  This lets you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of those VMs and clusters. For more information, see Networking in Yandex Data Transfer.
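To sanity-check these connection parameters outside Data Transfer, they can be expressed as a client configuration for the widely used kafka-python package. This is a minimal sketch: all broker addresses, file paths, and credentials below are made-up placeholders, not values from your installation.

```python
# Hypothetical connection settings mirroring the endpoint fields above.
config = {
    # Broker URLs; a non-standard port goes after the host name with a colon.
    "bootstrap_servers": ["kafka-broker1.example.com:9093", "192.0.2.10:9093"],
    "security_protocol": "SASL_SSL",        # SSL enabled + SASL authentication
    "ssl_cafile": "/etc/kafka/ca.pem",      # PEM certificate (placeholder path)
    "sasl_mechanism": "SCRAM-SHA-512",      # Mechanism: SHA 512
    "sasl_plain_username": "data-transfer-user",  # Username (placeholder)
    "sasl_plain_password": "secret",              # Password (placeholder)
}

# With kafka-python installed, the same settings would be passed as:
# from kafka import KafkaProducer
# producer = KafkaProducer(**config)
```

With No authentication, you would drop the SASL keys and use security_protocol="SSL" (or "PLAINTEXT" without encryption).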
Apache Kafka® topic settings
- Topic:
  - Topic full name: Specify the name of the topic to send messages to. Select Save transactions order to avoid splitting the event stream into independent queues by table.
  - Topic prefix: Specify the topic prefix, similar to the Debezium database.server.name setting. Messages will be sent to a topic named <topic_prefix>.<schema>.<table_name>.

Yandex Data Transfer supports CDC for transfers from PostgreSQL, YDB, and MySQL databases to Apache Kafka® and Yandex Data Streams. Data is sent to the target in Debezium format. For more information about CDC mode, see Change data capture.
Note
In YDB, CDC mode is supported starting from version 22.5.
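The topic-prefix naming rule above can be sketched as a one-line function. The prefix, schema, and table names here are made-up examples, not values from a real transfer.

```python
def debezium_topic_name(prefix: str, schema: str, table: str) -> str:
    """Build the target topic name following the
    <topic_prefix>.<schema>.<table_name> pattern."""
    return f"{prefix}.{schema}.{table}"

# Hypothetical example: a PostgreSQL table public.orders with prefix pg_server
print(debezium_topic_name("pg_server", "public", "orders"))
# -> pg_server.public.orders
```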
Serialization settings
- Serialization settings: Select the serialization type (Auto or Debezium).
  - Debezium serializer settings: Specify the Debezium serialization parameters.
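For orientation, the Debezium format wraps each change event in an envelope with the row state before and after the change. The sketch below is a simplified illustration: real messages also carry a schema section and richer source metadata, and the table and column names here are invented.

```python
# Simplified Debezium-style change event for a row insert (abbreviated).
message = {
    "payload": {
        "before": None,                      # row state before the change; None for inserts
        "after": {"id": 1, "name": "item"},  # row state after the change
        "source": {"table": "orders"},       # origin metadata (abbreviated)
        "op": "c",                           # operation: c=create, u=update, d=delete
        "ts_ms": 1700000000000,              # event timestamp in milliseconds
    }
}
```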
Additional settings
You can specify topic configuration parameters. Specify the parameter and one of its possible values, e.g., cleanup.policy and compact.
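These name/value pairs correspond to standard Apache Kafka® topic-level configs. A small sketch of how such pairs might look (the values are illustrative choices, not recommendations):

```python
# Example topic configuration parameters, as name/value pairs like those
# entered in the endpoint's additional settings.
topic_configs = {
    "cleanup.policy": "compact",   # compact the log instead of deleting by age
    "retention.ms": "604800000",   # keep records for 7 days (delete policy)
    "compression.type": "lz4",     # broker-side compression codec
}

for name, value in topic_configs.items():
    print(f"{name}={value}")
```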
After configuring the data source and target, create and start the transfer.