Migrating a database from a third-party Apache Kafka® cluster to Yandex Managed Service for Apache Kafka®
There are two ways to migrate topics from an Apache Kafka® source cluster to a Managed Service for Apache Kafka® target cluster:
- Using the built-in Yandex Managed Service for Apache Kafka® MirrorMaker connector.
  This method is easy to configure and does not require creating an intermediate VM.
- Using the MirrorMaker 2.0 utility.
  This requires setting up the utility manually on an intermediate virtual machine. Use this method only if you cannot migrate data with the built-in MirrorMaker connector.
Both methods are also suitable for migrating a single-host Managed Service for Apache Kafka® cluster to a different availability zone.
Migrating data using Yandex Managed Service for Apache Kafka® Connector
Required paid resources
The support cost for this solution includes:
- Managed Service for Apache Kafka® cluster fee, which covers the use of computing resources allocated to hosts (including ZooKeeper hosts) and disk space (see Apache Kafka® pricing).
- Fee for public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).
Create a cluster and a connector
- Set up the target cluster:
  - Create an admin user named admin-cloud.
  - Enable the Auto create topics enable property.
  - Configure security groups if required for the target cluster connection.
- In the source cluster, create the admin-source user authorized to manage topics via the Admin API.
- Make sure the source cluster’s network settings allow cluster connections from the internet.
- For the target cluster, create a connector of the MirrorMaker type, configured as follows:
  - Topics: List of topics to migrate. You can also specify a regular expression for selecting topics. To migrate all topics, specify .*.
  - Under Source cluster, specify the parameters for connecting to the source cluster:
    - Alias: Source cluster prefix in the connector settings. The default value is source. Topics in the target cluster will be created with the specified prefix.
    - Bootstrap servers: Comma-separated list of the FQDNs of the source cluster broker hosts with their port numbers, for example:

        FQDN1:9091,FQDN2:9091,...,FQDN:9091

    - SASL username and SASL password: Username and password of the previously created admin-source user.
    - SASL mechanism: Authentication mechanism for username and password validation, SCRAM-SHA-512.
    - Security protocol: Select the connection protocol for the connector:
      - SASL_PLAINTEXT: For connecting to the source cluster without SSL.
      - SASL_SSL: For SSL connections to the source cluster.
  - Under Target cluster, select Use this cluster.
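The Bootstrap servers value is a plain comma-separated string, so it can be assembled from a list of broker FQDNs rather than typed by hand. The following is a minimal sketch only: the bootstrap_servers function name and the example.com host names are made up for illustration and are not part of any Kafka or Yandex Cloud tooling.

```shell
#!/bin/sh
# Build a "host:port,host:port" string from a port and a list of broker FQDNs.
# bootstrap_servers is a hypothetical helper name used only in this example.
bootstrap_servers() {
  local port="$1"
  shift
  local result="" host
  for host in "$@"; do
    # Add a comma separator only when the result already holds an entry.
    result="${result:+${result},}${host}:${port}"
  done
  printf '%s\n' "$result"
}
```

For example, `bootstrap_servers 9091 broker1.example.com broker2.example.com` prints `broker1.example.com:9091,broker2.example.com:9091`.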
To create the same resources with Terraform instead:
- If you do not have Terraform yet, install it.
- Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
- Configure and initialize a provider. There is no need to create a provider configuration file manually: you can download it.
- Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
- Download the kafka-mirrormaker-connector.tf configuration file to the same working directory.
  This file describes:
  - Network.
  - Subnet.
  - Default security group and inbound internet rules for the cluster.
  - Managed Service for Apache Kafka® target cluster with Auto create topics enable set to true.
  - admin-cloud admin user for the target cluster.
  - MirrorMaker connector for the target cluster.
- In the kafka-mirrormaker-connector.tf file, specify the following:
  - Source cluster username and passwords for the source and target cluster users.
  - FQDNs of the source cluster broker hosts.
  - Source and target cluster aliases.
  - Filter pattern for topics to migrate.
  - Apache Kafka® version.
- Validate your Terraform configuration files using this command:

    terraform validate

  Terraform will display any configuration errors detected in your files.
- Create the required infrastructure:
  - Run this command to view the planned changes:

      terraform plan

    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:

        terraform apply

    - Confirm updating the resources.
    - Wait for the operation to complete.
  All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
Check the target cluster topic for data
- Connect to the target cluster topic using kafkacat. Add the source prefix to the source cluster topic name: for example, the mytopic topic is migrated to the target cluster as source.mytopic.
- Make sure the console displays messages from the source cluster topic.
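The check above can be sketched as a kafkacat consumer call. This is an illustration under assumptions, not the only way to connect: the function names, the KAFKA_PASSWORD and CA_FILE environment variables, and the use of port 9091 with SASL_SSL are choices made for this example. Adjust them to match how your target cluster is exposed.

```shell
#!/bin/sh
# Map a source topic name to its name in the target cluster.
# Assumes the alias is used as a prefix with "." as the separator.
target_topic() {
  local alias="$1" topic="$2"
  printf '%s.%s\n' "$alias" "$topic"
}

# Read the migrated topic from the target cluster (hypothetical wrapper).
# KAFKA_PASSWORD and CA_FILE are assumed environment variables:
# the admin-cloud password and the path to the cluster CA certificate.
read_target_topic() {
  local broker="$1" topic="$2"
  kafkacat -C \
    -b "${broker}:9091" \
    -t "$(target_topic source "$topic")" \
    -X security.protocol=SASL_SSL \
    -X sasl.mechanisms=SCRAM-SHA-512 \
    -X sasl.username="admin-cloud" \
    -X sasl.password="${KAFKA_PASSWORD:?set KAFKA_PASSWORD}" \
    -X ssl.ca.location="${CA_FILE:?set CA_FILE}" \
    -Z -K:
}
```

Calling `read_target_topic <target_broker_FQDN> mytopic` would then consume from source.mytopic and print each message with its key.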
Migrating data using MirrorMaker
If you no longer need the resources you created, delete them.
Required paid resources
The support cost for this solution includes:
- Managed Service for Apache Kafka® cluster fee, which covers the use of computing resources allocated to hosts (including ZooKeeper hosts) and disk space (see Apache Kafka® pricing).
- Fee for public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).
- VM fee, which covers the use of computing resources, storage, and, optionally, public IP address (see Compute Cloud pricing).
Getting started
Set up your infrastructure
- Create a Managed Service for Apache Kafka® target cluster:
  - With the admin-cloud admin user.
  - With the Auto create topics enable property enabled.
- Create a new Linux VM for MirrorMaker in the same network as the target cluster. To be able to connect to the VM not only from within the Yandex Cloud network but also from a local machine, enable public access when creating it.
To set up the same infrastructure with Terraform instead:
- If you do not have Terraform yet, install it.
- Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
- Configure and initialize a provider. There is no need to create a provider configuration file manually: you can download it.
- Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
- Download the kafka-mirror-maker.tf configuration file to the same working directory.
  This file describes:
  - Network.
  - Subnet.
  - Default security group and inbound internet rules for your cluster and VM.
  - Managed Service for Apache Kafka® cluster with Auto create topics enable set to true.
  - admin-cloud Apache Kafka® admin user.
  - Virtual machine with public internet access.
- In the kafka-mirror-maker.tf file, specify the following:
  - Apache Kafka® version.
  - Apache Kafka® admin user password.
  - Public Ubuntu image ID (non-GPU), e.g., Ubuntu 20.04 LTS.
  - Username and path to the public key for VM access. By default, the pre-configured image ignores the specified username and automatically creates a user named ubuntu. Use it to connect to the VM.
- Validate your Terraform configuration files using this command:

    terraform validate

  Terraform will display any configuration errors detected in your files.
- Create the required infrastructure:
  - Run this command to view the planned changes:

      terraform plan

    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:

        terraform apply

    - Confirm updating the resources.
    - Wait for the operation to complete.
  All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
Configure additional settings
- In the source cluster, create the admin-source user authorized to manage topics via the Admin API.
- Connect to the VM over SSH.
- Install the JDK:

    sudo apt update && sudo apt install --yes default-jdk

- Download and unpack the Apache Kafka® archive with the same version as installed on the target cluster, e.g., Apache Kafka® 2.8:

    wget https://archive.apache.org/dist/kafka/2.8.0/kafka_2.12-2.8.0.tgz && \
    tar -xvf kafka_2.12-2.8.0.tgz

- Install kafkacat:

    sudo apt update && sudo apt install --yes kafkacat

  Make sure you can use it to connect to the source and target clusters over SSL.
- Configure a firewall and security groups if required for MirrorMaker connection to the target and source clusters.
Configure MirrorMaker
- Download an SSL certificate for connecting to the Managed Service for Apache Kafka® cluster.
- In the home directory, create a folder named mirror-maker to store Java Keystore certificates and MirrorMaker configuration files:

    mkdir --parents /home/<home_directory>/mirror-maker

- Choose a password of at least 6 characters for a certificate store, create the store, and add the SSL certificate for cluster connection:

    sudo keytool --noprompt -importcert -alias YandexCA \
      -file /usr/local/share/ca-certificates/Yandex/YandexInternalRootCA.crt \
      -keystore /home/<home_directory>/mirror-maker/keystore \
      -storepass <certificate_store_password>

- Create a MirrorMaker configuration file named mm2.properties in the mirror-maker folder:

    # Kafka clusters
    clusters=cloud, source
    source.bootstrap.servers=<source_cluster_broker_FQDN>:9092
    cloud.bootstrap.servers=<target_cluster_broker_1_FQDN>:9091, ..., <target_cluster_broker_N_FQDN>:9091

    # Source and target cluster settings
    source->cloud.enabled=true
    cloud->source.enabled=false
    source.cluster.alias=source
    cloud.cluster.alias=cloud

    # Internal topics settings
    source.config.storage.replication.factor=<R>
    source.status.storage.replication.factor=<R>
    source.offset.storage.replication.factor=<R>
    source.offsets.topic.replication.factor=<R>
    source.errors.deadletterqueue.topic.replication.factor=<R>
    source.offset-syncs.topic.replication.factor=<R>
    source.heartbeats.topic.replication.factor=<R>
    source.checkpoints.topic.replication.factor=<R>
    source.transaction.state.log.replication.factor=<R>
    cloud.config.storage.replication.factor=<R>
    cloud.status.storage.replication.factor=<R>
    cloud.offset.storage.replication.factor=<R>
    cloud.offsets.topic.replication.factor=<R>
    cloud.errors.deadletterqueue.topic.replication.factor=<R>
    cloud.offset-syncs.topic.replication.factor=<R>
    cloud.heartbeats.topic.replication.factor=<R>
    cloud.checkpoints.topic.replication.factor=<R>
    cloud.transaction.state.log.replication.factor=<R>

    # Topics
    topics=.*
    groups=.*
    topics.blacklist=.*[\-\.]internal, .*\.replica, __consumer_offsets
    groups.blacklist=console-consumer-.*, connect-.*, __.*
    replication.factor=<M>
    refresh.topics.enabled=true
    sync.topic.configs.enabled=true
    refresh.topics.interval.seconds=10

    # Tasks
    tasks.max=<T>

    # Source cluster authentication parameters. Comment out if no authentication required
    source.client.id=mm2_consumer_test
    source.group.id=mm2_consumer_group
    source.security.protocol=SASL_PLAINTEXT
    source.sasl.mechanism=SCRAM-SHA-512
    source.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin-source" password="<password>";

    # Target cluster authentication parameters
    cloud.client.id=mm2_producer_test
    cloud.group.id=mm2_producer_group
    cloud.ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
    cloud.ssl.truststore.location=/home/<home_directory>/mirror-maker/keystore
    cloud.ssl.truststore.password=<certificate_store_password>
    cloud.ssl.protocol=TLS
    cloud.security.protocol=SASL_SSL
    cloud.sasl.mechanism=SCRAM-SHA-512
    cloud.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin-cloud" password="<password>";

    # Enable heartbeats and checkpoints
    source->cloud.emit.heartbeats.enabled=true
    source->cloud.emit.checkpoints.enabled=true

  MirrorMaker configuration notes:
  - It performs one-way replication (source->cloud.enabled = true, cloud->source.enabled = false).
  - In the topics parameter, list the topics you want to migrate. You can also specify a regular expression for selecting topics. To migrate all topics, specify .*. This configuration replicates all topics.
  - Topics in the target cluster are created with the source cluster alias as a prefix: for example, mytopic is replicated as source.mytopic.
  - <R> stands for the replication factor for MirrorMaker service topics. Its value should not exceed the lesser of the broker counts in the source and target clusters.
  - <M> stands for the default replication factor defined for topics in the target cluster.
  - <T> stands for the maximum number of concurrent MirrorMaker tasks. To distribute replication load evenly, we recommend a value of at least 2. For more information, see this Apache Kafka® guide.
  You can get the Managed Service for Apache Kafka® broker FQDNs with the list of hosts in the cluster.
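Before starting MirrorMaker, it can help to confirm that every placeholder in mm2.properties has been filled in and that the chosen <R> respects the broker-count limit described in the notes. The sketch below is illustrative: both function names are hypothetical, and the placeholder pattern simply assumes placeholders are written as a word in angle brackets, as in the template above.

```shell
#!/bin/sh
# Print the largest allowed <R>: the smaller of the two broker counts.
# max_internal_replication_factor is a hypothetical helper name.
max_internal_replication_factor() {
  local source_brokers="$1" target_brokers="$2"
  if [ "$source_brokers" -lt "$target_brokers" ]; then
    echo "$source_brokers"
  else
    echo "$target_brokers"
  fi
}

# Print the numbered lines of a properties file that still contain an
# unfilled <placeholder>. Like grep, it returns non-zero when none are found.
unfilled_placeholders() {
  grep -nE '<[A-Za-z_0-9]+>' "$1"
}
```

For a three-broker source cluster and a single-broker target cluster, `max_internal_replication_factor 3 1` prints `1`. Running `unfilled_placeholders /home/<home_directory>/mirror-maker/mm2.properties` with no output means no template placeholders remain.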
Start replication
Run MirrorMaker on the VM as follows:
<Apache_Kafka_installation_path>/bin/connect-mirror-maker.sh /home/<home_directory>/mirror-maker/mm2.properties
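Because the command above runs in the foreground, replication stops when the SSH session ends. One possible way around this, shown here as a sketch only, is a small wrapper that checks its inputs and launches the script with nohup. The start_mirrormaker name and the log file location are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical wrapper around connect-mirror-maker.sh with pre-flight checks.
# Arguments: the Apache Kafka installation path and the mm2.properties path.
start_mirrormaker() {
  local kafka_home="$1" config="$2"
  if [ ! -x "${kafka_home}/bin/connect-mirror-maker.sh" ]; then
    echo "connect-mirror-maker.sh not found in ${kafka_home}/bin" >&2
    return 1
  fi
  if [ ! -r "$config" ]; then
    echo "cannot read ${config}" >&2
    return 1
  fi
  # nohup keeps the process running after logout; output goes to a log file
  # placed next to the configuration file (mm2.properties -> mm2.log).
  nohup "${kafka_home}/bin/connect-mirror-maker.sh" "$config" \
    > "${config%.properties}.log" 2>&1 &
  echo "MirrorMaker started with PID $!"
}
```

With this in place, `start_mirrormaker <Apache_Kafka_installation_path> /home/<home_directory>/mirror-maker/mm2.properties` starts replication in the background and writes its output to mm2.log in the same folder.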
Check the target cluster topic for data
- Connect to the target cluster topic using kafkacat. Add the source prefix to the source cluster topic name: for example, the mytopic topic is migrated to the target cluster as source.mytopic.
- Make sure the console displays messages from the source cluster topic.
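To see which topics have been replicated before reading any of them, you can list the target cluster metadata with kafkacat and keep only the names that carry the source prefix. This is a sketch under assumptions: the function names and the KAFKA_PASSWORD variable are hypothetical, and the filter assumes that kafkacat -L prints each topic on a line of the form topic "name" with N partitions, which may vary between kafkacat versions.

```shell
#!/bin/sh
# Read `kafkacat -L` output on stdin and print the topic names that start
# with the given cluster alias followed by a dot.
replicated_topics() {
  local alias="$1"
  sed -nE "s/^[[:space:]]*topic \"(${alias}\.[^\"]+)\".*/\1/p"
}

# List replicated topics in the target cluster (hypothetical wrapper).
# KAFKA_PASSWORD is an assumed environment variable with the admin-cloud password.
list_target_topics() {
  local broker="$1"
  kafkacat -L -b "${broker}:9091" \
    -X security.protocol=SASL_SSL \
    -X sasl.mechanisms=SCRAM-SHA-512 \
    -X sasl.username="admin-cloud" \
    -X sasl.password="${KAFKA_PASSWORD:?set KAFKA_PASSWORD}" \
    -X ssl.ca.location=/usr/local/share/ca-certificates/Yandex/YandexInternalRootCA.crt \
    | replicated_topics source
}
```

Running `list_target_topics <target_broker_FQDN>` on the VM would print one replicated topic name per line, such as source.mytopic.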
To learn more about using MirrorMaker 2.0, see this Apache Kafka® article.
Delete the resources you created
Delete the resources you no longer need to avoid paying for them:
- Delete the Yandex Managed Service for Apache Kafka® cluster.
- Delete the VM.
- If you reserved public static IP addresses, release and delete them.
To delete the resources created with Terraform:
- In the terminal window, go to the directory containing the infrastructure plan.
  Warning
  Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
- Delete resources:
  - Run this command:

      terraform destroy

  - Confirm deleting the resources and wait for the operation to complete.
  All the resources described in the Terraform manifests will be deleted.