Loading data from MySQL® to a ClickHouse® data mart
Data Transfer enables you to migrate your database from a MySQL® source cluster to ClickHouse®.
To transfer data:
- Prepare the source cluster.
- Set up and activate the transfer.
- Test your transfer.
- Query data in ClickHouse®.
If you no longer need the resources you created, delete them.
Required paid resources
- Managed Service for MySQL® cluster: use of computing resources allocated to hosts, size of storage and backups (see Managed Service for MySQL® pricing).
- Managed Service for ClickHouse® cluster: use of computing resources allocated to hosts, size of storage and backups (see Managed Service for ClickHouse® pricing).
- Public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).
- Each transfer: use of computing resources and number of transferred data rows (see Data Transfer pricing).
Getting started
Set up the infrastructure, either manually or using Terraform.
Manually:
- Create a Managed Service for MySQL® source cluster with your preferred configuration. Enable public access to the cluster during creation so you can connect to it from your local machine. Connections from within the Yandex Cloud network are enabled by default.
- Create a Managed Service for ClickHouse® target cluster with the following settings:
  - Number of ClickHouse® hosts: Minimum of 2 to enable replication within the cluster.
  - Database name: Must be identical to the database name in the source cluster.
  - Public access: Enable it during creation so you can connect to the cluster from your local machine. Connections from within the Yandex Cloud network are enabled by default.
- If using security groups, configure them to allow internet access to your clusters.
Using Terraform:
- If you do not have Terraform yet, install it.
- Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
- Configure and initialize a provider. There is no need to create a provider configuration file manually: you can download it.
- Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
- Download the data-transfer-mmy-mch.tf configuration file to your current working directory. This file describes:
  - Network.
  - Subnets.
  - Security group and the rule permitting access to the Managed Service for MySQL® cluster.
  - Managed Service for MySQL® source cluster.
  - Managed Service for ClickHouse® target cluster.
  - Source endpoint.
  - Target endpoint.
  - Transfer.
- In the data-transfer-mmy-mch.tf file, specify the following:
  - Source endpoint parameters inherited from the Managed Service for MySQL® source cluster:
    - source_mysql_version: MySQL® version.
    - source_db_name: MySQL® database name.
    - source_user and source_password: Database owner username and password.
  - Target endpoint parameters inherited from the Managed Service for ClickHouse® target cluster:
    - target_db_name: ClickHouse® database name.
    - target_user and target_password: Database owner username and password.
- Make sure the Terraform configuration files are correct using this command:
  terraform validate
  Terraform will show any errors found in your configuration files.
- Create the required infrastructure:
  - Run this command to view the planned changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
  All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
Prepare the source cluster
- If you created the infrastructure manually, you must now prepare your source cluster.
- Add test data to the database.
  - Create a table named x_tab:
    CREATE TABLE x_tab (
        id INT,
        name TEXT,
        PRIMARY KEY (id)
    );
  - Populate the table with data:
    INSERT INTO x_tab (id, name) VALUES
        (40, 'User1'),
        (41, 'User2'),
        (42, 'User3'),
        (43, 'User4'),
        (44, 'User5');
Set up and activate the transfer
Manually:
- Create a source endpoint with the following settings:
  - Database type: MySQL®
  - Endpoint parameters → Connection settings: Managed Service for MySQL cluster
    Select your source cluster from the list and specify its connection settings.
- Create a target endpoint with the following settings:
  - Database type: ClickHouse
  - Endpoint parameters → Connection settings: Managed cluster
    Select your target cluster from the list and specify its connection settings.
- Create a transfer of the Snapshot and replication type that will use the new endpoints.
- Activate the transfer.
Using Terraform:
- In the data-transfer-mmy-mch.tf file, set the transfer_enabled variable to 1.
- Make sure the Terraform configuration files are correct using this command:
  terraform validate
  Terraform will show any errors found in your configuration files.
- Create the required infrastructure:
  - Run this command to view the planned changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
  The transfer will activate automatically upon creation.
Test the transfer
- Wait for the transfer status to change to Replicating.
- Verify that the data has been transferred from the source Managed Service for MySQL® cluster to the Managed Service for ClickHouse® database:
  - Connect to the cluster via clickhouse-client.
  - Run this query:
    SELECT * FROM <ClickHouse®_database_name>.x_tab
    Result:
    ┌─id─┬─name──┬─__data_transfer_commit_time─┬─__data_transfer_delete_time─┐
    │ 40 │ User1 │         1661952756538347180 │                           0 │
    │ 41 │ User2 │         1661952756538347180 │                           0 │
    │ 42 │ User3 │         1661952756538347180 │                           0 │
    │ 43 │ User4 │         1661952756538347180 │                           0 │
    │ 44 │ User5 │         1661952756538347180 │                           0 │
    └────┴───────┴─────────────────────────────┴─────────────────────────────┘
    The table also contains the following timestamp columns: __data_transfer_commit_time and __data_transfer_delete_time.
- In the source MySQL® table x_tab, delete the row with id=41 and update the row with id=42:
  - Run the following queries:
    DELETE FROM x_tab WHERE id = 41;
    UPDATE x_tab SET name = 'Key3' WHERE id = 42;
  - Make sure the changes have been applied to the x_tab table on the ClickHouse® target:
    SELECT * FROM <ClickHouse®_database_name>.x_tab WHERE id IN (41, 42);
    Result:
    ┌─id─┬─name──┬─__data_transfer_commit_time─┬─__data_transfer_delete_time─┐
    │ 41 │ User2 │         1661952756538347180 │                           0 │
    │ 42 │ User3 │         1661952756538347180 │                           0 │
    └────┴───────┴─────────────────────────────┴─────────────────────────────┘
    ┌─id─┬─name─┬─__data_transfer_commit_time─┬─__data_transfer_delete_time─┐
    │ 41 │ ᴺᵁᴸᴸ │         1661953256000000000 │         1661953256000000000 │
    └────┴──────┴─────────────────────────────┴─────────────────────────────┘
    ┌─id─┬─name─┬─__data_transfer_commit_time─┬─__data_transfer_delete_time─┐
    │ 42 │ Key3 │         1661953280000000000 │                           0 │
    └────┴──────┴─────────────────────────────┴─────────────────────────────┘
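The three result blocks above are separate stored versions of the same two rows. A query along the following lines may make that history easier to read by ordering the versions and showing the commit time as a date. This is only a sketch: it assumes the service columns hold Unix time in nanoseconds (which the sample values suggest but this guide does not state) and that the values fit into Int64. The aliases committed_at and is_deleted are illustrative names, not columns created by Data Transfer.

```sql
-- List every stored version of rows 41 and 42, oldest version first.
-- Assumption: __data_transfer_commit_time stores Unix time in nanoseconds.
SELECT
    id,
    name,
    fromUnixTimestamp64Nano(toInt64(__data_transfer_commit_time)) AS committed_at,
    __data_transfer_delete_time != 0 AS is_deleted
FROM <ClickHouse®_database_name>.x_tab
WHERE id IN (41, 42)
ORDER BY id, __data_transfer_commit_time;
```

For each id, the last row in this ordering is the current state of the source row. If is_deleted is 1 in that last row, the row no longer exists in MySQL®.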
Query data in ClickHouse®
For table recovery, ClickHouse® targets with replication use the ReplicatedReplacingMergeTree engine. The transferred tables contain the following additional columns:
- __data_transfer_commit_time: Time in TIMESTAMP format when this row was last updated.
- __data_transfer_delete_time: Time in TIMESTAMP format when this row was deleted from the source table. A value of 0 indicates that the row is still active.
The __data_transfer_commit_time column is essential for the ReplicatedReplacingMergeTree engine. It tracks changes by inserting a new version of a row upon any update or deletion, timestamped with the operation's commit time. Consequently, a query by a primary key may return multiple row versions with different __data_transfer_commit_time values.
The source data can be added or deleted while the transfer is in the Replicating status. To ensure an SQL query by a primary key returns a single record, always filter on __data_transfer_delete_time when querying tables transferred to ClickHouse®. For example, to query the x_tab table, use the following syntax:
SELECT * FROM <ClickHouse®_database_name>.x_tab FINAL
WHERE __data_transfer_delete_time = 0;
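FINAL deduplicates row versions at query time, which can be costly on large tables. As an alternative sketch, the latest version of each row can be selected explicitly with the argMax aggregate function. This is a general ClickHouse® technique rather than something prescribed by this guide. It assumes id is the only key column of x_tab, and latest_name is an illustrative alias.

```sql
-- For each id, take the column values from the version with the greatest
-- commit time, then keep only rows whose latest version is not a deletion.
SELECT
    id,
    argMax(name, __data_transfer_commit_time) AS latest_name
FROM <ClickHouse®_database_name>.x_tab
GROUP BY id
HAVING argMax(__data_transfer_delete_time, __data_transfer_commit_time) = 0;
```

Every non-key column needs its own argMax call, so this form tends to suit narrow tables or queries that read only a few columns.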
To simplify the SELECT queries, create a view filtering rows by __data_transfer_delete_time. Use this view for all your queries. For example, for the x_tab table, create the view as follows:
CREATE VIEW x_tab_view AS SELECT * FROM <ClickHouse®_database_name>.x_tab FINAL
WHERE __data_transfer_delete_time = 0;
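Once the view exists, queries no longer need to repeat FINAL or the filter on the service column. A minimal usage sketch, assuming the view was created in the database the client is currently connected to:

```sql
-- The view already applies FINAL and hides deleted rows,
-- so a lookup by primary key returns at most one row.
SELECT id, name
FROM x_tab_view
WHERE id = 42;
```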
Delete the resources you created
Note
Before deleting the created resources, deactivate the transfer.
Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:
- In the terminal window, go to the directory containing the infrastructure plan.
  Warning
  Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
- Delete resources:
  - Run this command:
    terraform destroy
  - Confirm deleting the resources and wait for the operation to complete.
  All the resources described in the Terraform manifests will be deleted.
ClickHouse® is a registered trademark of ClickHouse, Inc.