Loading data from Greenplum® to a ClickHouse® data mart
You can migrate a database from Greenplum® to ClickHouse® using Yandex Data Transfer.
To transfer a database from Greenplum® to ClickHouse®:
If you no longer need the resources you created, delete them.
Getting started
For clarity, we will create all required resources in Yandex Cloud. Prepare the infrastructure:
-
Create a Yandex Managed Service for Greenplum® source cluster with any suitable configuration.
-
Create a Yandex Managed Service for ClickHouse® target cluster with any configuration with a database called
db1
. -
If you are using security groups in clusters, make sure they are set up correctly and allow connecting to the clusters:
-
If you do not have Terraform yet, install it.
-
Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
-
Configure and initialize a provider. There is no need to create a provider configuration file manually, you can download it
. -
Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
-
Download the greenplum-clickhouse.tf
configuration file to the same working directory.This file describes:
-
Security groups for connecting to clusters.
-
Managed Service for Greenplum® source cluster
-
Managed Service for ClickHouse® target cluster.
-
In the
greenplum-clickhouse.tf
configuration file, specify the Greenplum® and ClickHouse® administrator passwords. -
Make sure the Terraform configuration files are correct using this command:
terraform validate
If there are any errors in the configuration files, Terraform will point them out.
-
Create the required infrastructure:
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the update of resources.
-
Wait for the operation to complete.
-
All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console
. -
Set up the transfer
-
Create a source endpoint of the
Greenplum®
type, and specify the cluster connection settings in it. -
Create a target endpoint of the
ClickHouse
type. -
Create a transfer of the Snapshot type that will use the created endpoints.
Replication is not available for this endpoint pair, but you can set up regular copying when creating a transfer. To do this, in the Snapshot field under Transfer parameters, select Regular and specify the copy interval. This will activate a transfer automatically after the specified time interval.
Warning
Before configuring regular copying, make sure that the target endpoint parameters include either a
Drop
or aTruncate
cleanup policy. Otherwise, data on the target will be duplicated.
Activate the transfer
-
Connect to the Managed Service for Greenplum® cluster, create a table named
x_tab
, and populate it with data:CREATE TABLE x_tab ( id NUMERIC, name CHARACTER(5) ); CREATE INDEX ON x_tab (id); INSERT INTO x_tab (id, name) VALUES (40, 'User1'), (41, 'User2'), (42, 'User3'), (43, 'User4'), (44, 'User5');
-
Activate the transfer and wait for its status to change to Completed.
-
To verify that the data has transferred correctly, connect to the Managed Service for ClickHouse® target cluster and make sure that the columns of the
x_tab
table in thedb1
database match those of the source databasex_tab
table:SELECT id, name FROM db1.x_tab;
┌─id─┬─name──┐ │ 40 │ User1 │ │ 41 │ User2 │ │ 42 │ User3 │ │ 43 │ User4 │ │ 44 │ User5 │ └────┴───────┘
Check the copy function upon re-activation
-
Connect to the Managed Service for Greenplum® cluster, delete one row in the
x_tab
table and edit the other:DELETE FROM x_tab WHERE id = 41; UPDATE x_tab SET name = 'Key3' WHERE id = 42;
-
Reactivate the transfer and wait for its status to change to Completed.
-
Check the changes in the
x_tab
table on the ClickHouse® target:SELECT id, name FROM db1.x_tab;
┌─id─┬─name──┐ │ 42 │ Key3 │ │ 40 │ User1 │ │ 43 │ User4 │ │ 44 │ User5 │ └────┴───────┘
Delete the resources you created
Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:
-
Make sure the transfer has the Completed status and delete it.
-
Delete the clusters:
ManuallyTerraformIf you created your resources using Terraform:
-
In the terminal window, go to the directory containing the infrastructure plan.
-
Delete the
greenplum-clickhouse.tf
configuration file. -
Make sure the Terraform configuration files are correct using this command:
terraform validate
If there are any errors in the configuration files, Terraform will point them out.
-
Confirm updating the resources.
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the update of resources.
-
Wait for the operation to complete.
-
All the resources described in the
greenplum-clickhouse.tf
configuration file will be deleted. -
-
ClickHouse® is a registered trademark of ClickHouse, Inc