Migrating data to Managed Service for ClickHouse®
To migrate your database to Managed Service for ClickHouse®, you need to directly transfer your data, lock the old database for write access, and transfer the load on the database cluster to Yandex Cloud.
To transfer data to a Managed Service for ClickHouse® cluster, you can use Apache ZooKeeper
Transfer your data to an intermediate VM in Compute Cloud if:
- The Managed Service for ClickHouse® cluster is not accessible from the internet.
- The network equipment or connection to the ClickHouse® cluster in Yandex Cloud is not reliable enough.
- There is no environment to run
clickhouse-copier
.
Migration stages:
- Prepare for migration.
- Install ZooKeeper.
- Create a cluster.
- Create a task for
clickhouse-copier
. - Add a task for
clickhouse-copier
to ZooKeeper. - Launch
clickhouse-copier
.
If you no longer need the resources you created, delete them.
Prepare for migration
-
Compatible software versions
- ClickHouse® versions must be the same in both clusters.
- The
clickhouse-copier
version must be the same as the ClickHouse® version in the cluster
Managed Service for ClickHouse® or higher. - The ZooKeeper version must be 3.5 or higher.
-
Check that the source cluster is ready to migrate:
- SSL is enabled for encrypting traffic.
- The load on the database or shard the data is copied from does not cause
any issues. Clickhouse-copier
has access to the database, and the account being used has
read-only access.
-
If you are using a virtual machine in Yandex Cloud for migration:
- Create your VMs in the same cloud network as the Managed Service for ClickHouse® cluster.
- Choose the computing capacity of the VM based on the amount of the data being transferred.
Install ZooKeeper
To migrate data, just start a single ZooKeeper node.
-
Install Java Runtime Environment:
sudo apt-get install default-jre
-
Add the user you want to run ZooKeeper under:
sudo adduser hadoop
-
Create a directory for ZooKeeper data:
sudo mkdir -p /var/data/zookeeper && \ sudo chown -R hadoop:hadoop /var/data
-
Install ZooKeeper (single-node setup):
-
Download the latest stable version of the distribution. To learn more, see the page with releases
.cd /opt && \ sudo wget https://downloads.apache.org/zookeeper/zookeeper-3.6.2/apache-zookeeper-3.6.2-bin.tar.gz && \ sudo mkdir zookeeper && \ sudo tar -C /opt/zookeeper -xvf apache-zookeeper-3.6.2-bin.tar.gz --strip-components 1 && \ sudo chown hadoop:hadoop -R zookeeper
-
Switch to the user you previously created to run ZooKeeper:
su hadoop
-
Create a file named
zoo.cfg
:nano /opt/zookeeper/conf/zoo.cfg
With the following content:
tickTime=2000 dataDir=/var/data/zookeeper clientPort=2181
-
The master node must have a unique ID. To configure it, create the
myid
file.nano /var/data/zookeeper/myid
Specify a unique ID as the content (for example, "1").
-
-
To run ZooKeeper for debugging:
bash /opt/zookeeper/bin/zkServer.sh start-foreground
-
To start ZooKeeper in normal mode:
bash /opt/zookeeper/bin/zkServer.sh start
Create a Managed Service for ClickHouse® cluster
Make sure that the computing capacity and storage size of the cluster are appropriate for the environment
where the existing databases are deployed and create a cluster.
Create a task for clickhouse-copier
To run clickhouse-copier
using ZooKeeper, you need to prepare:
- ZooKeeper configuration file (
config.xml
). - File describing the task (
cp-task.xml
).
You can find the clickhouse-copier
guide in the ClickHouse® documentation
Prepare a configuration file for ZooKeeper
The configuration file (config.xml
) has to specify:
- In the
<zookeeper>
element : Address of the host where you installed ZooKeeper. - In the
<caConfig>
element: Path to the certificate for connecting to Managed Service for ClickHouse®.
You can download a certificate at https://storage.yandexcloud.net/cloud-certs/RootCA.pem
Sample configuration:
<yandex>
<zookeeper>
<node>
<host>127.0.0.1</host>
<port>2181</port>
</node>
</zookeeper>
<logger>
<level>trace</level>
<log>log.log</log>
<errorlog>log.err.log</errorlog>
<size>never</size>
</logger>
<openSSL>
<client>
<loadDefaultCAFile>true</loadDefaultCAFile>
<caConfig>RootCA.crt</caConfig>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<invalidCertificateHandler>
<name>RejectCertificateHandler</name>
</invalidCertificateHandler>
</client>
</openSSL>
</yandex>
Describe the task
Example of data migration task description (cp-task.xml
):
<yandex>
<tcp_port_secure>9440</tcp_port_secure> <!-- optional -->
<remote_servers>
<source_cluster>
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>your-clickhouse-server.com</host>
<port>9440</port>
<user>your-user</user>
<password>password</password>
<secure>1</secure>
</replica>
</shard>
</source_cluster>
<destination_cluster>
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>your-cloud-clickhouse.mdb.cloud.yandex.net</host>
<port>9440</port>
<user>your-user</user>
<password>password</password>
<secure>1</secure>
</replica>
</shard>
</destination_cluster>
</remote_servers>
<max_workers>2</max_workers>
<settings_pull>
<readonly>1</readonly>
</settings_pull>
<settings_push>
<readonly>0</readonly>
</settings_push>
<settings>
<connect_timeout>3</connect_timeout>
<insert_distributed_sync>1</insert_distributed_sync>
</settings>
<tables>
<table_hits>
<cluster_pull>source_cluster</cluster_pull>
<database_pull>your-db</database_pull>
<table_pull>your-table</table_pull>
<cluster_push>destination_cluster</cluster_push>
<database_push>your-db</database_push>
<table_push>your-table</table_push>
<engine>
<!-- Description of the table engine, the same as in the CREATE TABLE procedure
in the source cluster. For an existing table, you can
get a description with the query SHOW CREATE table_name -->
ENGINE = MergeTree() PARTITION BY Year ORDER BY (Year, FlightDate) SETTINGS index_granularity=8192
</engine>
<sharding_key>jumpConsistentHash(intHash64(Year), 2)</sharding_key>
<enabled_partitions>
<partition>'2017'</partition>
</enabled_partitions>
</table_hits>
</tables>
</yandex>
Add a task for clickhouse-copier to ZooKeeper
To add a task to ZooKeeper, run the following commands:
/opt/zookeeper/bin/zkCli.sh -server localhost:2181 deleteall /cp-task.xml/description && \
/opt/zookeeper/bin/zkCli.sh -server localhost:2181 deleteall /cp-task.xml/task_active_workers && \
/opt/zookeeper/bin/zkCli.sh -server localhost:2181 deleteall /cp-task.xml && \
fc=$(cat ./cp-task.xml) && \
/opt/zookeeper/bin/zkCli.sh -server localhost:2181 create /cp-task.xml "" && \
/opt/zookeeper/bin/zkCli.sh -server localhost:2181 create /cp-task.xml/description "$fc"
Launch clickhouse-copier
Warning
If you didn't create the folders that you specified in the -- base-dir
flag or for saving logs, clickhouse-copier
might not start.
You can start the copier using the following command (to run in daemon mode
, add the --daemon
flag):
clickhouse-copier \
--config ./config.xml \
--task-path ./cp-task.xml \
--base-dir ./clickhouse \
--log-level debug
Once the operation is complete, check the logs to make sure that everything was
copied successfully.
Delete the resources you created
If you no longer need the cluster you created, delete it.
ClickHouse® is a registered trademark of ClickHouse, Inc