Yandex Cloud
Search
Contact UsGet started
  • Blog
  • Pricing
  • Documentation
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • ML & AI
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Customer Stories
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Education and Science
    • Yandex Cloud Partner program
  • Blog
  • Pricing
  • Documentation
© 2025 Direct Cursus Technology L.L.C.
Yandex Data Transfer
  • Available transfers
  • Getting started
    • All guides
    • Preparing for a transfer
      • Managing endpoints
      • Migrating endpoints to a different availability zone
        • Source
        • Target
    • Managing transfer process
    • Working with databases during transfer
    • Monitoring transfer status
  • Troubleshooting
  • Access management
  • Terraform reference
  • Monitoring metrics
  • Audit Trails events
  • Public materials

In this article:

  • Scenarios for transferring data from ClickHouse®
  • Preparing the source database
  • Configuring the ClickHouse® source endpoint
  • Managed Service for ClickHouse® cluster
  • Custom installation
  • Table filter
  • Configuring the data target
  • Troubleshooting data transfer issues
  • New tables cannot be added
  • Data is not transferred
  • Unsupported date range
  1. Step-by-step guides
  2. Configuring endpoints
  3. ClickHouse
  4. Source

Transferring data from a ClickHouse® source endpoint

Written by
Yandex Cloud
Updated at April 24, 2025
  • Scenarios for transferring data from ClickHouse®
  • Preparing the source database
  • Configuring the ClickHouse® source endpoint
    • Managed Service for ClickHouse® cluster
    • Custom installation
    • Table filter
  • Configuring the data target
  • Troubleshooting data transfer issues
    • New tables cannot be added
    • Data is not transferred
    • Unsupported date range

Yandex Data Transfer enables you to migrate data from a ClickHouse® database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:

  1. Explore possible data transfer scenarios.
  2. Prepare the ClickHouse® database for the transfer.
  3. Set up a source endpoint in Yandex Data Transfer.
  4. Set up one of the supported data targets.
  5. Create a transfer and start it.
  6. Perform the required operations with the database and see how the transfer is going.
  7. In case of any issues, use ready-made solutions to resolve them.

Scenarios for transferring data from ClickHouse®Scenarios for transferring data from ClickHouse®

Migration: Moving data from one storage to another. Migration often means migrating a database from obsolete local databases to managed cloud ones.

  • Migrating a ClickHouse® cluster.
  • Redistributing data across shards.

For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.

Preparing the source databasePreparing the source database

Note

Yandex Data Transfer cannot transfer a ClickHouse® database if its name contains a hyphen.

If transferring tables with engines other than ReplicatedMergeTree and Distributed in a ClickHouse® multi-host cluster, the transfer will fail with the following error: the following tables have not Distributed or Replicated engines and are not yet supported.

Managed Service for ClickHouse®
ClickHouse®
  1. Make sure the tables you are transferring use the MergeTree family engines. Only these tables and materialized views (MaterializedView) will be transferred.

    In case of a multi-host cluster, only tables and materialized views with the ReplicatedMergeTree or Distributed engines will be transferred. Make sure these tables and views are present on all the cluster hosts.

  2. Create a user with access to the source database. In the user settings, specify a value of at least 1000000 for the Max execution time parameter.

  1. Make sure the tables you are transferring use the MergeTree family engines. Only these tables and materialized views (MaterializedView) will be transferred.

    In case of a multi-host cluster, only tables and materialized views with the ReplicatedMergeTree or Distributed engines will be transferred. Make sure these tables and views are present on all the cluster hosts.

  2. If not planning to use Cloud Interconnect or VPN for connections to an external cluster, make such cluster accessible from the Internet from IP addresses used by Data Transfer.

    For details on linking your network up with external resources, see this concept.

  3. Configure access to the source cluster from Yandex Cloud.

  4. Create a user with access to the source database. In the user settings, specify a value of at least 1000000 for the Max execution time parameter.

Configuring the ClickHouse® source endpointConfiguring the ClickHouse® source endpoint

When creating or updating an endpoint, you can define:

  • Yandex Managed Service for ClickHouse® cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
  • Additional parameters.

Managed Service for ClickHouse® clusterManaged Service for ClickHouse® cluster

Warning

To create or edit an endpoint of a managed database, you need to have the managed-clickhouse.viewer role or the viewer primitive role assigned for the folder where this managed database cluster resides.

Connecting to the database with the cluster ID specified in Yandex Cloud.

Management console
CLI
Terraform
API
  • Managed cluster: From the list, select the name of the cluster you want to connect to.

  • clickhouseClusterName: Specify the shard group to transfer the data from. If this value is not set, data from all shards will be transferred.

  • User: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the user's password to the database.

  • Database: Specify the name of the database in the selected cluster.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic. This will allow you to apply the specified security group rules to the VMs and clusters in the selected network without changing their settings. For more information, see Networking in Yandex Data Transfer.

    Make sure the selected security groups are configured.

  • Endpoint type: clickhouse-source.
  • --cluster-id: ID of the cluster you need to connect to.

  • --cluster-name: Shard group to transfer the data from. If this parameter is not set, data from all shards will be transferred.

  • --database: Database name.

  • --user: Username that Data Transfer will use to connect to the database.

  • --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

    Make sure the specified security groups are configured.

  • To set a user password to access the DB, use one of the following parameters:

    • --raw-password: Password as text.

    • --password-file: The path to the password file.

  • Endpoint type: clickhouse_source.
  • connection.connection_options.mdb_cluster_id: ID of cluster to connect to.

  • clickhouse_cluster_name: Shard group to transfer the data from. If this parameter is not set, data from all shards will be transferred.

  • subnet_id: ID of the subnet the cluster is in. The transfer will use this subnet to access the cluster. If the ID is not specified, the cluster must be accessible from the internet.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • security_groups: Security groups for network traffic.

    Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the cluster. For more information, see Networking in Yandex Data Transfer.

    Security groups and the subnet_id subnet, if the latter is specified, must belong to the same network as the cluster.

    Note

    In Terraform, it is not required to specify a network for security groups.

    Make sure the specified security groups are configured.

  • connection.connection_options.database: Database name.

  • connection.connection_options.user: Username that Data Transfer will use to connect to the database.

  • connection.connection_options.password.raw: Password in text form.

Here is an example of the configuration file structure:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    clickhouse_source {
      clickhouse_cluster_name="<shard_group>"
      security_groups = ["<list_of_security_group_IDs>"]
      subnet_id       = "<subnet_ID>"
      connection {
        connection_options {
          mdb_cluster_id = "<cluster_ID>"
          database       = "<name_of_database_to_migrate>"
          user           = "<username_for_connection>"
          password {
            raw = "<user_password>"
          }
        }
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

  • securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

    Make sure the specified security groups are configured.

  • mdbClusterId: ID of the cluster you need to connect to.

  • clickhouseClusterName: Shard group to transfer the data from. If this parameter is not set, data from all shards will be transferred.

  • database: Database name.

  • user: Username that Data Transfer will use to connect to the database.

  • password.raw: Database user password (in text form).

Custom installationCustom installation

Connecting to the database with explicitly specified network addresses and ports.

Management console
CLI
Terraform
API
  • Shards

    • Shard: Specify a row that will allow the service to distinguish shards from each other. If sharding is disabled in your custom installation, specify any name.
    • Hosts: Specify FQDNs or IP addresses of the hosts in the shard.
  • HTTP port: Set the number of the port that Data Transfer will use for the connection.

    When connecting via the HTTP port:

    • For optional fields, default values are used (if any).
    • Recording complex types is supported (such as array and tuple).
  • Native port: Set the number of the native port that Data Transfer will use for the connection.

  • SSL: Enable if the cluster supports only encrypted connections.

  • CA certificate: If transmitted data has to be be encrypted, e.g., to meet the PCI DSS, upload the certificate file or add its contents as text.

    Warning

    If no certificate is added, the transfer may fail with an error.

  • Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the cluster.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • User: Specify the username that Data Transfer will use to connect to the database.

  • Password: Enter the user's password to the database.

  • Database: Specify the name of the database in the selected cluster.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

  • Endpoint type: clickhouse-source.
  • --cluster-name: Name of the cluster to transfer the data from.

  • --host: List of IP addresses or FQDNs of hosts to connect to, in {shard_name}:{host_IP_address_or_FQDN} format. If sharding is disabled in your custom installation, specify any shard name.

  • http-port: Port number Data Transfer will use for HTTP connections.

  • native-port: Port number Data Transfer will use for connections to the ClickHouse® native interface.

  • --ca-certificate: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements.

    Warning

    If no certificate is added, the transfer may fail with an error.

  • --subnet-id: ID of the subnet the host is in. The transfer will use that subnet to access the host.

  • --database: Database name.

  • --user: Username that Data Transfer will use to connect to the database.

  • --security-group: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • To set a user password to access the DB, use one of the following parameters:

    • --raw-password: Password as text.

    • --password-file: The path to the password file.

  • Endpoint type: clickhouse_source.
  • Shard settings:

    • connection.connection_options.on_premise.shards.name: Shard name that the service will use to distinguish shards from each other. If sharding is disabled in your custom installation, specify any name.
    • connection.connection_options.on_premise.shards.hosts: specify the FQDNs or IP addresses of the hosts in the shard.
  • connection.connection_options.on_premise.http_port: Port number that Data Transfer will use for HTTP connections.

  • connection.connection_options.on_premise.native_port: Port number that Data Transfer will use for connections to the ClickHouse® native interface.

  • connection.connection_options.on_premise.tls_mode.enabled.ca_certificate: CA certificate if the data to transfer must be encrypted, e.g., to comply with the PCI DSS requirements.

    Warning

    If no certificate is added, the transfer may fail with an error.

  • clickhouse_cluster_name: Name of the cluster to transfer the data from.
  • subnet_id: ID of the subnet the cluster is in. The transfer will use this subnet to access the cluster. If the ID is not specified, the cluster must be accessible from the internet.

    If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.

  • security_groups: Security groups for network traffic.

    Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.

    Security groups must belong to the same network as the subnet_id subnet, if the latter is specified.

    Note

    In Terraform, it is not required to specify a network for security groups.

  • connection.connection_options.database: Database name.

  • connection.connection_options.user: Username that Data Transfer will use to connect to the database.

  • connection.connection_options.password.raw: Password in text form.

Here is an example of the configuration file structure:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    clickhouse_source {
      clickhouse_cluster_name="<cluster_name>"
      security_groups = ["<list_of_security_group_IDs>"]
      subnet_id       = "<subnet_ID>"
      connection {
        connection_options {
          on_premise {
            http_port   = "<HTTP_connection_port>"
            native_port = "<port_for_native_interface_connection>"
            shards {
              name  = "<shard_name>"
              hosts = [ “list of IP addresses and FQDNs of shard hosts" ]
            }
            tls_mode {
              enabled {
                ca_certificate = "<certificate_in_PEM_format>"
              }
            }
          }
          database = "<name_of_database_to_migrate>"
          user     = "<username_for_connection>"
          password {
            raw = "<user_password>"
          }
        }
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

  • onPremise: Database connection parameters:
    • shards: Shard settings:

      • name: Shard name the service will use to distinguish shards one from another. If sharding is disabled in your custom installation, specify any name.
      • hosts: Specify FQDNs or IP addresses of the hosts in the shard.
    • httpPort: Port number Data Transfer will use for HTTP connections.

    • nativePort: Port number Data Transfer will use for connections to the ClickHouse® native interface.

    • tlsMode: Parameters for encrypting the data to transfer, if required, e.g., for compliance with the PCI DSS requirements.

      • disabled: Disabled.
      • enabled: Enabled.
        • caCertificate: CA certificate.

          Warning

          If no certificate is added, the transfer may fail with an error.

    • subnetId: ID of the subnet the host is in. The transfer will use that subnet to access the host.

  • clickhouseClusterName: Name of the cluster to transfer the data from.
  • securityGroups: Security groups for network traffic, whose rules will apply to VMs and clusters without changing their settings. For more information, see Networking in Yandex Data Transfer.

  • database: Database name.

  • user: Username that Data Transfer will use to connect to the database.

  • password.raw: Database user password (in text form).

Table filterTable filter

Management console
CLI
Terraform
API
  • Included tables: Data is only transferred from listed tables.

    When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.

  • Excluded tables: Data from the listed tables is not transferred.

Included and excluded table names must meet the ID naming rules in ClickHouse®. For more information, see the ClickHouse® documentation. Escaping double quotes is not required.

Leave the lists empty to transfer all the tables.

  • --include_table: List of included tables. Only data from the tables listed here will be transferred.

    When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.

  • --exclude_table: List of excluded tables. Data from the listed tables will not be transferred.

If no lists are specified, data from all tables will be transferred.

  • include_tables: List of included tables. Only data from the tables listed here will be transferred.

    When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.

  • exclude_tables: List of excluded tables. Data from the listed tables will not be transferred.

If no lists are specified, data from all tables will be transferred.

  • includeTables: List of included tables. Only data from the tables listed here will be transferred.

    When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.

  • excludeTables: List of excluded tables. Data from the listed tables will not be transferred.

If no lists are specified, data from all tables will be transferred.

Configuring the data targetConfiguring the data target

Configure the target endpoint:

  • ClickHouse®

For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.

After configuring the data source and target, create and start the transfer.

Troubleshooting data transfer issuesTroubleshooting data transfer issues

  • New tables cannot be added.
  • Data is not transferred.
  • Unsupported date range.

For more troubleshooting tips, see Troubleshooting.

New tables cannot be addedNew tables cannot be added

​No new tables are added to Snapshot and increment transfers.

Solution:

  1. Create tables in the target database manually. For the transfer to work, do the following when creating a table:

    1. Add the transfer service fields to it:

      __data_transfer_commit_time timestamp,
      __data_transfer_delete_time timestamp
      
    2. Use ReplacingMergeTree:

      ENGINE = ReplacingMergeTree
      
  2. Create a separate transfer of the Snapshot and increment type and add only new tables to the list of objects to transfer. Deactivating the original Snapshot and increment transfer is not required. Activate the new transfer, and once it switches to the Replicating status, deactivate it.

    To add other tables, put them into the list of objects to transfer in the created separate transfer (replacing other objects in that list), reactivate it, and, once it switches to the Replicating status, deactivate it.

    Note

    Since two transfers were simultaneously migrating data, you will have duplicate records in the new tables on the target. Run the SELECT * from TABLE <table_name> FINAL query to hide duplicate records or OPTIMIZE TABLE <table_name> to delete them.

Data is not transferredData is not transferred

An attempt to transfer data from a ClickHouse® source fails with this error:


Syntax error: failed at position 25 ('-'): <error_details>. Expected one of: token, Dot, UUID, alias, AS, identifier, FINAL, SAMPLE, INTO OUTFILE, FORMAT, SETTINGS, end of query

Solution:

Yandex Data Transfer cannot transfer a database if its name contains a hyphen. You need to rename your database, if you can.

Unsupported date rangeUnsupported date range

If the migrated data contains dates outside the supported ranges, ClickHouse® returns the following error:

TYPE_ERROR [target]: failed to run (abstract1 source): failed to push items from 0 to 1 in batch:
Push failed: failed to push 1 rows to ClickHouse shard 0:
ClickHouse Push failed: Unable to exec changeItem: clickhouse:
dateTime <field_name> must be between 1900-01-01 00:00:00 and 2262-04-11 23:47:16

Supported date ranges in ClickHouse®:

  • For the DateTime64 type fields: 1900-01-01 to 2299-12-31. For more information, see the ClickHouse® documentation.
  • For the DateTime type fields: 1970-01-01 to 2106-02-07. For more information, see the ClickHouse® documentation.

Solution: use one of the following options:

  • Convert all dates in the source DB to a range supported by ClickHouse®.
  • In the source endpoint parameters, exclude the table with incorrect dates from the transfer.
  • In the transfer parameters, specify the Convert values to string transformer. This will change the field type during the transfer.

ClickHouse® is a registered trademark of ClickHouse, Inc.

Was the article helpful?

Previous
Source
Next
Target
© 2025 Direct Cursus Technology L.L.C.