Specifics of working with endpoints
Yandex Data Transfer has some performance limitations and specifics depending on the endpoint types.
ClickHouse®
The Snapshot and Snapshot and increment type transfers (at the copying stage) from ClickHouse® to ClickHouse® do not support VIEW
. In source endpoints of the ClickHouse® type, VIEW
objects must be on the "List of excluded tables" if the "List of included tables" is empty or not specified. If the "List of included tables" is not empty, it must not contain VIEW
objects.
The source supports MATERIALIZED VIEW
objects but handles them as regular tables. This means that in ClickHouse® to ClickHouse® transfers, MATERIALIZED VIEW
items are transferred as tables, not as MATERIALIZED VIEW
objects.
If replication is enabled on a ClickHouse® target, the engines for recreating tables are selected depending on the source type:
- When transferring data from row-oriented database management systems, the ReplicatedReplacingMergeTree
and ReplacingMergeTree engines are used. - When transferring data from ClickHouse®, the ReplicatedMergeTree
engines are used.
Greenplum®
Transfers from Greenplum® to Greenplum® and from Greenplum® to PostgreSQL do not support moving a schema in the current Yandex Data Transfer version. If there are user-defined table data types in these transfers, create these data types in the target database manually before starting a transfer. To manually transfer a schema, use pg_dump
The source treats FOREIGN TABLE
and EXTERNAL TABLE
as regular views and handles them using the general algorithm for VIEW
objects.
Data stored in a MATERIALIZED VIEW
is not transferred. To transfer MATERIALIZED VIEW
data, create an ordinary VIEW
that refers to the MATERIALIZED VIEW
to be transferred.
MongoDB
By default, the service does not shard collections transferred to a sharded cluster. For more information, see Preparing for the transfer.
Transfers to MongoDB do not migrate indexes. When a transfer changes its status to Replicating, manually create an index for each sharded collection:
db.<collection_name>.createIndex(<index_properties>)
For the createIndex()
function description, see the MongoDB documentation
MySQL®
The data types in MySQL® target tables may have the unsigned
property:
unsigned smallint
: Values greater than 2^31 are too large to fit into the target.unsigned bigint
: Values greater than 2^63 are too large to fit into the target.
A primary key in MySQL® cannot be a string of unlimited length.
The MySQL® target ignores the source schema name and creates tables in the schema whose name is specified in the endpoint settings.
PostgreSQL
When transferring data of the TIMESTAMP WITHOUT TIME ZONE
type, the time zone set in the timezone
parameter of the PostgreSQL source database applies.
Example
The column with the TIMESTAMP WITHOUT TIME ZONE
data type contains the 1970-01-01 00:00:00
value. The way the value will be transferred depends on the timezone
parameter in the database:
- If the parameter is set to
Etc/UTC
, the time will be transferred as1970-01-01 00:00:00+00
. - If the parameter is set to
Europe/Moscow
, the time will be transferred as1970-01-01 00:00:00+03
.
Data stored in a MATERIALIZED VIEW
is not transferred. To transfer MATERIALIZED VIEW
data, create an ordinary VIEW
that refers to the MATERIALIZED VIEW
to be transferred.
If the definition of the VIEW
to be transferred contains an invocation of the VOLATILE
functionVIEW
with the READ UNCOMMITTED
isolation level. No consistency between the VIEW
data and the data of other objects being transferred is guaranteed. Reading data from a MATERIALIZED VIEW
in the VIEW
definition are equivalent to invoking the VOLATILE
function.
The source treats FOREIGN TABLE
as a regular view and handles them using the general algorithm for views.
If the source of a transfer from PostgreSQL to PostgreSQL has a non-empty "List of included tables" specified, user-defined data types in these tables are not transferred. If this is the case, transfer your custom data types manually.
When transferring partitioned tables
-
For tables partitioned with declarative partitioning:
- The user needs access to the master table and all its partitions on the source.
- The transfer is performed based on the
as is
principle: all partitions and the master table will be created on the target. - At the copying stage, partitions are transferred to the target independently of each other. To speed up their transfer, configure parallel copy.
- At the replication stage, data will automatically be placed into the required partitions.
- If new partitions are created on the source after the transfer has entered the replication stage, you need to transfer them to the target manually.
- The user can only transfer a part of the partitions to the target. To do this, the user must add these partitions to the List of included tables or close access to unnecessary partitions on the source.
-
For tables partitioned with the inheritance method:
- The user needs access to the parent table and all child tables.
- At the copying stage, data from the child tables is not duplicated in the parent table. To transfer data from the child tables, they must be explicitly specified in the list of tables to be transferred.
- At the copying stage, the child tables are transferred to the target independently of each other. To speed up their transfer, configure parallel copy.
- At the replication stage, data will automatically be placed into the required child tables or the parent table if inheritance is not used for partitioning.
- If the child tables are created on the source after the transfer has entered the replication stage, you need to transfer them to the target manually.
When migrating a database from PostgreSQL to another DBMS, the user can enable the Merge inherited tables option in the source endpoint. In this case:
- Only the parent table will be transferred to the target, and it will contain the data of those child tables which were explicitly specified in the list of tables to be transferred.
- The user can still speed up the transfer because child tables from the source are concurrently copied to the common table on the target. To speed up the transfer, enable parallel copy.
Data transfer rate
Average rate of transferring a table to a single stream for transfers from PostgreSQL to:
- PostgreSQL: 21 Mbps.
- Another DBMS: 3.5 Mbps.
For faster data transfers, use parallel copy.
Number of connections to a database
PostgreSQL has a limit on the number of user connections to the database. If this limit is exceeded for a transfer, the transfer will work incorrectly or fail.
You can calculate the number of connections a transfer will require using the following formulas:
-
For the PostgreSQL resource and the Snapshot transfer type:
<number_of_workers> * <number_of_streams> + 1
Where:
number_of_workers
andnumber_of_threads
: Parameters of the transfer, where the PostgreSQL source is specified.1
: Connection for the master transaction.
-
For the PostgreSQL resource and the Snapshot and increment transfer type:
<number_of_workers> * <number_of_streams> + 2
Where:
number_of_workers
andnumber_of_threads
: Parameters of the transfer, where the PostgreSQL source is specified.2
: Connections for the master transaction and slot monitor.
-
For the PostgreSQL target:
<number_of_workers> * <number_of_streams>
Where
number_of_workers
andnumber_of_threads
are the parameters of the transfer, where the PostgreSQL target is specified.
If the calculated amount exceeds the limit, do one of the following:
- Reduce the number of workers or threads in the transfer.
- Increase the maximum allowed number of connections for a user in PostgreSQL.
Yandex Data Streams
By default, a separate table is created for every partition when data is transferred from Data Streams to ClickHouse®. For all data to be entered in a single table, specify conversion rules in the advanced endpoint settings for the source.
Object Storage
Transfers from the Object Storage data source operate in APPEND-ONLY mode. Deleting a file from Object Storage will not affect the data in the target. When updating a file: if the target contains synthetic keys, then the file strings will be updated in the target; otherwise, they will be appended. By default, the primary key contains two columns: file name and string number in the file.
Oracle
The source ignores VIEW
and MATERIALIZED VIEW
objects in transfers of any type.
ClickHouse® is a registered trademark of ClickHouse, Inc