Parallel copy
Data Transfer can use multiple execution threads simultaneously for a single transfer. This significantly increases transfer throughput by letting the transfer use more resources. Parallel copy applies to all copy operations in Snapshot transfers, and to Snapshot and increment transfers while they are in the Copying status.
Scaling capabilities depend on the type of source database:
- The PostgreSQL, MongoDB, and Greenplum® sources support table partitioning and parallel copy of data from a single table. For PostgreSQL, the primary key must be of the serial type.
- The OpenSearch and Elasticsearch sources support parallel copy of data from a single index.
- The ClickHouse® source supports parallel partition-based copy. This requires a table with multiple partitions; a single-partition table is copied in a single thread. Parallel copy is only available for ClickHouse®-to-ClickHouse® transfers.
- The Yandex Object Storage source supports parallel copy of data from a single folder.
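The partition-based rule for ClickHouse® can be illustrated with a minimal Python sketch. It is not the service's implementation: `copy_partition` is a hypothetical placeholder for the real copy work, and the partition names are invented. It only shows why a single-partition table ends up in one thread regardless of how many threads are configured.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_partition(partition: str) -> str:
    # Hypothetical placeholder: a real transfer would read the partition
    # from the source and write it to the target here.
    return f"copied {partition}"


def copy_table(partitions: list[str], max_threads: int) -> tuple[int, list[str]]:
    # A partition is the smallest unit of work in this sketch, so the
    # effective thread count cannot exceed the number of partitions.
    effective_threads = max(1, min(max_threads, len(partitions)))
    with ThreadPoolExecutor(max_workers=effective_threads) as pool:
        # pool.map preserves input order in its results.
        results = list(pool.map(copy_partition, partitions))
    return effective_threads, results


if __name__ == "__main__":
    # Three partitions, four threads requested: three threads are used.
    print(copy_table(["2024-01", "2024-02", "2024-03"], max_threads=4))
    # One partition: copied in a single thread, whatever was requested.
    print(copy_table(["2024-01"], max_threads=4))
```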
To enable parallel copy, specify its settings. We recommend selecting parallel copy settings individually for each transfer.
Greenplum® parallel copy specifics
The service connects to Greenplum® cluster segments directly and transfers data from the selected table concurrently from all segments. Data consistency in each segment is ensured through snapshot isolation.
Settings
Snapshot settings → Parallel snapshot settings:
- Number of workers: Number of workers to run concurrently to copy data. Each worker runs on a stand-alone VM with dedicated CPU and RAM resources and a dedicated network connection.
- Number of threads: Number of threads per worker. Each thread runs in a separate container on a worker's VM and copies a single table or a part of one (depending on the source type).
The extent of transfer parallelism is determined by the number of workers multiplied by the number of threads within a worker.
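As a worked example of this product, the sketch below computes the total parallelism and spreads table parts across the resulting worker and thread slots. The round-robin assignment and the function names are assumptions made for illustration; the service's actual scheduling is not described here.

```python
def total_parallelism(workers: int, threads_per_worker: int) -> int:
    # Parallelism = number of workers x number of threads per worker.
    if workers < 1 or threads_per_worker < 1:
        raise ValueError("workers and threads_per_worker must be at least 1")
    return workers * threads_per_worker


def assign_parts(parts: list, workers: int, threads_per_worker: int) -> dict:
    # Assumed round-robin distribution of table parts over
    # (worker index, thread index) slots; illustrative only.
    slots = total_parallelism(workers, threads_per_worker)
    plan: dict = {}
    for i, part in enumerate(parts):
        slot = i % slots
        key = (slot // threads_per_worker, slot % threads_per_worker)
        plan.setdefault(key, []).append(part)
    return plan


if __name__ == "__main__":
    # 4 workers with 2 threads each give 8 concurrent copy threads.
    print(total_parallelism(4, 2))
    # 10 parts over 2 workers x 2 threads: 4 slots share the parts.
    print(assign_parts(list(range(10)), workers=2, threads_per_worker=2))
```

With 2 workers and 2 threads each, at most 4 tables or table parts are copied at the same time; the remaining parts wait until a slot is free.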
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.