© 2025 Direct Cursus Technology L.L.C.

Transferring data from a Greenplum® source endpoint

Written by
Yandex Cloud
Updated at November 1, 2025
  • Scenarios for transferring data from Greenplum®
  • Preparing the Greenplum® database
  • Configuring the Greenplum® source endpoint
    • Yandex MPP Analytics for PostgreSQL cluster
    • Custom installation
    • Table filter
    • Advanced settings
    • Specifics of working with the Greenplum® source
    • Snapshot consistency
  • Configuring the data target

Yandex Data Transfer enables you to migrate data from a Greenplum® database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:

  1. Explore possible data transfer scenarios.
  2. Prepare the Greenplum® database for the transfer.
  3. Set up a source endpoint in Yandex Data Transfer.
  4. Set up one of the supported data targets.
  5. Create a transfer and start it.
  6. Perform required operations with the database and control the transfer.
  7. In case of any issues, use ready-made solutions to resolve them.

Scenarios for transferring data from Greenplum®

  1. Migration: Moving data from one storage to another. This often means migrating a database from an obsolete on-premises system to a managed cloud one.

    • Migrating a Greenplum® cluster.
  2. Uploading data to data marts: Transferring prepared data to storage for subsequent visualization.

    • Loading data from Greenplum® to the ClickHouse® data mart.
    • Loading data from Greenplum® to the PostgreSQL data mart.

For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.

Preparing the Greenplum® database

Note

Data stored in a MATERIALIZED VIEW is not transferred. To transfer MATERIALIZED VIEW data, create an ordinary VIEW that refers to the MATERIALIZED VIEW to be transferred.
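For example, an ordinary view over the materialized view can be created like this (all names here are placeholders):

```sql
-- A plain view that reads from the materialized view;
-- the transfer can then move the view's data.
CREATE VIEW <schema_name>.<view_name> AS
SELECT * FROM <schema_name>.<materialized_view_name>;
```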

Yandex MPP Analytics for PostgreSQL
Greenplum®
  1. Create a user account the transfer will use to connect to the source. To do this, run the following command:

    CREATE ROLE <username> LOGIN ENCRYPTED PASSWORD '<password>';
    
  2. Configure the source cluster to enable the user you created to connect to all the cluster master hosts.

  3. If you are going to use parallel copy, configure the source cluster to enable the user you created to connect to all the cluster's segment hosts in utility mode. To do this, make sure that the "Access from Data Transfer" setting is enabled for the cluster.

  4. Grant the user you created the SELECT privilege for the tables to transfer and the USAGE privilege for the schemas these tables are in.

    Privileges must be granted on entire tables; access to individual table columns is not supported.

    Tables without the required privileges are unavailable to Data Transfer. These tables are processed as if they did not exist.

    This example issues privileges to all the tables in the selected schema:

    GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <username>;
    GRANT USAGE ON SCHEMA <schema_name> TO <username>;
    
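To verify the grants took effect, you can query the standard information_schema catalog (a sketch; replace <username> with the user you created):

```sql
-- Lists the tables the transfer user is allowed to SELECT from.
SELECT table_schema, table_name
FROM information_schema.table_privileges
WHERE grantee = '<username>'
  AND privilege_type = 'SELECT'
ORDER BY table_schema, table_name;
```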
  1. If you are not planning to use Cloud Interconnect or a VPN to connect to the external cluster, make the cluster accessible from the Internet from the IP addresses used by Data Transfer.

    For details on linking your network up with external resources, see this concept.

  2. Create a user account the transfer will use to connect to the source. To do this, run the following command:

    CREATE ROLE <username> LOGIN ENCRYPTED PASSWORD '<password>';
    
  3. Configure the source cluster to enable the user you created to connect to all the cluster master hosts.

  4. If you are going to use parallel copy, configure the source cluster to enable the user you created to connect to all the cluster's segment hosts in utility mode.

  5. Grant the user you created the SELECT privilege for the tables to transfer and the USAGE privilege for the schemas these tables are in.

    Privileges must be granted on entire tables; access to individual table columns is not supported.

    Tables without the required privileges are unavailable to Data Transfer. These tables are processed as if they did not exist.

    This example grants privileges to all the database tables:

    GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <username>;
    GRANT USAGE ON SCHEMA <schema_name> TO <username>;
    
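For a self-managed cluster, steps 3 and 4 usually come down to adding pg_hba.conf entries on the master (and, for parallel copy, segment) hosts. A minimal sketch, where the address range is a placeholder for the Data Transfer IP ranges or your VPN subnet:

```
# pg_hba.conf entry allowing the transfer user to reach the database.
# TYPE  DATABASE           USER          ADDRESS           METHOD
host    <database_name>    <username>    203.0.113.0/24    md5
```

After editing pg_hba.conf, reload the configuration (e.g., with gpstop -u) for the change to take effect.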

Data Transfer works with Greenplum® differently depending on the transfer configuration and the source cluster contents. Detailed information is available in the section on Greenplum® source endpoint settings.

Configuring the Greenplum® source endpoint

When creating or updating an endpoint, you can define:

  • Yandex MPP Analytics for PostgreSQL cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
  • Additional parameters.

Yandex MPP Analytics for PostgreSQL cluster

Warning

To create or edit an endpoint of a managed database, you will need the managed-greenplum.viewer role or the primitive viewer role for the folder the cluster of this managed database resides in.

Connecting to the database in a cluster hosted in Yandex Cloud.

Management console
  • Connection type: Select a database connection option:

    • Self-managed: Allows you to specify connection settings manually.

      Select Managed Service for Greenplum cluster as the installation type and configure these settings:

      • Managed Service for Greenplum cluster: Select the cluster to connect to.
      • Database: Specify the name of the database in the selected cluster.
      • User: Specify the username Data Transfer will use to connect to the database.
      • Password: Enter the user password for access to the database.
    • Connection Manager: Allows connecting to the cluster via Yandex Connection Manager:

      • Select the folder with the Yandex MPP Analytics for PostgreSQL cluster.

      • Select Managed DB cluster as the installation type and configure these settings:

        • Cluster for Managed DB: Select the cluster to connect to.
        • Connection: Select or create a connection in Connection Manager.
        • Database: Specify the name of the database in the selected cluster.

      Warning

      To use a connection from Connection Manager, the user must have the connection-manager.user role or higher for this connection.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

Custom installation

Connecting to the database at explicitly specified network addresses and ports.

Management console
  • Connection type: Select a database connection option:

    • Self-managed: Allows you to specify connection settings manually:

      • Coordinator host: Specify the IP or FQDN of the primary master host to connect to.

      • Coordinator port: Specify the port for Data Transfer to use to connect to the primary master host.

      • Coordinator mirror host: Specify the IP address or FQDN of the standby master host to connect to (leave the field empty if your cluster only has one master host).

      • Coordinator mirror port: Specify the port for Data Transfer to use to connect to the standby master host (leave the field empty if there is only one master host in your cluster).

      • Greenplum cluster segments: Specify segment host connection information. If omitted, segment host addresses will be retrieved automatically from the master host's housekeeping table.

      • CA certificate: Upload the certificate file or add its contents as text if the transmitted data has to be encrypted, e.g., to meet PCI DSS requirements.

        Warning

        If no certificate is added, the transfer may fail with an error.

      • Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the database.

        If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.

      • Database: Enter the database name.

      • User: Specify the username Data Transfer will use to connect to the database.

      • Password: Enter the user password for access to the database.

    • Connection Manager: Allows connecting to the database using Yandex Connection Manager:

      • Select the folder the Connection Manager connection was created in.

      • Select Custom installation as the installation type and configure these settings:

        • Connection: Select or create a connection in Connection Manager.

        • Database: Specify the database name in the custom installation.

        • Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the database.

          If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.

        Warning

        To use a connection from Connection Manager, the user must have the connection-manager.user role or higher for this connection.

  • Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

    This will allow you to apply the specified security group rules to VMs and DBs in the selected network without reconfiguring these VMs and DBs. For more information, see Networking in Yandex Data Transfer.

Table filter

Management console
  • Included tables: Only data from the tables listed here will be transferred.

    If a table is partitioned, you can use this field to specify both the entire table and individual partitions.

    Make sure the user performing the transfer has all the necessary privileges on every table you include in this list.

    If you add new tables while editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not be uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.

  • Excluded tables: Data from the listed tables is not transferred.

    If a table is partitioned, list all of its partitions to exclude the table from the transfer.

    The lists include the name of the schema that describes the DB contents, structure, and integrity constraints, as well as the table name. Both lists support expressions in the following format:

    • <schema_name>.<table_name>: Full table name.
    • <schema_name>.*: All tables in the specified schema.
    • <table_name>: Table in the default schema.

    Included and excluded table names must meet the ID naming rules in Greenplum®. Double quotes within a table name are not supported. Outer quotes are only used as delimiters and will be deleted when processing paths.

Advanced settings

Management console

    • Snapshot consistency: When enabled, Data Transfer takes additional steps on the source to ensure snapshot consistency.

    • Service object schema: Schema for placing auxiliary objects of the transfer.

      The schema name must meet the ID naming rules in Greenplum®. Double quotes are not supported in schema names.

    • Use direct reading from segments: Disables gpfdist for transfers from Greenplum® to Greenplum®.

Specifics of working with the Greenplum® source

Data Transfer supports Greenplum® version 6 only. Greenplum® versions 4 and 5 are not supported.

The service performs operations on a Greenplum® cluster in transactions at the READ COMMITTED isolation level.

Data Transfer supports parallel copy for Greenplum® sources.

While parallel copy is enabled, Data Transfer maintains an open transaction on the Greenplum® master host. If this transaction is interrupted, the transfer fails with an error.

With parallel copy disabled, a transfer will move data from these Greenplum® objects: TABLE, VIEW, FOREIGN TABLE, and EXTERNAL TABLE. Data from these objects will be treated as data from ordinary tables and processed by the target accordingly. With parallel copy enabled, a transfer will only move tables (TABLE objects). However, tables with the DISTRIBUTED REPLICATED allocation policy will not be transferred.

If a Greenplum®-to-Greenplum® transfer does not use direct reads from segments, the number of threads cannot exceed the minimum number of segments in the participating clusters.

You can check the number of segments in the management console or by running an SQL query:

SELECT COUNT(*) FROM gp_segment_configuration WHERE role='p' AND content >= 0;

The number of workers participating in the transfer is limited by the specified number of threads. Each worker transfers tables one at a time, and each table is transferred by exactly one worker.

Snapshot consistency

When you start a transfer with parallel copy off (default), the service interacts only with the Greenplum® cluster's master host when copying data. The tables being copied are accessed in ACCESS SHARE lock mode. Snapshot consistency is achieved through Greenplum® mechanisms.

When you start a transfer with parallel copy on, the service interacts with both the master host and the Greenplum® cluster's segment hosts in utility mode. The tables being copied are accessed in ACCESS SHARE or SHARE lock mode, depending on the snapshot consistency setting.

To guarantee snapshot consistency, a transfer with parallel copy on must ensure the data in the tables remains static. With ACCESS SHARE (the default), the service cannot guarantee that the data will remain static, so this must be ensured externally. With SHARE, Greenplum®'s own locking mechanisms guarantee that the data in the source tables remains static.
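The difference between the two lock modes can be illustrated with explicit locks (the table name here is a placeholder):

```sql
BEGIN;
-- ACCESS SHARE: the mode taken by plain reads; concurrent writes are
-- still allowed, so the data may change while it is being copied.
LOCK TABLE <schema_name>.<table_name> IN ACCESS SHARE MODE;
COMMIT;

BEGIN;
-- SHARE: blocks concurrent INSERT/UPDATE/DELETE on the table until COMMIT,
-- so the data is guaranteed to stay static during the copy.
LOCK TABLE <schema_name>.<table_name> IN SHARE MODE;
COMMIT;
```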

Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries.

Configuring the data target

Configure one of the supported data targets:

  • PostgreSQL
  • ClickHouse®
  • Greenplum®
  • YTsaurus

For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.

After configuring the data source and target, create and start the transfer.
