Transferring data to a Greenplum® target endpoint
Yandex Data Transfer enables you to migrate data to a Greenplum® database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Prepare the Greenplum® database for the transfer.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform required operations with the database and control the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data to Greenplum®
-
Migration: Moving data from one storage to another. This often means moving a database from an outdated local installation to a managed cloud database.
-
Data delivery: Delivering arbitrary data to the target storage. This includes retrieving data from a queue, deserializing it, and transforming it into the target storage format.
-
Uploading data to data marts: Transferring prepared data to storage for subsequent visualization.
For a detailed description of possible Yandex Data Transfer data transfer scenarios, see Tutorials.
Configuring the data source
Configure one of the supported data sources:
- PostgreSQL.
- MySQL.
- Greenplum®.
- Apache Kafka®.
- Airbyte®.
- Yandex Data Streams (YDS).
- Yandex Object Storage.
- Managed Service for YDB.
- Oracle.
For a complete list of supported sources and targets in Yandex Data Transfer, see Available Transfers.
Preparing the target database
-
Disable the following settings on the target:
- Integrity checks for foreign keys
- Triggers
- Other constraints
Warning
Do not reactivate these settings before the transfer is complete. This will ensure data integrity with respect to foreign keys.
-
Create a user:
CREATE ROLE <username> LOGIN ENCRYPTED PASSWORD '<password>';
-
Grant the user all privileges for the database, schemas, and tables to be transferred:
GRANT ALL PRIVILEGES ON DATABASE <database_name> TO <username>;
If the database is not empty, the user must be its owner:
ALTER DATABASE <database_name> OWNER TO <username>;
Once started, the transfer will connect to the target on behalf of this user.
-
Make sure the settings for the network hosting the cluster allow public connections from IP addresses used by Data Transfer.
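The GRANT statement in the steps above covers database-level privileges only. The sketch below shows one way to review the constraints and triggers that may need disabling and to extend the grants to a schema and its tables. It assumes Greenplum 6 or later, where `GRANT ... ON ALL TABLES IN SCHEMA` is available. The `<schema_name>` and `<table_name>` placeholders are illustrative and do not come from this guide.

```sql
-- List foreign key (f) and check (c) constraints on tables in one schema.
SELECT n.nspname AS schema_name,
       c.conrelid::regclass AS table_name,
       c.conname,
       c.contype
FROM pg_constraint c
JOIN pg_namespace n ON n.oid = c.connamespace
WHERE c.contype IN ('f', 'c')
  AND c.conrelid <> 0
  AND n.nspname = '<schema_name>';

-- List user-defined triggers and whether they are enabled.
SELECT tgrelid::regclass AS table_name, tgname, tgenabled
FROM pg_trigger
WHERE NOT tgisinternal;

-- Disable user-defined triggers on one table for the duration of the transfer.
ALTER TABLE <schema_name>.<table_name> DISABLE TRIGGER USER;

-- Extend the privileges from the database to a schema and its existing tables.
GRANT ALL PRIVILEGES ON SCHEMA <schema_name> TO <username>;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA <schema_name> TO <username>;
```

After the transfer completes, `ALTER TABLE ... ENABLE TRIGGER USER` turns the triggers back on.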
Configuring the Greenplum® target endpoint
When creating or editing an endpoint, you can define:
- Yandex Managed Service for Greenplum® cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
- Additional parameters.
Managed Service for Greenplum® cluster
Warning
To create or edit an endpoint of a managed database, you need to have the managed-greenplum.viewer role or the viewer primitive role assigned for the folder where this managed database cluster resides.
Connecting to the database with the cluster ID specified in Yandex Cloud.
-
Managed Service for Greenplum cluster: Specify the ID of the cluster to connect to.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user's password to the database.
-
Database: Specify the name of the database in the selected cluster.
-
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
Custom installation
Connecting to the database with explicitly specified network addresses and ports.
-
Coordinator host: Specify the IP address or FQDN of the primary master host to connect to.
-
Coordinator port: Specify the port for Data Transfer to use to connect to the primary master host.
-
Coordinator mirror host: Specify the IP address or FQDN of the standby master host to connect to (leave the field empty if your cluster only has one master host).
-
Coordinator mirror port: Specify the port for Data Transfer to use to connect to the standby master host (leave the field empty if there is only one master host in your cluster).
-
Greenplum cluster segments: Specify segment host connection information. If you omit it, segment host addresses will be retrieved automatically from a system table on the master host.
-
CA certificate: Upload the certificate file or add its contents as text if transmitted data must be encrypted, for example, to meet PCI DSS requirements.
-
Subnet ID: Select or create a subnet in the desired availability zone.
If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
-
Database: Specify the name of the database in the selected cluster.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user's password to the database.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
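Before filling in the coordinator and segment fields for a custom installation, you may want to check the addresses the cluster itself reports. The query below is a minimal sketch for manual verification. It reads the Greenplum system catalog table `gp_segment_configuration` on the master host. This guide does not name the table Data Transfer reads, so treat that choice as an assumption.

```sql
-- content = -1 marks the master (coordinator) entries; other values are segments.
-- role: 'p' = primary, 'm' = mirror (for content = -1, 'm' is the standby master).
-- status: 'u' = up, 'd' = down.
SELECT content, role, preferred_role, status, hostname, address, port
FROM gp_segment_configuration
ORDER BY content, role;
```

The rows with `content = -1` correspond to the Coordinator host and Coordinator mirror host fields. The remaining rows are the segment hosts.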
Additional settings
Cleanup policy: Select a way to clean up data in the target database before the transfer:
-
Don't cleanup: Select this option only for replication without data copying.
-
Drop: Completely delete the tables included in the transfer (default).
Use this option to always transfer the latest version of the table schema to the target database from the source whenever the transfer is activated.
-
Truncate: Delete only the data from the tables included in the transfer but keep the schema.
Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.
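In plain SQL terms, the Drop and Truncate policies correspond roughly to the statements below. This illustrates the difference between the two options and is not the exact set of commands Data Transfer runs. `<schema_name>` and `<table_name>` are placeholders.

```sql
-- Drop: the table is removed together with its definition,
-- so the schema is taken from the source again on the next activation.
DROP TABLE IF EXISTS <schema_name>.<table_name>;

-- Truncate: all rows are removed, but the table definition on the target
-- (columns, types, distribution policy) stays as it is.
TRUNCATE TABLE <schema_name>.<table_name>;
```

If you have adjusted the target tables manually, for example by changing the distribution key, Truncate keeps those adjustments, while Drop discards them.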
After configuring the data source and target, create and start the transfer.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.