Transferring data to a PostgreSQL target endpoint
- Scenarios for transferring data to PostgreSQL
- Configuring the data source
- Preparing the target database
- Configuring the PostgreSQL target endpoint
- Operations with the database during transfer
- Troubleshooting data transfer issues
- Stopping a transfer's master transaction session
- Exceeding the connection time-to-live quota
- VIEW transfer error
- Error when adding a table entry by constraint
- Error when transferring all schema tables
- Low transfer speed
- Unable to transfer child tables
- Insufficient replication slots in a source database
- No data transfer after changing a source endpoint
- Transfer error when changing a master host
- Error when transferring nested transactions
- Error transferring tables with deferred constraints
- Could not create a replication slot at the activation step
- Excessive WAL size increase
- Error when replicating data from an external source
- Error when transferring tables without primary keys
- Error when dropping a table under the Drop cleanup policy
Yandex Data Transfer enables you to migrate data to a PostgreSQL database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Prepare the PostgreSQL database for the transfer.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform required operations with the database and control the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data to PostgreSQL
- Migration: moving data from one storage system to another. This often means migrating a database to the cloud, for example, from a legacy on-premises database to a managed cloud one.
- Data delivery: delivering arbitrary data to target storage. It includes retrieving data from a queue, deserializing it, and transforming it to the target storage format.
- Uploading data to data marts: transferring prepared data to storage for subsequent visualization.
For a detailed description of possible Yandex Data Transfer data transfer scenarios, see Tutorials.
Configuring the data source
Configure one of the supported data sources:
- PostgreSQL
- MySQL
- Greenplum®
- Apache Kafka®
- Airbyte®
- YDS
- Yandex Object Storage
- Managed Service for YDB
- Oracle
For a complete list of supported sources and targets in Yandex Data Transfer, see Available Transfers.
Preparing the target database
Managed Service for PostgreSQL cluster
- Make sure that the PostgreSQL major version on the target is not lower than that on the source.
- In the target database, enable the same extensions that are enabled in the source database.
- Make sure the target has the DROP transfer tables cleanup policy selected.
- Create a user with access to the target database.
- Grant the user all privileges for the database, schemas, and tables to be transferred:
  GRANT ALL PRIVILEGES ON DATABASE <database_name> TO <username>;
  If the database is not empty, the user must be its owner:
  ALTER DATABASE <database_name> OWNER TO <username>;
  Once started, the transfer will connect to the target on behalf of this user.
- If the target has the Save transaction boundaries option enabled, grant the created user all privileges to create the __data_transfer_lsn housekeeping table in the current schema (usually public) on the target:
  GRANT ALL PRIVILEGES ON SCHEMA <schema_name> TO <username>;
- Configure the number of user connections to the database.
Custom installation
- Make sure the settings for the network hosting the cluster allow public connections from IP addresses used by Data Transfer.
- Make sure that the PostgreSQL major version on the target is not lower than that on the source.
- In the target database, enable the same extensions that are enabled in the source database (a query for checking them is shown after this list).
- Make sure the target has the DROP transfer tables cleanup policy selected.
- Create a user:
  CREATE ROLE <username> LOGIN ENCRYPTED PASSWORD '<password>';
- Grant the user all privileges for the database, schemas, and tables to be transferred:
  GRANT ALL PRIVILEGES ON DATABASE <database_name> TO <username>;
  If the database is not empty, the user must be its owner:
  ALTER DATABASE <database_name> OWNER TO <username>;
  Once started, the transfer will connect to the target on behalf of this user.
- If the target has the Save transaction boundaries option enabled, grant the created user all privileges to create the __data_transfer_lsn housekeeping table in the current schema (usually public) on the target:
  GRANT ALL PRIVILEGES ON SCHEMA <schema_name> TO <username>;
- Configure the number of user connections to the database.
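To compare the extensions enabled on the source and target, run a quick check on each database using standard PostgreSQL catalog views (this is a helper sketch, not a Data Transfer command):
-- Run on both the source and the target and compare the results:
SHOW server_version;
SELECT extname, extversion FROM pg_extension ORDER BY extname;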
Data stored in a MATERIALIZED VIEW is not transferred. To transfer MATERIALIZED VIEW data, create an ordinary VIEW that refers to the MATERIALIZED VIEW to be transferred.
If the definition of the VIEW to be transferred contains an invocation of the VOLATILE function, the transfer reads data from such a VIEW with the READ UNCOMMITTED isolation level. No consistency between the VIEW data and the data of other objects being transferred is guaranteed. Reading data from a MATERIALIZED VIEW in the VIEW definition is equivalent to invoking the VOLATILE function.
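For example, a materialized view can be exposed to the transfer through an ordinary view that selects from it (the names below are placeholders):
-- Hypothetical example: wrap a materialized view in an ordinary view.
CREATE VIEW sales_daily_view AS
SELECT * FROM sales_daily_mv;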
Configuring the PostgreSQL target endpoint
When creating or editing an endpoint, you can define:
- Yandex Managed Service for PostgreSQL cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
- Additional parameters.
Managed Service for PostgreSQL cluster
Warning
To create or edit an endpoint of a managed database, you need to have the managed-postgresql.viewer role or the viewer primitive role assigned for the folder where this managed database cluster resides.
Connecting to the database with the cluster ID specified in Yandex Cloud.
- Managed Service for PostgreSQL cluster: Specify the ID of the cluster to connect to.
- Database: Specify the name of the database in the selected cluster.
- User: Specify the username that Data Transfer will use to connect to the database.
- Password: Enter the user's password to the database.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
  Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type: postgres-target.
- --cluster-id: ID of the cluster you need to connect to.
- --database: Database name.
- --user: Username that Data Transfer will use to connect to the database.
- To set a user password to access the DB, use one of the following parameters:
  - --raw-password: Password as text.
  - --password-file: The path to the password file.
- Endpoint type: postgres_target.
- connection.mdb_cluster_id: ID of the cluster to connect to.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
postgres_target {
security_groups = ["<list_of_security_group_IDs>"]
connection {
mdb_cluster_id = "<cluster_ID>"
}
database = "<migrated_database_name>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
}
}
}
For more information, see the Terraform provider documentation.
- mdbClusterId: ID of the cluster you need to connect to.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Database user password (in text form).
Custom installation
For OnPremise, all fields are filled in manually.
- Host: Specify the IP address or FQDN of the master host. If the hosts are on different ports, you can specify multiple host values in host:port format. If you choose this format, the value of the Port field will be disregarded.
- Port: Set the number of the port that Data Transfer will use for the connection.
- CA certificate: Upload the certificate file or add its contents as text if the transmitted data must be encrypted, for example, to meet PCI DSS requirements.
- Subnet ID: Select or create a subnet in the required availability zone.
  If this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
- Database: Specify the name of the database in the selected cluster.
- User: Specify the username that Data Transfer will use to connect to the database.
- Password: Enter the user's password to the database.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
  Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type: postgres-target.
- --host: IP address or FQDN of the master host you want to connect to.
- --port: Number of the port that Data Transfer will use for the connection.
- --ca-certificate: CA certificate, used if the transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS.
- --subnet-id: ID of the subnet the host resides in.
- --database: Database name.
- --user: Username that Data Transfer will use to connect to the database.
- To set a user password to access the DB, use one of the following parameters:
  - --raw-password: Password as text.
  - --password-file: The path to the password file.
- Endpoint type: postgres_target.
- security_groups: Security groups for network traffic.
  Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.
  Security groups must belong to the same network as the subnet_id subnet, if the latter is specified.
  Note: In Terraform, it is not required to specify a network for security groups.
- on_premise.hosts: List of IPs or FQDNs of hosts to connect to. Since only single-item lists are supported, specify the master host address.
- on_premise.port: Port number that Data Transfer will use for connections.
- on_premise.tls_mode.enabled.ca_certificate: CA certificate, used if the data to transfer must be encrypted to comply with PCI DSS requirements.
- on_premise.subnet_id: ID of the subnet where the host is located.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
postgres_target {
security_groups = ["<list_of_security_group_IDs>"]
connection {
on_premise {
hosts = ["<list_of_hosts>"]
port = <port_for_connection>
}
}
database = "<migrated_database_name>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
}
}
}
For more information, see the Terraform provider documentation.
- onPremise: Database connection parameters:
  - hosts: IP address or FQDN of the master host to connect to.
  - port: The number of the port that Data Transfer will use for the connection.
  - tlsMode: Parameters for encrypting the transmitted data if this is required, for example, to meet PCI DSS requirements.
  - subnetId: ID of the subnet the host resides in.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Database user password (in text form).
Additional settings
- Cleanup policy: Select a way to clean up data in the target database before the transfer:
  - Don't cleanup: Select this option if you are only going to do replication without copying data.
  - Drop: Completely delete the tables included in the transfer (used by default).
    Use this option so that the latest version of the table schema is always transferred to the target database from the source whenever the transfer is activated.
  - Truncate: Delete only the data from the tables included in the transfer but keep the schema.
    Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.
- Save transaction boundaries: Enable this so that the service writes data to the target database only after fully reading the transaction data from the source database.
  Warning: This feature is at the Preview stage.
- cleanup_policy: Way to clean up data in the target database before the transfer:
  - DISABLED: Do not clean up (default).
    Select this option if only replication without copying data is performed.
  - DROP: Completely delete the tables included in the transfer.
    Use this option so that the latest version of the table schema is always transferred to the target database from the source whenever the transfer is activated.
  - TRUNCATE: Delete only the data from the tables included in the transfer but keep the schema.
    Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.
- cleanupPolicy: Way to clean up data in the target database before the transfer:
  - DISABLED: Do not clean up (default).
    Select this option if only replication without copying data is performed.
  - DROP: Completely delete the tables included in the transfer.
    Use this option so that the latest version of the table schema is always transferred to the target database from the source whenever the transfer is activated.
  - TRUNCATE: Delete only the data from the tables included in the transfer but keep the schema.
    Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.
After configuring the data source and target, create and start the transfer.
Operations with the database during transfer
Tip
The PostgreSQL replication protocol does not support transferring schema changes. Avoid changing the data schema in the source and target databases during the transfer. If this cannot be avoided, perform explicit checks on the target.
For Snapshot and Snapshot and increment transfers:
- In the Copying status, changes to the data schema on the source and target are not allowed.
- In the Replicating status, any changes to the data schema on the source should be manually applied to the target; otherwise, the transfer will not be able to continue.
For example, suppose we added a new column to the test_table table on the source:
ALTER TABLE test_table ADD COLUMN val2 TEXT;
If writing to this table continues, the transfer will not be able to insert data on the target. For replication to continue, run a similar query to change the data schema on the target:
ALTER TABLE test_table ADD COLUMN val2 TEXT;
After that, the transfer can continue.
Troubleshooting data transfer issues
Known issues when using a PostgreSQL endpoint:
- Stopping a transfer's master transaction session.
- Exceeding the connection time-to-live quota.
- VIEW transfer error.
- Error when adding a table entry by constraint.
- Error when transferring all schema tables.
- Low transfer speed.
- Unable to transfer child tables.
- Insufficient replication slots in a source database.
- No data transfer after changing a source endpoint.
- Transfer error when changing a master host.
- Error when transferring nested transactions.
- Error transferring tables with deferred constraints.
- Could not create a replication slot at the activation step.
- Excessive WAL size increase.
- Error when replicating data from an external source.
- Error when transferring tables without primary keys.
- Error when dropping a table under the Drop cleanup policy.
See a full list of recommendations in the Troubleshooting section.
Stopping a transfer's master transaction session
Error message:
Cannot set transaction snapshot:
ERROR: invalid snapshot identifier: "<snapshot_ID>" (SQLSTATE 22023).
Possible causes include:
- The source is running a cron job or other application that periodically terminates sessions that are too long.
- Someone manually terminated the master transaction.
- The source's CPU resources are insufficient to execute a query.
Solution: Disable this cron job and add more CPU resources to the source. After you make the changes, re-activate the transfer.
Exceeding the connection time-to-live quota
Yandex Managed Service for PostgreSQL has a connection time-to-live quota of 12 hours.
Solution: If transferring a database requires more time, change the Session duration timeout setting of the Yandex Managed Service for PostgreSQL cluster.
VIEW transfer error
Error message:
[ERROR] "... _view": no key columns found
New data cannot be replicated from a VIEW. In PostgreSQL — PostgreSQL transfers, only those views are transferred that are listed in the Table filter → Included tables source endpoint parameter.
Solution: Manually exclude all views from the transfer, list them in the Table filter → Included tables source endpoint parameter, and then reactivate the transfer.
Error when adding a table entry by constraint
Solution: Prepare the source as described in Preparing for the transfer.
Error when transferring all schema tables
Error message:
Unable to apply DDL of type 'TABLE', name '<schema>'.'<table>', error:
ERROR: schema "<schema name>" does not exist (SQLSTATE 3F000)
A transfer fails if tables of a specific schema are listed as <schema>.*. This occurs due to the specifics of pg_dump, which is used for schema transfer: when tables of an entire schema are specified in <schema>.* format in the Table filter → Included tables source endpoint parameter, the PostgreSQL types from this schema aren't extracted, even if the schema contains tables that depend on these types.
Solution: Create PostgreSQL types in the target database manually.
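For example, if the excluded schema defines an enum type that the transferred tables depend on, you can recreate it on the target before activating the transfer (the schema, type name, and values below are placeholders):
-- Hypothetical example: manually recreate a user-defined type on the target.
CREATE TYPE <schema_name>.order_status AS ENUM ('new', 'processing', 'done');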
Low transfer speed
Snapshot or Snapshot and increment transfers from PostgreSQL to PostgreSQL may slow down.
Possible causes:
- Write protocol.
  Under normal operation, a transfer is performed via the fast copy protocol. However, if there is a batch write conflict, it switches to slow line-by-line writes. The more write conflicts, the lower the transfer speed.
  Solution: Set the target endpoint's cleanup policy type to Drop and exclude other writing processes.
- Parallel table reading.
  Parallel reading is only available for tables that contain a primary key in serial mode.
  Solution: Set up parallel copy and reactivate the transfer.
Unable to transfer child tables
Child tables are either not transferred or transferred without data if a table is partitioned.
Solution: set the following source endpoint parameters:
- Select Merge inherited tables in advanced settings.
- In the Included tables field, specify all child tables to transfer data from.
- Make sure the user has access to the child tables.
To speed up the transfer of child tables, set up parallel copy.
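To list the child tables (partitions) of a partitioned table on the source, a standard catalog query can help (the parent table name is a placeholder):
-- List child tables inheriting from, or partitioned under, the parent table:
SELECT inhrelid::regclass AS child_table
FROM pg_inherits
WHERE inhparent = '<parent_table_name>'::regclass;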
Insufficient replication slots in a source database
Error message:
Warn(Activate): failed to create a replication slot "<transfer_ID>" at source:
failed to create a replication slot:
failed to create a replication slot:
ERROR: all replication slots are in use
(SQLSTATE 53400)
Solution: Increase the number of replication slots in the source database (the default value is 10).
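On a self-managed source, you can check slot usage and raise the limit with the standard max_replication_slots setting (the value 20 below is only an illustration; on a managed cluster, change the corresponding cluster setting instead):
-- Check which replication slots exist and whether they are active:
SELECT slot_name, active FROM pg_replication_slots;
-- Raise the limit (takes effect after a server restart):
ALTER SYSTEM SET max_replication_slots = 20;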
No data transfer after changing a source endpoint
After adding tables to the List of included tables in the source endpoint parameters, a data transfer restarted and stopped.
Solution:
- Create the tables in the target database manually:
  1. Create the new tables with a Primary key and without a Foreign key in the target database.
  2. Add the new tables to the List of included tables in the source endpoint parameters.
  3. Transfer a dump with historical data to the target database.
  4. If there are errors in the logs, fix them depending on the specific error.
  5. If there are no errors but the logs are empty, contact technical support or your account manager to dump goroutines. This may help resume replication without restarting the transfer.
- Deactivate and reactivate the transfer.
- Create a separate transfer of the Snapshot type for the new tables. Deactivating the original transfer is not required.
Transfer error when changing a master host
The error occurs in Replication or Snapshot and increment transfers because the required Write-Ahead Log (WAL) segments are missing. This happens when the logical replication lag between the current master and a replica is greater than the allowable WAL size on other hosts, so, when switching from the master to the replica, the replication slot can't sync to the WAL on the new master.
Solution: Set a limit in the Maximum WAL size for the replication slot additional source endpoint parameter and reactivate the transfer.
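To monitor how much WAL the replication slot retains on the source while the transfer is running, a standard catalog query can be used (PostgreSQL 10 or higher function names; this is a helper sketch, not a Data Transfer command):
-- Per-slot estimate of WAL kept on the source for the slot to catch up:
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots;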
Error when transferring nested transactions
Transfers from PostgreSQL versions below 14 don't support transferring tables with committed transactions that are nested more than 1024 times with replication changes at each nesting level. The nesting depth depends on the number of nested begin; .. commit; statements.
Solution:
- Use PostgreSQL 14 or higher.
- Exclude transactions with this level of nesting from the transfer.
Error transferring tables with deferred constraints
The error arises in Replication or Snapshot and increment transfers because Data Transfer doesn't support updating tables and transactions with DEFERRABLE constraints. To learn more about deferred constraints, see the PostgreSQL documentation.
Solution: Change the constraint type in such tables to IMMEDIATE and activate the transfer again.
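For a foreign key constraint, the change might look like this (the table and constraint names are placeholders):
-- Make a deferrable constraint immediate so it no longer blocks the transfer:
ALTER TABLE <table_name> ALTER CONSTRAINT <constraint_name> NOT DEFERRABLE;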
Could not create a replication slot at the activation step
At the beginning of the transfer, one or more replication slots are created in the source database. If a database object required for this is locked by another query or transaction, the slot cannot be created.
Solution:
- Get the PID of the process that competes for locks with the transfer:
  /* Get the PID of the transfer */
  SELECT active_pid
  FROM pg_replication_slots
  WHERE slot_name = '<transfer_ID>';

  /* Find the PID of the locking process */
  SELECT pid, pg_blocking_pids(pid) AS blocked_by
  FROM pg_stat_activity
  WHERE cardinality(pg_blocking_pids(pid)) > 0;

  pid            | blocked_by
  ---------------+------------------------------
  <transfer_PID> | {<locking_transaction_PID>}
  (1 row)
- Look up the locking query:
  SELECT query, usename FROM pg_stat_activity WHERE pid = <locking_transaction_PID>;
- (Optional) Terminate the transaction with the following command:
  SELECT pg_terminate_backend(<locking_transaction_PID>);
Excessive WAL size increase
In the PostgreSQL source database, the size of the Write-Ahead Log (WAL) may reach dozens of gigabytes because of queries that run for over five minutes. These queries block the WAL and do not allow it to move forward and be reset.
You can see the WAL size has gone up when:
- The space used by the source database has increased.
- The Read buffer size chart in Data Transfer monitoring is rising.
Solution:
- Find the query sessions running for more than five minutes:
  SELECT now() - xact_start, query, pid
  FROM pg_stat_activity
  WHERE (now() - xact_start) > '5 minute'::interval
    AND state != 'idle'
  ORDER BY 1 DESC;
- Terminate the sessions found. Make sure to avoid such queries moving forward.
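The sessions found can be stopped with the same command as in the previous subsection, for example:
SELECT pg_terminate_backend(<session_PID>);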
Error when replicating data from an external source
Error message:
[XX000] ERROR: could not connect to the publisher:
SSL error: certificate verify failed FATAL:
no pg_hba.conf entry for replication connection
from host "<PostgreSQL_host_IP_address>", user "postgres", SSL off
Solution: Prepare the source as described in Preparing for the transfer.
Error when transferring tables without primary keys
Error message:
Primary key check failed: 14 Tables errors: Table no key columns found
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred.
Solution: Prepare the source as described in Preparing for the transfer.
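To find such tables on the source in advance, you can run a standard catalog query; this is a helper sketch, not a Data Transfer command:
-- List user tables that have no primary key:
SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
    SELECT 1 FROM pg_constraint p
    WHERE p.conrelid = c.oid AND p.contype = 'p'
  );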
Error when dropping a table under the Drop cleanup policy
Error message:
ERROR: cannot drop table <table_name> because other objects depend on it (SQLSTATE 2BP01)
With the Drop
cleanup policy, a transfer drops tables in multiple iterations:
-
The transfer tries to drop all tables one by one. Cascade deletion is not used, as it may delete tables that are not included in the transfer. If a table cannot be dropped, for example, because other tables reference it with foreign keys, an error occurs, but the transfer continues dropping tables.
-
During the next iteration, the transfer will try to drop the remaining tables. If the blocking child tables were dropped in the previous iteration, the table referenced by foreign keys will also be dropped. In this case, the error resolves itself while Data Transfer is running, and no other actions are required.
-
If the transfer does not drop any tables during an iteration, table dropping stops. In this case:
- The transfer will continue running if all tables were dropped.
- The transfer will be aborted with an error if there are any tables left.
Solution:
- If the child tables are not involved in other transfers and their transfer does not conflict with the transfer goals, add these tables:
  - To the list of included tables in the source endpoint parameters.
  - To the list of objects to be transferred in the transfer parameters.
- Delete the blocking child tables in the target database manually.
- Use the Truncate cleanup policy.
- Recreate the target database.
  Warning: This will lead to losing all data in the database.
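To see which child tables reference a table that cannot be dropped, you can query the system catalogs (the table name is a placeholder):
-- List foreign keys, and the child tables holding them, that reference the table:
SELECT conname, conrelid::regclass AS child_table
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = '<table_name>'::regclass;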