Transferring data to a PostgreSQL target endpoint
- Scenarios for transferring data to PostgreSQL
- Configuring the data source
- Preparing the target database
- Configuring the PostgreSQL target endpoint
- Operations with the database during transfer
- Troubleshooting data transfer issues
- Stopping a transfer's master transaction session
- Exceeding the connection time-to-live quota
- VIEW transfer error
- Error when adding a table entry by constraint
- Error when transferring all schema tables
- Unable to create objects involving extension functions
- Low transfer speed
- Unable to transfer child tables
- Insufficient replication slots in a source database
- No data transfer after changing a source endpoint
- Transfer error when changing a master host
- Error when transferring nested transactions
- Error transferring tables with deferred constraints
- Could not create a replication slot at the activation step
- Excessive WAL size increase
- Error when replicating data from an external source
- Error when transferring tables without primary keys
- Error when dropping a table under the Drop cleanup policy
Yandex Data Transfer enables you to migrate data to a PostgreSQL database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Prepare the PostgreSQL database for the transfer.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform the required operations with the database and see how the transfer is going.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data to PostgreSQL
- Migration: Moving data from one storage to another. Migration often means moving a database from an obsolete local installation to a managed cloud one.
- Data delivery: Delivering arbitrary data to target storage. This includes retrieving data from a queue, deserializing it, and transforming it to the target storage format.
- Uploading data to data marts: Transferring prepared data to storage for subsequent visualization.
For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.
Configuring the data source
Configure one of the supported data sources:
- PostgreSQL
- MySQL®
- Greenplum®
- Apache Kafka®
- Airbyte®
- YDS
- Yandex Object Storage
- Managed Service for YDB
- Oracle
For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.
Preparing the target database
Managed Service for PostgreSQL
- Make sure that the PostgreSQL major version on the target is not lower than that on the source.
- For transfers from PostgreSQL, enable the same extensions in the target database as in the source database.
  If extensions in the source database are installed in a custom schema and are used in the DDLs of the objects you are moving, create these DDLs in the target manually. In these DDLs, a function call must not include the schema name. Set the target endpoint cleanup policy to Truncate to prevent the transfer from deleting these objects.
- Select the Drop cleanup policy for transfer tables. If you have created DDLs in the target manually, use the Truncate policy: the Truncate policy will not delete these DDLs.
- Create a user with access to the target database.
- Grant the user all privileges for the database, schemas, and tables to be transferred:
  GRANT ALL PRIVILEGES ON DATABASE <database_name> TO <username>;
  If the database is not empty, the user must be its owner:
  ALTER DATABASE <database_name> OWNER TO <username>;
  Once started, the transfer will connect to the target on behalf of this user.
- If the target has the Save transaction boundaries option enabled, grant the new user all the privileges needed to create the __data_transfer_lsn auxiliary table in the current schema (usually public) in the target:
  GRANT ALL PRIVILEGES ON SCHEMA <schema_name> TO <username>;
- Configure the number of user connections to the database.
Custom installation
- Make sure the settings for the network hosting the cluster allow public connections from IP addresses used by Data Transfer.
- Make sure that the PostgreSQL major version on the target is not lower than that on the source.
- In the target database, enable the same extensions that are enabled in the source database.
- Make sure the target has the DROP transfer tables cleanup policy selected.
- Create a user:
  CREATE ROLE <username> LOGIN ENCRYPTED PASSWORD '<password>';
- Grant the user all privileges for the database, schemas, and tables to be transferred (see the example after this list):
  GRANT ALL PRIVILEGES ON DATABASE <database_name> TO <username>;
  If the database is not empty, the user must be its owner:
  ALTER DATABASE <database_name> OWNER TO <username>;
  Once started, the transfer will connect to the target on behalf of this user.
- If the target has the Save transaction boundaries option enabled, grant the new user all the privileges needed to create the __data_transfer_lsn auxiliary table in the current schema (usually public) in the target:
  GRANT ALL PRIVILEGES ON SCHEMA <schema_name> TO <username>;
- Configure the number of user connections to the database.
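If the objects you are transferring reside in schemas other than public, the database-level GRANT alone may not cover them. A minimal sketch of the schema- and table-level grants, assuming a hypothetical schema named app (adjust the schema name and username to your setup):
-- grant privileges on the schema and on the tables it already contains
GRANT ALL PRIVILEGES ON SCHEMA app TO <username>;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA app TO <username>;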
Data stored in a MATERIALIZED VIEW is not transferred. To transfer MATERIALIZED VIEW data, create an ordinary VIEW that refers to the MATERIALIZED VIEW to be transferred.
If the definition of the VIEW to be transferred contains an invocation of a VOLATILE function, the transfer reads data from this VIEW with the READ UNCOMMITTED isolation level. No consistency between the VIEW data and the data of other objects being transferred is guaranteed. Reading data from a MATERIALIZED VIEW in the VIEW definition is equivalent to invoking a VOLATILE function.
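For example, a minimal sketch with hypothetical names, where sales_mv is the materialized view whose data needs to be transferred:
-- an ordinary view over the materialized view; include sales_mv_view in the transfer
CREATE VIEW sales_mv_view AS
SELECT * FROM sales_mv;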
Configuring the PostgreSQL target endpoint
When creating or updating an endpoint, you can define:
- Yandex Managed Service for PostgreSQL cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
- Additional parameters.
Managed Service for PostgreSQL cluster
Warning
To create or edit an endpoint of a managed database, you need to have the managed-postgresql.viewer
role or the viewer
primitive role assigned for the folder where this managed database cluster resides.
Connecting to the database with the cluster ID specified in Yandex Cloud.
- Managed Service for PostgreSQL cluster: Specify the ID of the cluster to connect to.
- Connection type: Select a database connection option:
  - Self-managed: Allows you to specify connection settings manually:
    - Database: Specify the name of the database in the selected cluster.
    - User: Specify the username that Data Transfer will use to connect to the database.
    - Password: Enter the user's password to the database.
  - Connection Manager: Enables using a database connection via Yandex Connection Manager:
    - Connection: Specify a connection ID from Connection Manager.
    - Database: Specify the name of the database in the selected cluster.
    Warning
    To use a connection from Connection Manager, the user must have access permissions of connection-manager.user or higher for this connection.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
  Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type: postgres-target.
- --cluster-id: ID of the cluster you need to connect to.
- --database: Database name.
- --user: Username that Data Transfer will use to connect to the database.
- To set a user password to access the DB, use one of the following parameters:
  - --raw-password: Password as text.
  - --password-file: The path to the password file.
- Endpoint type: postgres_target.
- connection.mdb_cluster_id: ID of the cluster to connect to.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
postgres_target {
security_groups = ["<list_of_security_group_IDs>"]
connection {
mdb_cluster_id = "<cluster_ID>"
}
database = "<name_of_database_to_migrate>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
}
}
}
For more information, see the Terraform provider documentation.
- mdbClusterId: ID of the cluster you need to connect to.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Database user password (in text form).
Custom installation
For OnPremise, all fields are filled in manually.
- Connection type: Select a database connection option:
  - Self-managed: Allows you to specify connection settings manually:
    - Host: Specify the IP address or FQDN of the master host. If hosts have different ports open for connection, you can specify multiple host values in host:port format. If you choose this format, the value of the Port field will be disregarded.
    - Port: Set the number of the port that Data Transfer will use for the connection.
    - Database: Specify the name of the database in the selected cluster.
    - User: Specify the username that Data Transfer will use to connect to the database.
    - Password: Enter the user's password to the database.
    - CA certificate: Upload the certificate file or add its contents as text if the transmitted data must be encrypted, for example, to meet PCI DSS requirements.
  - Connection Manager: Enables using a database connection via Yandex Connection Manager:
    - Connection: Specify a connection ID from Connection Manager.
    - Database: Specify the name of the database in the selected cluster.
    Warning
    To use a connection from Connection Manager, the user must have access permissions of connection-manager.user or higher for this connection.
- Subnet ID: Select or create a subnet in the required availability zone.
  If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
  Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type: postgres-target.
- --host: IP address or FQDN of the master host you want to connect to.
- --port: Number of the port that Data Transfer will use for the connection.
- --ca-certificate: CA certificate, if the transmitted data needs to be encrypted, for example, to meet PCI DSS requirements.
- --subnet-id: ID of the subnet the host resides in.
- --database: Database name.
- --user: Username that Data Transfer will use to connect to the database.
- To set a user password to access the DB, use one of the following parameters:
  - --raw-password: Password as text.
  - --password-file: The path to the password file.
- Endpoint type: postgres_target.
- security_groups: Security groups for network traffic.
  Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.
  Security groups must belong to the same network as the subnet_id subnet, if the latter is specified.
  Note
  In Terraform, it is not required to specify a network for security groups.
- on_premise.hosts: List of IP addresses or FQDNs of the hosts to connect to. Since only single-item lists are supported, specify the master host address.
- on_premise.port: Port number that Data Transfer will use for connections.
- on_premise.tls_mode.enabled.ca_certificate: CA certificate, if the transferred data must be encrypted to comply with PCI DSS requirements.
- on_premise.subnet_id: ID of the subnet where the host is located.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
postgres_target {
security_groups = ["<list_of_security_group_IDs>"]
connection {
on_premise {
hosts = ["<list_of_hosts>"]
port = <pot_for_connection>
}
}
database = "<name_of_database_to_migrate>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
}
}
}
For more information, see the Terraform provider documentation.
- onPremise: Database connection parameters:
  - hosts: IP address or FQDN of the master host to connect to.
  - port: The number of the port that Data Transfer will use for the connection.
  - tlsMode: Parameters for encrypting the transmitted data, if required, for example, to meet PCI DSS requirements.
  - subnetId: ID of the subnet the host resides in.
- database: Database name.
- user: Username that Data Transfer will use to connect to the database.
- password.raw: Database user password (in text form).
Additional settings
- Cleanup policy: Select a way to clean up data in the target database before the transfer:
  - Don't cleanup: Select this option if you are only going to do replication without copying data.
  - Drop: Completely delete the tables included in the transfer (used by default).
    Use this option so that the latest version of the table schema is always transferred to the target database from the source whenever the transfer is activated.
  - Truncate: Delete only the data from the tables included in the transfer but keep the schema.
    Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.
- Save transaction borders: Enable so that the service writes data to the target database only after fully reading the transaction data from the source database.
  Warning
  This feature is at the Preview stage.
cleanup_policy: Way to clean up data in the target database before the transfer:
- DISABLED: Do not clean up (default).
  Select this option if only replication without copying data is performed.
- DROP: Completely delete the tables included in the transfer.
  Use this option so that the latest version of the table schema is always transferred to the target database from the source whenever the transfer is activated.
- TRUNCATE: Delete only the data from the tables included in the transfer but keep the schema.
  Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.
cleanupPolicy: Way to clean up data in the target database before the transfer:
- DISABLED: Do not clean up (default).
  Select this option if only replication without copying data is performed.
- DROP: Completely delete the tables included in the transfer.
  Use this option so that the latest version of the table schema is always transferred to the target database from the source whenever the transfer is activated.
- TRUNCATE: Delete only the data from the tables included in the transfer but keep the schema.
  Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.
After configuring the data source and target, create and start the transfer.
Operations with the database during transfer
Tip
The PostgreSQL replication protocol does not support transferring schema changes. Avoid changing the data schema in the source and target databases during the transfer. If this cannot be avoided, perform explicit checks on the target.
For Snapshot and Snapshot and increment transfers:
- In the Copying status, changes to the data schema on the source and target are not allowed.
- In the Replicating status, any changes to the data schema on the source should be manually applied to the target, otherwise the transfer will not be able to continue.
  For example, suppose we added a new column to the test_table table on the source:
  ALTER TABLE test_table ADD COLUMN val2 TEXT;
  If writing to this table continues, the transfer will not be able to insert data on the target. For replication to continue, run a similar query to change the data schema on the target:
  ALTER TABLE test_table ADD COLUMN val2 TEXT;
  After that, the transfer can continue.
Troubleshooting data transfer issues
Known issues when using a PostgreSQL endpoint:
- Stopping a transfer's master transaction session.
- Exceeding the connection time-to-live quota.
- VIEW transfer error.
- Error when adding a table entry by constraint.
- Error when transferring all schema tables.
- Unable to create objects involving extension functions.
- Low transfer speed.
- Unable to transfer child tables.
- Insufficient replication slots in a source database.
- No data transfer after changing a source endpoint.
- Transfer error when changing a master host.
- Error when transferring nested transactions.
- Error transferring tables with deferred constraints.
- Could not create a replication slot at the activation step.
- Excessive WAL size increase.
- Error when replicating data from an external source.
- Error when transferring tables without primary keys.
- Error when dropping a table under the Drop cleanup policy.
For more troubleshooting tips, see the Troubleshooting section.
Stopping a transfer's master transaction session
Error message:
Cannot set transaction snapshot:
ERROR: invalid snapshot identifier: "<snapshot_ID>" (SQLSTATE 22023).
Possible causes include:
- The source is running a cron job or other application that periodically terminates sessions that are too long.
- Someone manually terminated the master transaction.
- The source's CPU resources are insufficient to execute a query.
Solution: Disable this cron job and add more CPU resources to the source. After you make the changes, re-activate the transfer.
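To check whether something on the source keeps terminating long sessions, you can list the longest-running transactions before re-activating the transfer. A sketch (not part of the original instructions):
-- list sessions ordered by transaction age; the transfer's snapshot session should stay alive
SELECT pid, usename, state, now() - xact_start AS xact_age, left(query, 60) AS query
FROM pg_stat_activity
ORDER BY xact_age DESC NULLS LAST;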
Exceeding the connection time-to-live quota
Yandex Managed Service for PostgreSQL has a connection time-to-live quota of 12 hours.
Solution: If transferring a database requires more time, change the Session duration timeout setting of the Yandex Managed Service for PostgreSQL cluster.
VIEW transfer error
Error message:
[ERROR] "... _view": no key columns found
New data cannot be replicated from views. In PostgreSQL-to-PostgreSQL transfers, only the views listed in the Table filter → Included tables source endpoint parameter are transferred.
Solution: Manually exclude all views from the transfer, list them in the Table filter → Included tables source endpoint parameter, and then reactivate the transfer.
Error when adding a table entry by constraint
Solution: prepare the source as described in Preparing for the transfer.
Error when transferring all schema tables
Error message:
Unable to apply DDL of type 'TABLE', name '<schema>'.'<table>', error:
ERROR: schema "<schema name>" does not exist (SQLSTATE 3F000)
A transfer fails if the tables of a specific schema are listed as <schema>.*. This occurs due to the specifics of pg_dump, which is used for the schema transfer. When you specify the tables of an entire schema in <schema>.* format in the Table filter → Included tables source endpoint parameter, the PostgreSQL types from this schema are not extracted, even if the schema contains tables that depend on these types.
Solution: Create PostgreSQL types in the target database manually.
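For example, if the schema's tables depend on an enum type, recreate the schema and the type in the target before activating the transfer. A sketch with hypothetical type and value names:
-- recreate the schema and the type the transferred tables depend on
CREATE SCHEMA IF NOT EXISTS <schema_name>;
CREATE TYPE <schema_name>.order_status AS ENUM ('new', 'paid', 'shipped');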
Unable to create objects involving extension functions
Error message:
Unable to apply DDL of type 'TABLE', <object_name>, error:
failed to push non-row item 0 of kind "pg:DDL" in batch:
Push failed: ERROR: function <schema_name>.<function_name>() does not exist
(SQLSTATE 42883)
In Managed Service for PostgreSQL, you cannot install an extension into a custom schema in the target database. This means the transfer will be interrupted if the source is a custom PostgreSQL installation with extensions installed in a custom schema and these extensions are used in the DDLs of the objects you are moving.
Solution: Check the DDLs of the objects whose names appear in the error. If these objects can call a function from a custom schema, manually create DDLs in the target that call the functions without specifying a schema. Set the target endpoint cleanup policy to Truncate to prevent the transfer from deleting these objects.
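For example, if the source table's DDL used a default value such as ext_schema.uuid_generate_v4() (a hypothetical extension schema), the manually created DDL in the target could call the function without the schema qualifier:
-- hypothetical example: the function is called without the schema name
CREATE TABLE orders (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    created_at timestamptz NOT NULL DEFAULT now()
);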
Low transfer speed
Snapshot or Snapshot and increment transfers from PostgreSQL to PostgreSQL may slow down.
Possible causes:
- Write protocol.
  Under normal operation, a transfer is performed via the fast copy protocol. However, if there is a batch write conflict, it switches to slow line-by-line writes. The more write conflicts, the lower the transfer speed.
  Solution: Set the target endpoint's cleanup policy type to Drop and exclude other writing processes.
- Parallel table reading.
  Parallel reading is only available for tables that contain a primary key in serial mode.
  Solution: Set up parallel copy and reactivate the transfer.
Unable to transfer child tables
If a table is partitioned, its child tables are either not transferred or transferred without data.
Solution: Set the following source endpoint parameters:
- Select Merge inherited tables in advanced settings.
- In the Included tables field, specify all child tables to transfer data from.
- Make sure the user has access to the child tables.
To speed up the transfer of child tables, set up parallel copy.
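To get the list of child tables for the Included tables field, you can query pg_inherits on the source. A sketch assuming a hypothetical parent table public.measurements:
-- list child partitions of the parent table
SELECT inhrelid::regclass AS child_table
FROM pg_inherits
WHERE inhparent = 'public.measurements'::regclass;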
Insufficient replication slots in a source database
Error message:
Warn(Activate): failed to create a replication slot "<transfer_ID>" at source:
failed to create a replication slot:
failed to create a replication slot:
ERROR: all replication slots are in use
(SQLSTATE 53400)
Solution: Increase the number of replication slots in the source database (the max_replication_slots setting; the default value is 10).
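A sketch of how to check slot usage and raise the limit on a self-managed source (in Managed Service for PostgreSQL, change the corresponding cluster setting instead; the parameter takes effect only after a restart):
-- how many slots are in use vs. the configured limit
SELECT count(*) AS used_slots FROM pg_replication_slots;
SHOW max_replication_slots;
-- on a self-managed source; takes effect after a server restart
ALTER SYSTEM SET max_replication_slots = 20;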
No data transfer after changing a source endpoint
After tables were added to the List of included tables in the source endpoint parameters, the data transfer restarted and then stopped.
Solution:
- Create tables in the target database manually:
  1. Create the new tables with a Primary key and without a Foreign key in the target database.
  2. Add the new tables to the List of included tables in the source endpoint parameters.
  3. Transfer a dump with historical data to the target database.
  4. If there are errors in the logs, fix them depending on the specific error.
  5. If there are no errors but the logs are empty, contact technical support or your account manager to dump goroutines. This may help resume replication without restarting the transfer.
- Deactivate and reactivate the transfer.
- Create a separate transfer of the Snapshot type for the new tables. Deactivating the original transfer is not required.
Transfer error when changing a master host
The error occurs in Replication or Snapshot and increment transfers because required Write-Ahead Log (WAL) segments are missing. This happens when the logical replication lag between the current master and a replica is greater than the allowable WAL size on other hosts, so, when switching from the master to the replica, the replication slot cannot sync to the WAL on the new master.
Solution: set a limit in the additional target endpoint parameter Maximum WAL size for the replication slot and reactivate the transfer.
Error when transferring nested transactions
Transfers from PostgreSQL versions below 14 do not support transferring tables with committed transactions that are nested more than 1024 times with replication changes at each nesting level. The degree of nesting depends on the number of nested begin; .. commit; statements.
Solution:
- Use PostgreSQL 14 or higher.
- Exclude transactions with this level of nesting from the transfer.
Error transferring tables with deferred constraints
The error arises in transfers of the Replication or Snapshot and increment type because Data Transfer does not support updating tables and transactions with DEFERRABLE constraints. To learn more about deferred constraints, see the PostgreSQL documentation.
Solution: Change the constraint type in such tables to IMMEDIATE and activate the transfer again.
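For example, a deferrable foreign key can be switched to an immediate one with ALTER TABLE (hypothetical table and constraint names):
-- make a deferrable constraint immediate
ALTER TABLE orders
    ALTER CONSTRAINT orders_customer_id_fkey NOT DEFERRABLE;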
Could not create a replication slot at the activation step
At the beginning of the transfer, one or more replication slots are created in the source database. A slot cannot be created while another transaction holds a lock that the transfer competes for.
Solution:
- Get the PID of the process that competes for locks with the transfer:
  /* Get the PID of the transfer */
  SELECT active_pid
  FROM pg_replication_slots
  WHERE slot_name = '<transfer_ID>';

  /* Search for the PID of the locking process */
  SELECT pid, pg_blocking_pids(pid) AS blocked_by
  FROM pg_stat_activity
  WHERE cardinality(pg_blocking_pids(pid)) > 0;

  pid              | blocked_by
  -----------------+------------------------------
  <transfer_PID>   | {<locking_transaction_PID>}
  (1 row)
- Look up the locking query:
  SELECT query, usename FROM pg_stat_activity WHERE pid = <locking_transaction_PID>;
- (Optional) Stop the transaction with the command:
  SELECT pg_terminate_backend(<locking_transaction_PID>);
Excessive WAL size increase
In the PostgreSQL source database, the size of the Write-Ahead Log (WAL) may grow to dozens of GB because of queries that run for over five minutes. Such queries block the WAL and do not allow it to move forward and be reset.
You can see the WAL size has gone up when:
- The space used by the source database has increased.
- The Read buffer size chart in Data Transfer monitoring is rising.
Solution:
- Find the query sessions running for more than five minutes:
  SELECT now() - xact_start, query, pid
  FROM pg_stat_activity
  WHERE (now() - xact_start) > '5 minute'::interval
    AND state != 'idle'
  ORDER BY 1 DESC;
- Terminate the sessions found (see the example after this list). Make sure to avoid such queries going forward.
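For example, using the pid values returned by the query above:
-- terminate a long-running session found in the previous step
SELECT pg_terminate_backend(<pid>);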
Error when replicating data from an external source
Error message:
[XX000] ERROR: could not connect to the publisher:
SSL error: certificate verify failed FATAL:
no pg_hba.conf entry for replication connection
from host "<PostgreSQL_host_IP_address>", user "postgres", SSL off
Solution: Prepare the source as described in Preparing for the transfer.
Error when transferring tables without primary keys
Error message:
Primary key check failed: 14 Tables errors: Table no key columns found
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred.
Solution: Prepare the source as described in Preparing for the transfer.
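To find such tables on the source before preparing it, a query like this may help (a sketch limited to the public schema):
-- list regular tables in the public schema that have no primary key
SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
  AND NOT EXISTS (
      SELECT 1 FROM pg_constraint p
      WHERE p.conrelid = c.oid AND p.contype = 'p'
  );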
Error when dropping a table under the Drop cleanup policy
Error message:
ERROR: cannot drop table <table_name> because other objects depend on it (SQLSTATE 2BP01)
With the Drop cleanup policy, a transfer drops tables in multiple iterations:
- The transfer tries to drop all tables one by one. Cascade delete is not used, as it may delete tables that are not included in the transfer. If a table cannot be dropped, for example, because other tables reference it with foreign keys, an error occurs, but the transfer continues dropping tables.
- During the next iteration, the transfer tries to drop the remaining tables. If the blocking child tables were dropped in the previous iteration, the table they referenced with foreign keys will also be dropped. In this case, the error resolves itself while Data Transfer is running and no further action is required.
- If the transfer does not drop any table during an iteration, table dropping stops. In this case:
  - The transfer will continue running if all tables were dropped.
  - The transfer will be aborted with an error if there are any tables left.
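To see which foreign keys block dropping the table named in the error, a query like this may help (a sketch; replace <table_name> with the table from the error message):
-- list foreign key constraints in other tables that reference <table_name>
SELECT conrelid::regclass AS dependent_table, conname AS constraint_name
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = '<table_name>'::regclass;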
Solution:
- If child tables are not involved in other transfers and their transfer is not in conflict with the transfer goals, add these tables:
  - To the list of included tables in the source endpoint parameters.
  - To the list of objects to be transferred in the transfer parameters.
- Delete the blocking child tables in the target database manually.
- Use the Truncate cleanup policy.
- Recreate the target database.
  Warning
  This will lead to losing all data in the database.