Transferring data from a PostgreSQL source endpoint

Written by

Improved by

Updated at July 15, 2025

Scenarios for transferring data from PostgreSQL
Preparing the source database
Configuring the PostgreSQL source endpoint
Configuring the data target
Operations with the database during transfer
Troubleshooting data transfer issues

Yandex Data Transfer enables you to migrate data from a PostgreSQL database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:

Explore possible data transfer scenarios.
Prepare the PostgreSQL database for the transfer.
Set up a source endpoint in Yandex Data Transfer.
Set up one of the supported data targets.
Create a transfer and start it.
Perform the required operations with the database and see how the transfer is going.
In case of any issues, use ready-made solutions to resolve them.

Scenarios for transferring data from PostgreSQL

Migration: Moving data from one storage to another. Migration often means migrating a database from obsolete local databases to managed cloud ones.
Data change capture means tracking changes to a database and delivering those changes to consumers. It is used for applications that are sensitive to real-time data changes.
- Capturing changes from PostgreSQL and delivering to YDS.
- Capturing changes from PostgreSQL and delivering to Apache Kafka®.
Uploading data to data marts is a process of transferring prepared data to storage for subsequent visualization.
- Loading data from PostgreSQL to the ClickHouse® data mart.
Uploading data to scalable Object Storage storage allows you to save on data storage and simplifies the exchange with contractors.
- Loading PostgreSQL data to the Object Storage scalable storage.

For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.

Preparing the source database

Note

When performing a transfer from PostgreSQL to a target of any type, objects of the large object type will not get transferred.

When transferring data of the TIMESTAMP WITHOUT TIME ZONE type, the time zone set in the timezone parameter of the PostgreSQL source database applies.

Example

The column with the TIMESTAMP WITHOUT TIME ZONE data type contains the 1970-01-01 00:00:00 value. The way the value will be transferred depends on the timezone parameter in the database:

If the parameter is set to Etc/UTC, the time will be transferred as 1970-01-01 00:00:00+00.
If the parameter is set to Europe/Moscow, the time will be transferred as 1970-01-01 00:00:00+03.

Data stored in a MATERIALIZED VIEW is not transferred. To transfer MATERIALIZED VIEW data, create an ordinary VIEW that refers to the MATERIALIZED VIEW to be transferred.

If the definition of the VIEW to be transferred contains an invocation of the VOLATILE function, the transfer reads data from this VIEW with the READ UNCOMMITTED isolation level. No consistency between the VIEW data and the data of other objects being transferred is guaranteed. Reading data from a MATERIALIZED VIEW in the VIEW definition are equivalent to invoking the VOLATILE function.

Large objects in the TOAST storage system and those of the bytea type get transferred without restrictions.

Managed Service for PostgreSQL

PostgreSQL

Configure the user account the transfer will use to connect to the source:
1. Create a user.
2. For Replication and Snapshot and increment transfer types, assign the mdb_replication role to this user.
3. Connect to the database you want to migrate as the database owner and configure privileges:
  - SELECT for all the database tables within the transfer.
  - SELECT for all the database sequences within the transfer.
  - USAGE for the schemas of those tables and sequences.
  - ALL PRIVILEGES (CREATE and USAGE) for the service table (__consumer_keeper and __data_transfer_mole_finder) schema defined by the endpoint parameter if the endpoint is going to be used for the Replication or Snapshot and increment transfer types.
Configure the number of user connections to the database.
If the replication source is a cluster, enable the pg_tm_aux extension for it. This will allow replication to continue even after changing the master host. In some cases, a transfer may end in an error after you replace a master in your cluster. For more information, see Troubleshooting.
To transfer tables without primary keys for the Replication and Snapshot and increment transfer types, you must add the REPLICA IDENTITY:
- Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
- Add the replica ID to tables without primary keys:
  - For tables with an index, set REPLICA IDENTITY by unique key:
```
ALTER TABLE MY_TBL REPLICA IDENTITY USING INDEX MY_IDX;
```
  - For tables without an index, change REPLICA IDENTITY:
```
ALTER TABLE MY_TBL REPLICA IDENTITY FULL;
```
    In this case, Data Transfer will treat such tables as tables where the primary key is a composite key that includes all columns of the table.
If there are no primary keys in a table, logical replication will not include any changes in the rows (UPDATE or DELETE).
- If, while trying to transfer tables from PostgreSQL to PostgreSQL, you do not exclude a table without primary keys on the transfer source, the following error message will be returned:
```
 failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed: "public"."MY_TBL": no key columns found
```
- If, while trying to transfer tables from PostgreSQL to a different database, you add a table without primary keys, the following error message will be returned:
```
failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed:
"public"."MY_TBL": no key columns found
```
Disable the transfer of external keys when creating a source endpoint. Recreate them once the transfer is completed.
If a database contains tables with generated columns, such tables will not be migrated and the transfer will end with an error. For more information, see Troubleshooting. To make sure the transfer is running properly when migrating a database with such tables, add them to the list of excluded tables in the source endpoint settings.
Find and terminate DDL queries that are running for too long. To do this, make a selection from the PostgreSQL pg_stat_activity system table:
```
SELECT NOW() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle' ORDER BY 1 DESC;
```
This will return a list of queries running on the server. Pay attention to queries with a high duration value.
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and the Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the PostgreSQL source.
To enable parallel data reads from the table by ranges, make sure its primary key is specified. Then specify the number of workers and threads in the transfer parameters under Runtime environment.
Configure WAL monitoring.

Replication and Snapshot and increment transfers use logical replication. To perform the replication, the transfer creates a replication slot where slot_name matches the transfer ID, which you can get by selecting the transfer in the list of your transfers. Your WAL may grow due to different reasons: a long-running transaction or a transfer issue. Therefore, we recommend you to configure WAL monitoring on the source side.
1. To monitor storage or disk space usage, use monitoring tools to set up an alert (see the disk.used_bytes description).
2. Set the maximum WAL size for replication in the Max slot wal keep size setting. The value of this setting can be edited as of PostgreSQL version 13. To urgently disable a transfer to perform data reads, delete the replication slot.
  
  Warning
  
  If set to -1 (unlimited size), you will not be able to delete WAL files due to open logical replication slots the information is not read from. As a result, the WAL files will take up the entire disk space and you will not be able to connect to the cluster.
3. Set up an alert with the Yandex Monitoring tools for the metric used for Total size of WAL files. Make sure the threshold values are less than those specified for the disk.used_bytes metric because, apart from the data, the disk stores temporary files, the WAL, and other types of data. You can monitor the current slot size by running this DB query with the correct slot_name, which matches the transfer ID:
```
SELECT slot_name, pg_size_pretty(pg_current_wal_lsn() - restart_lsn), active_pid, catalog_xmin, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots WHERE slot_name = '<transfer_ID>'
```

If not planning to use Cloud Interconnect or VPN for connections to an external cluster, make such cluster accessible from the Internet from IP addresses used by Data Transfer.

For details on linking your network up with external resources, see this concept.
Configure the user account the transfer will use to connect to the source:
1. Create a new user:
  - For the Snapshot transfer type, create a user with the following command:
```
CREATE ROLE <username> LOGIN ENCRYPTED PASSWORD '<password>';
```
  - For Replication and Snapshot and increment transfers, create a user with the REPLICATION privilege by running this command:
```
CREATE ROLE <username> WITH REPLICATION LOGIN ENCRYPTED PASSWORD '<password>';
```
2. Grant the new user the SELECT privilege for all the database tables involved in the transfer:
```
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <username>;
```
3. Grant the created user a privilege on all the transferred DB schemas:
  - For the Snapshot transfer type grant the USAGE privilege:
```
GRANT USAGE ON SCHEMA <schema_name> TO <username>;
```
  - For the Replication and Snapshot and increment transfer types grant the CREATE and USAGE (ALL PRIVILEGES) privileges required for creating service tables:
```
GRANT ALL PRIVILEGES ON SCHEMA <schema_name> TO <username>;
```
4. Grant the new user the SELECT privilege for all the database table sequences involved in the transfer:
```
GRANT SELECT ON ALL SEQUENCES IN SCHEMA <schema_name> TO <username>;
```
5. Grant the new user the CONNECT privilege if the source cluster default settings do not allow connections for new users:
```
GRANT CONNECT ON DATABASE <database_name> TO <username>;
```
Set up PostgreSQL configuration:
1. Make changes to the source cluster configuration and authentication settings. To do this, edit the postgresql.conf and pg_hba.conf files (on Debian and Ubuntu, they reside in the /etc/postgresql/<PostgreSQL_version>/main/ directory by default):
  1. Set the maximum number of user connections. To do this, edit the max_connections parameter in postgresql.conf:
```
max_connections = <number_of_connections>
```
    Where <number_of_connections> is the maximum number of connections. For more information on how to configure this parameter, see Specifics of working with endpoints.
    
    In the pg_stat_activity view, you can see the current number of connections:
```
SELECT count(*) FROM pg_stat_activity;
```
  2. Set the logging level for Write Ahead Log (WAL). To do this, set the wal_level value to logical in postgresql.conf:
```
wal_level = logical
```
  3. Optionally, configure SSL to not only encrypt data but also compress it. To enable SSL, set the appropriate value in postgresql.conf:
```
ssl = on
```
  4. Enable connections to the cluster. To do this, edit the listen_addresses parameter in postgresql.conf. For example, you can enable the source cluster to accept connection requests from all IP addresses:
```
listen_addresses = '*'
```
  5. Set up authentication in the pg_hba.conf file:
    SSL
    
    Without SSL
    hostssl all all <connection_IP_address> md5 hostssl replication all <connection_IP_address> md5
    host all all <connection_IP_address> md5 host replication all <connection_IP_address> md5
    Where <connection_IP_address> can be either an exact IP address or a range of IP addresses. For example, to allow access from the Yandex Cloud network, you can specify all public IP addresses in Yandex Cloud.
2. If a firewall is enabled in the source cluster, allow incoming connections from the relevant addresses.
3. To apply the settings, restart PostgreSQL:
```
sudo systemctl restart postgresql
```
4. Check the PostgreSQL status after restarting:
```
sudo systemctl status postgresql
```
Install and enable the wal2json extension.
- Linux
  1. Add the PostgreSQL official repository for your distribution.
  2. Update the list of available packages and install wal2json for your PostgreSQL version.
- Windows 10, 11
  1. If you do not have Microsoft Visual Studio installed yet, download and install it. To build the wal2json extension, the Community Edition is sufficient. During installation, select the following components:
    - MSBuild
    - MSVC v141 x86/x64 build tools
    - C++\CLI support for v141 build tools
    - MSVC v141 - VS 2017 C++ x64\x86 build tools
    - MSVC v141 - VS 2017 C++ x64\x86 Spectre-mitigated libs
    - The latest version of the Windows SDK for the active OS version
    - Other dependencies that are installed automatically for selected components
    Take note of the version number of the installed Windows SDK. You will need it while setting the wal2json build parameters.
  2. Download the wal2json source code from the project page.
  3. Unpack the archive with the source code to the C:\wal2json\ folder.
  4. Navigate to C:\wal2json.
  5. Within one PowerShell session, make changes to the wal2json.vcxproj file as follows:
    - Replace the C:\postgres\pg103 lines with the path to the folder housing your installed PostgreSQL version, for example:
      (Get-Content .\wal2json.vcxproj).replace('C:\postgres\pg103', 'C:\PostgreSQL\14') | ` Set-Content .\wal2json.vcxproj
    - Replace the /MP build parameter with /MT, for example:
      (Get-Content .\wal2json.vcxproj).replace('/MP', '/MT') | Set-Content .\wal2json.vcxproj
    - Specify the version number of the installed Windows SDK in <WindowsTargetPlatformVersion>:
      (Get-Content .\wal2json.vcxproj).replace('<WindowsTargetPlatformVersion>8.1', '<WindowsTargetPlatformVersion><installed_Windows_SDK_version>') | ` Set-Content .\wal2json.vcxproj
    1. Enter the value of the extension variable required for building wal2json. For example, for Visual Studio Community Edition 2022:
      $VCTargetsPath='C:\Program Files\Microsoft Visual Studio\2022\Comminuty\MSBuild\Microsoft\VC\v150'
    2. Run the build:
      & 'C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Current\Bin\MSBuild.exe' /p:Configuration=Release /p:Platform=x64
    3. Copy the wal2json.dll file from the build/release folder to the lib folder of your PostgreSQL version.
If the replication source is a cluster, install and enable the pg_tm_aux extension on its hosts. This will allow replication to continue even after changing the master host. In some cases, a transfer may end in an error after you replace a master in your cluster. For more information, see Troubleshooting.
To transfer tables without primary keys for the Replication and Snapshot and increment transfer types, you must add the REPLICA IDENTITY:
- Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
- Add the replica ID to tables without primary keys:
  - For tables with an index, set REPLICA IDENTITY by unique key:
```
ALTER TABLE MY_TBL REPLICA IDENTITY USING INDEX MY_IDX;
```
  - For tables without an index, change REPLICA IDENTITY:
```
ALTER TABLE MY_TBL REPLICA IDENTITY FULL;
```
    In this case, Data Transfer will treat such tables as tables where the primary key is a composite key that includes all columns of the table.
If there are no primary keys in a table, logical replication will not include any changes in the rows (UPDATE or DELETE).
- If, while trying to transfer tables from PostgreSQL to PostgreSQL, you do not exclude a table without primary keys on the transfer source, the following error message will be returned:
```
 failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed: "public"."MY_TBL": no key columns found
```
- If, while trying to transfer tables from PostgreSQL to a different database, you add a table without primary keys, the following error message will be returned:
```
failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed:
"public"."MY_TBL": no key columns found
```
Disable the transfer of external keys when creating a source endpoint. Recreate them once the transfer is completed.
If a database contains tables with generated columns, such tables will not be migrated and the transfer will end with an error. For more information, see Troubleshooting. To make sure the transfer is running properly when migrating a database with such tables, add them to the list of excluded tables in the source endpoint settings.
Find and terminate DDL queries that are running for too long. To do this, make a selection from the PostgreSQL pg_stat_activity system table:
```
SELECT NOW() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle' ORDER BY 1 DESC;
```
This will return a list of queries running on the server. Pay attention to queries with a high duration value.
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and the Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the PostgreSQL source.
To enable parallel data reads from the table by ranges, make sure its primary key is specified. Then specify the number of workers and threads in the transfer parameters under Runtime environment.
If replication via Patroni is configured on the source, add an ignore_slots section to the source configuration:
```
ignore_slots:
  - database: <database>
    name: <replication_slot>
    plugin: wal2json
    type: logical
```
Where:
- database: Name of the database the transfer is configured for.
- name: Replication slot name.
The database and the replication slot names must match the values specified in the source endpoint settings. By default, the replication slot name is the same as the transfer ID.

Otherwise, the start of the replication phase will fail:
```
Warn(Termination): unable to create new pg source: Replication slotID <replication_slot_name> does not exist.
```
Configure WAL monitoring. Replication and Snapshot and increment transfers use logical replication. To perform the replication, the transfer creates a replication slot where slot_name matches the transfer ID, which you can get by selecting the transfer in the list of your transfers. Your WAL may grow due to different reasons: a long-running transaction or a transfer issue. Therefore, we recommend you to configure WAL monitoring on the source side.
1. Set up alerts as described in the disk usage recommendations.
2. Set the maximum WAL size. This feature is available starting with PostgreSQL version 13.
3. You can monitor the current slot size by running this DB query with the correct slot_name, which matches the transfer ID:
```
SELECT slot_name, pg_size_pretty(pg_current_wal_lsn() - restart_lsn), active_pid, catalog_xmin, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots WHERE slot_name = '<transfer_ID>'
```

Note

For things to note about data transfer from PostgreSQL to ClickHouse® using Replication and Snapshot and increment transfers, see Asynchronously replicating data from PostgreSQL to ClickHouse®.

Configuring the PostgreSQL source endpoint

When creating or updating an endpoint, you can define:

Yandex Managed Service for PostgreSQL cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
Additional parameters.

Before you get started, check the Service specifics for PostgreSQL sources and targets.

Managed Service for PostgreSQL cluster

Warning

To create or edit an endpoint of a managed database, you need to have the managed-postgresql.viewer role or the viewer primitive role assigned for the folder where this managed database cluster resides.

Connecting to the database with the cluster ID specified in Yandex Cloud.

Management console

CLI

Terraform

API

Connection type: Select a database connection option:
- Self-managed: Allows you to specify connection settings manually.
  
  Select Managed Service for PostgreSQL cluster as the installation type and configure these settings as follows:
  - Managed Service for PostgreSQL cluster: Specify the ID of the cluster you need to connect to.
  - Database: Specify the name of the database in the selected cluster.
  - User: Specify the username that Data Transfer will use to connect to the database.
  - Password: Enter the user's password to the database.
- Connection Manager: Enables using a managed database connection using Yandex Connection Manager:
  - Select the folder with the Managed Service for PostgreSQL cluster.
  - Select Managed DB cluster as the installation type and configure the following settings:
    - Managed Service for PostgreSQL cluster: Specify the ID of the cluster you need to connect to.
    - Connection: Specify the managed connection ID in Connection Manager.
    - Database: Specify the name of the database in the selected cluster.
  Warning
  
  To use a connection from Connection Manager, the user must have access permissions for this connection of connection-manager.user or higher.
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

Endpoint type: postgres-source.

--cluster-id: ID of the cluster you need to connect to.
--database: Database name.
--user: Username that Data Transfer will use to connect to the database.
To set a user password to access the DB, use one of the following parameters:
- --raw-password: Password as text.
- --password-file: The path to the password file.

Endpoint type: postgres_source.

connection.mdb_cluster_id: ID of cluster to connect to.
database — Database name.
user: Username that Data Transfer will use to connect to the database.
password.raw: Password in text form.

Here is an example of the configuration file structure:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    postgres_source {
      security_groups = ["<list_of_security_group_IDs>"]
      connection {
        mdb_cluster_id = "<cluster_ID>"
      }
      database = "<name_of_database_to_migrate>"
      user     = "<username_for_connection>"
      password {
        raw = "<user_password>"
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

mdbClusterId: ID of the cluster you need to connect to.
database: Database name.
user: Username that Data Transfer will use to connect to the database.
password.raw: Database user password (in text form).

Custom installation

For OnPremise, all fields are filled in manually.

Management console

CLI

Terraform

API

Connection type: Select a database connection option:
- Self-managed: Allows you to specify connection settings manually.
  
  Select Custom installation as the installation type and configure the following settings:
  - Host: Specify the IP address or FQDN of the master host. If hosts have different ports open for connection, you can specify multiple host values in host:port format. If you choose this format, the value of the Port field will be disregarded.
  - Port: Set the number of the port that Data Transfer will use for the connection.
  - Database: Specify the database name in the custom installation.
  - User: Specify the username that Data Transfer will use to connect to the database.
  - Password: Enter the user's password to the database.
  - CA certificate: Upload the certificate file or add its contents as text if you need to encrypt the data to transfer, e.g., for compliance with the PCI DSS requirements.
    
    Warning
    
    If no certificate is added, the transfer may fail with an error.
  - Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the cluster.
    
    If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.
- Connection Manager: Enables using a managed database connection using Yandex Connection Manager:
  - Select the folder where the Connection Manager managed connection was created.
  - Select Custom installation as the installation type and configure the following settings:
    - Connection: Specify the managed connection ID in Connection Manager.
    - Database: Specify the database name in the custom installation.
    - Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the cluster.
      
      If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.
  Warning
  
  To use a connection from Connection Manager, the user must have access permissions for this connection of connection-manager.user or higher.
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.

Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.

Endpoint type: postgres-source.

--host: IP address or FQDN of the master host you want to connect to.
--port: Number of the port that Data Transfer will use for the connection.
--ca-certificate: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements.

Warning

If no certificate is added, the transfer may fail with an error.
--subnet-id: ID of the subnet the host is in. The transfer will use that subnet to access the host.
--database: Database name.
--user: Username that Data Transfer will use to connect to the database.
To set a user password to access the DB, use one of the following parameters:
- --raw-password: Password as text.
- --password-file: The path to the password file.

Endpoint type: postgres_source.

security_groups: Security groups for network traffic.

Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.

Security groups must belong to the same network as the subnet_id subnet, if the latter is specified.

Note

In Terraform, it is not required to specify a network for security groups.
on_premise.hosts: List of IPs or FQDNs of hosts to connect to. Since only single-item lists are supported, specify the master host address.
on_premise.port: Port number that Data Transfer will use for connections.
on_premise.tls_mode.enabled.ca_certificate: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements.

Warning

If no certificate is added, the transfer may fail with an error.
on_premise.subnet_id: ID of the subnet the host is in. The transfer will use that subnet to access the host.
database — Database name.
user: Username that Data Transfer will use to connect to the database.
password.raw: Password in text form.

Here is an example of the configuration file structure:

resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
  name = "<endpoint_name>"
  settings {
    postgres_source {
      security_groups = ["<list_of_security_group_IDs>"]
      connection {
        on_premise {
          hosts = ["<list_of_hosts>"]
          port  = <port_for_connection>
        }
      }
      database = "<name_of_database_to_migrate>"
      user     = "<username_for_connection>"
      password {
        raw = "<user_password>"
      }
      <additional_endpoint_settings>
    }
  }
}

For more information, see the Terraform provider documentation.

onPremise: Database connection parameters:
- hosts — IP address or FQDN of the master host to connect to.
- port: The number of the port that Data Transfer will use for the connection.
- tlsMode: Parameters for encrypting the data to transfer, if required, e.g., for compliance with the PCI DSS requirements.
  - disabled: Disabled.
  - enabled: Enabled.
    - caCertificate: CA certificate.
      
      Warning
      
      If no certificate is added, the transfer may fail with an error.
- subnetId: ID of the subnet the host is in. The transfer will use that subnet to access the host.
database: Database name.
user: Username that Data Transfer will use to connect to the database.
password.raw: Database user password (in text form).

Additional settings

Management console

CLI

Terraform

API

Table filter:
- Included tables: Only data from the tables listed here will be transferred.
  
  When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
- Excluded tables: Data from the listed tables is not transferred.
The lists include the name of the schema that describes the DB contents, structure, and integrity constraints, as well as the table name. Both lists support expressions in the following format:
- <schema_name>.<table_name>: Full table name.
- <schema_name>.*: All tables in the specified schema.
Table names must match this regular expression:
```
^"?[-_a-zA-Z0-9.]+"?\\."?[$-_a-zA-Z0-9.*]+"?$
```
Double quotes within a table name are not supported. Outer quotes are only used as delimiters and will be deleted when processing paths.

Warning

The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
Schema migration: If required, select the DB schema elements to transfer when activating or deactivating a transfer.
Advanced settings:
- Maximum WAL size for a replication slot: Maximum size of write-ahead log kept in the replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB. This setting does not prevent disk overflow in the source database. You can only use it for PostgreSQL version below 13, and we recommend monitoring the WAL slot value in the source database.
  
  Warning
  
  The Maximum WAL size for a replication slot setting does not guarantee replication slot deletion when its threshold value is reached, particularly if there are issues with the transfer or connectivity between the transfer and the source cluster. For additional tips, see Preparing the source database.
- Database schema for auxiliary tables: Specify the name of the schema to store service tables (__consumer_keeper and __data_transfer_mole_finder).
The schema name must match this regular expression:
```
^[-_a-zA-Z0-9]*$
```
Double quotes are not supported in schema names.
- Merge inherited tables: Select to merge the contents of tables. For more information, see Specifics of working with endpoints.
- Parallel table copying settings: If required, specify detailed settings for parallel copying of tables (if parallel copying parameters are set in the transfer).

--include-table: List of included tables. Only the data from the tables listed here will be transferred.

When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
--exclude-table: List of excluded tables. Data from the listed tables will not be transferred.

The lists include the name of the schema that describes the DB contents, structure, and integrity constraints, as well as the table name. Both lists support expressions in the following format:
- <schema_name>.<table_name>: Full table name.
- <schema_name>.*: All tables in the specified schema.
Warning

The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
--slot-lag-limit: Maximum size of write-ahead log kept in the replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB.
--service-schema: Name of the DB schema for service tables.
Schema transfer settings:
- --transfer-before-data: At the initial stage of the transfer.
- --transfer-after-data: At the final stage of the transfer.

include_tables: List of included tables. Only the data from the tables listed here will be transferred.

When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
exclude_tables: List of excluded tables. Data from the listed tables will not be transferred.

The lists include the name of the schema that describes the DB contents, structure, and integrity constraints, as well as the table name. Both lists support expressions in the following format:
- <schema_name>.<table_name>: Full table name.
- <schema_name>.*: All tables in the specified schema.
Warning

The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
slot_gigabyte_lag_limit: Maximum size of write-ahead log kept in the replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB.
service_schema: Name of the DB schema for service tables.
object_transfer_settings: Schema transfer settings:
- sequence: Sequences.
- sequence_owned_by: Custom sequences.
- table: Tables.
- primary_key: Primary keys.
- fk_constraint: Foreign keys.
- default_values: Default values.
- constraint: Constraints.
- index: Indexes.
- view: Views.
- function: Functions.
- trigger: Triggers.
- type: Types.
- rule: Rules.
- collation: Sorting rules.
- policy: Policies.
- cast: Type casts.
You can specify one of the following values for each entity:
- BEFORE_DATA: Transferring when activating the transfer.
- AFTER_DATA: Transferring when deactivating the transfer.
- NEVER: No tansfer.

For more information, see the Terraform provider documentation.

includeTables: List of included tables. Only the data from the tables listed here will be transferred.

When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
excludeTables: List of excluded tables. Data from the listed tables will not be transferred.

The lists include the name of the schema that describes the DB contents, structure, and integrity constraints, as well as the table name. Both lists support expressions in the following format:
- <schema_name>.<table_name>: Full table name.
- <schema_name>.*: All tables in the specified schema.
Warning

The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
slotByteLagLimit: Maximum size of write-ahead log kept in the replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB.
serviceSchema: Name of the DB schema for service tables.
objectTransferSettings: Settings for transferring the schema at the initial and final stages of the transfer (BEFORE_DATA and AFTER_DATA, respectively).

Settings for transferring a DB schema when enabling and disabling a transfer

Note

The default settings of the source endpoint allow you to successfully perform a transfer for most databases. Change the settings of the initial and final stages of the transfer only if it is necessary.

During a transfer, the database schema is transferred from the source to the target. The transfer is performed in two stages:

At the activation stage.

This step is performed when the transfer is activated before copying or replicating data to create a schema on the target. You can choose which parts of the schema will be migrated. By default, these are: TABLE, VIEW, PRIMARY KEY, SEQUENCE, SEQUENCE OWNED BY, RULE, TYPE, FUNCTION, and DEFAULT.
At the deactivation stage.

This step is performed at the end of the transfer operation when it is deactivated. If the transfer keeps running in replication mode, the final stage of the transfer will be performed only when replication stops. You can choose which parts of the schema will be migrated.

At the final stage, it is assumed that when the transfer is deactivated, there is no writing load on the source. You can ensure this by switching to read-only mode. At this stage, the database schema on the target is brought to a state where it will be consistent with the schema on the source.

It is recommended to include resource-intensive operations, e.g., index migration, in the final stage of migration. Migrating indexes at the beginning of the transfer can slow it down.

To transfer the schema at both the initial and final stages, one uses the pg_dump utility.

Note

When editing the settings of an endpoint of the transfer in the Replicating status, the table schemas on the target are preserved. In this case, only the table schemas that are missing on the target at the transfer restart will be transferred to the target.

Replication cannot guarantee that sequence values are preserved, so we recommend updating the sequence on the target.

Configuring the data target

Configure one of the supported data targets:

For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.

After configuring the data source and target, create and start the transfer.

Operations with the database during transfer

Tip

The PostgreSQL replication protocol does not support transferring schema changes. Avoid changing the data schema in the source and target databases during the transfer. If this cannot be avoided, perform explicit checks on the target.

For Snapshot and Snapshot and increment transfers:

In the Copying status, changes to the data schema on the source and target are not allowed.
In the Replicating status, any changes to the data schema on the source should be manually applied to the target, otherwise the transfer will not be able to continue.

For example, suppose we added a new column to the test_table table on the source:
```
ALTER TABLE test_table ADD COLUMN val2 TEXT;
```
If writing to this table continues, the transfer will not be able to insert data on the target. For replication to continue, run a similar query to change the data schema on the target:
```
ALTER TABLE test_table ADD COLUMN val2 TEXT;
```
After that, the transfer can continue.

Troubleshooting data transfer issues

Known issues when using a PostgreSQL endpoint:

For more troubleshooting tips, see Troubleshooting.

Stopping a transfer's master transaction session

Error message:

Cannot set transaction snapshot:
ERROR: invalid snapshot identifier: "<snapshot_ID>" (SQLSTATE 22023).

Possible causes:

The source is running a cron job or other application that periodically terminates sessions that are too long.
Someone manually terminated the master transaction.
The source's CPU resources are insufficient to execute a query.
The Session duration timeout parameter in the PostgreSQL cluster sets a limit for the active session lifetime.

Solution: Disable this cron job and add more CPU resources to the source; also set the Session duration timeout parameter to 0 for the duration of the transfer. After you make the changes, re-activate the transfer.

Exceeding the connection time-to-live quota

Yandex Managed Service for PostgreSQL has a connection time-to-live quota of 12 hours.

Solution: If transferring a database requires more time, [change the Session duration timeout setting of the Yandex Managed Service for PostgreSQL cluster](../../../../managed-postgresql/operations/update.md#change-postgresql-config).

VIEW transfer error

Error message:

[ERROR] "... _view": no key columns found

Can't replicate new data from Views. Under PostgreSQL — PostgreSQL transfers, only those views are transferred which are listed in the Table filter → Included tables source endpoint parameter.

Solution: manually exclude all views from the transfer, list them in the Table filter → Included tables source endpoint parameter, and then reactivate the transfer.

Error when adding a table entry by constraint

Solution: prepare the source as described in Preparing for the transfer.

Error when transferring all schema tables

Error message:

Unable to apply DDL of type 'TABLE', name '<schema>'.'<table>', error:
ERROR: schema "<schema name>" does not exist (SQLSTATE 3F000)

A transfer fails if tables of a specific schema are listed as <schema>.*. This occurs due to the specifics of pg_dump that is used for a schema transfer. When specifying tables of the entire schema in <schema>.* format in the source endpoint parameter Table filter → Included tables, the PostgreSQL types from this schema aren't extracted even if it contains tables that depend on these types.

Solution: Create PostgreSQL types in the target database manually.

Unable to create objects involving extension functions

Error message:

Unable to apply DDL of type 'TABLE', <object_name>, error:
failed to push non-row item 0 of kind "pg:DDL" in batch:
Push failed: ERROR: function <schema_name>.<function_name>() does not exist 
(SQLSTATE 42883)

In Managed Service for PostgreSQL, you cannot install an extension into a custom schema in a target database. This means the transfer will be interrupted if the user installation of Managed Service for PostgreSQL has extensions installed in a custom schema and these extensions are used in the DDLs of objects you are moving.

Solution: check the DDLs of objects whose names appear in the error. If these objects can call a function from a custom schema, manually create DDLs in the target that call functions without specifying a schema. Set the target endpoint cleanup policy to Truncate to prevent the transfer from deleting these objects.

Low transfer speed

May affect Snapshot or Snapshot and increment transfers from PostgreSQL to PostgreSQL.

Possible causes:

Write protocol.

Under normal conditions, the transfer uses the fast copy protocol but switches to slow line-by-line writes if there are batch write conflicts. The more write conflicts, the lower the transfer speed.

Solution: Set the target endpoint's cleanup policy type to Drop and exclude other writing processes.
Parallel table reading.

Parallel reading is only available for tables containing a primary key. When using a serial type key, parts of tables are read by ranges. Other types of keys allow you to distribute tables evenly according to a special algorithm.

Solution: Set up parallel copy and reactivate the transfer.

Unable to transfer child tables

Child tables are either not transferred or transferred without data if a table is partitioned.

Solution: set the following source endpoint parameters:

Select Merge inherited tables in advanced settings.
In the Included tables field, specify all child tables to transfer data from.
Make sure the user has access to the child tables.

To speed up the transfer of child tables, set up parallel copy.

Insufficient replication slots in a source database

Error message:

Warn(Activate): failed to create a replication slot "<transfer_ID>" at source:
failed to create a replication slot:
failed to create a replication slot:
ERROR: all replication slots are in use
(SQLSTATE 53400)

Solution: Increase the number of replication slots in the source database (by default, 10).

No data transfer after changing a source endpoint

After adding tables to the List of included tables in the source endpoint parameters, a data transfer restarted and stopped.

Solution:

Create tables in the target database manually.

1. Create new tables with a Primary key and without a Foreign key in the target database.
2. Add the new tables to the List of included tables in the source endpoint parameters.
3. Transfer a dump with historical data to the target database.
4. If there are errors in the logs, fix them depending on a specific error.
5. If there are no errors but the logs are empty, contact technical support or your account manager to dump goroutines. This may help resume replication without restarting the transfer.
Deactivate and reactivate the transfer.
Create a separate transfer of the Snapshot type for the new tables. Deactivating the original transfer is not required.

Transfer error when changing a master host

The error occurs in Replication or Snapshot and increment transfers due to required Write-Ahead Log (WAL) parts missing. This occurs when the Write Ahead Log (WAL) logical replication lag between the current master and a replica is greater than the allowable WAL size on other hosts, so, when switching from the master to the replica, the replication slot can't sync to the WAL on the new master.

Solution: set a limit in the additional target endpoint parameter Maximum WAL size for the replication slot and reactivate the transfer.

Error when transferring nested transactions

Transfers of PostgreSQL below version 14 don't support a transfer of tables with committed transactions that are nested more than 1024 times with replication changes at each nesting level. The degree of nesting depends on the number of nested begin; .. commit; statements.

Solution:

Use PostgreSQL 14 or higher.
Exclude transactions with a nesting level like this from a transfer.

Error transferring tables with deferred constraints

The error arises in the transfers of type Replication or Snapshot and increment because updating tables and transactions with DEFERRABLE constraints isn't supported by Data Transfer. To learn more about deferred constraints, see the PostgreSQL documentation.

Solution: Change the constraint type in such tables to IMMEDIATE and activate the transfer again.

Could not create a replication slot at the activation step

In the beginning of the transfer, one or more replication slots are created in the source database. The database objects are locked at this point. If some object is locked by another transaction, this results in a competing lock, which will terminate the transfer with an error.

Solution:

Get the PID of the process that competes for locks with the transfer:

/* Get PID of the transfer */
SELECT active_pid
  FROM pg_replication_slots
  WHERE slot_name = '<transfer_ID>';

/* search the PID of the locking process */
SELECT pid, pg_blocking_pids(pid) as blocked_by
  FROM pg_stat_activity
  WHERE cardinality(pg_blocking_pids(pid)) > 0;

        pid      | blocked_by
-----------------+-------------------
 <transfer_PID> | {<locking_transaction_PID>}
(1 row)

Look up the locking query:

SELECT query, username
  FROM pg_stat_activity
  WHERE pid = <locking_transaction_PID>;

(Optional) Stop the transaction by the command:

SELECT pg_terminate_backend(<locking_transaction_PID>);

Reactivate the transfer.

Excessive WAL size increase

In the PostgreSQL source database, the size of the Write-Ahead Log (WAL) may reach up to dozens of GB due to the queries running for over five minutes. These queries block the WAL and do not allow it to move forward and reset.

You can see the WAL size has gone up when:

The space used by the source database has increased.
The Read buffer size chart in Data Transfer monitoring is rising.

Solution:

Find the query sessions running for more than five minutes:

SELECT now()-xact_start, query, pid FROM pg_stat_activity
WHERE (now()-xact_start)>'5minute'::interval AND STATE != 'idle'
ORDER BY 1 DESC;

Terminate the sessions found. Make sure to avoid such queries moving forward.

Error when replicating data from an external source

Error message:

[XX000] ERROR: could not connect to the publisher:
SSL error: certificate verify failed FATAL:
no pg_hba.conf entry for replication connection
from host "<PostgreSQL_host_IP_address>", user "postgres", SSL off

Solution: Prepare the source as described in Preparing for the transfer.

Error when transfering tables without primary keys

Error message:

Primary key check failed: 14 Tables errors: Table no key columns found

For Replication and Snapshot and increment transfers, tables without primary keys are not transferred.

Solution: Prepare the source as described in Preparing for the transfer.

Duplicate key violates a unique constraint

Error message:

ERROR: duplicate key value violates unique constraint "<constraint_name>" (SQLSTATE 23505)

You may get this error while replicating data from PostgreSQL to PostgreSQL, e.g., when there was not enough memory for all the transaction events and only some transaction rows were transferred from the source database to the target one. In the target database, those rows are applied in a separate transaction by default, which may lead to a constraint violation, such as a duplicate key.

Solution: use one of the following options:

Enable the advanced Save transaction borders setting for the target endpoint.Data Transfer will open the transaction, apply the received events, but will only commit the transaction when it starts receiving next transaction data.

Using the Save transaction borders setting may slightly reduce transfer performance but will prevent errors related to contraint violations.
Disable constraints in the target database. Sometimes, constraints may be violated, e.g., when part of a transaction from the source database is applied in the target database. However, Data Transfer ensures eventual consistency, i.e., applying the second part of the transaction when there are no constraint violations.

Error when dropping a table under the Drop cleanup policy

Error message:

ERROR: cannot drop table <table_name> because other objects depend on it (SQLSTATE 2BP01)

With the Drop cleanup policy, a transfer drops tables in multiple iterations:

The transfer tries to drop all tables one by one. Cascade delete is not used, as it may delete tables that are not included in the transfer. If a table cannot be dropped, for example, due to its links to external keys, an error occurs, but the transfer will continue dropping tables.
During the next iteration, the transfer will try to drop the remaining tables. If the blocking child tables were dropped in the previous iteration, the table linked to external keys will also be dropped. In this case, the error is fixed while Data Transfer is running and no other actions are required.
If the transfer does not drop any table during another iteration, table dropping stops. In which case:
- The transfer will continue running if all tables were dropped.
- The transfer will be aborted with an error if there are any tables left.

Solution:

If child tables are not involved in other transfers and their transfer is not in conflict with the transfer goals, add these tables:
- To the list of included tables in the source endpoint parameters.
- To the list of objects to be transferred in the transfer parameters.
Delete the blocking child tables in the target database manually.
Use the Truncate cleanup policy.
Recreate the target database.

Warning

This will lead to losing all data in the database.

Error when transferring tables with generated columns

Error message:

ERROR: column "<column_name>" is a generated column (SQLSTATE 42P10)

The error may occur if a table containing generated columns is transferred from the source database. For example, if the generated column is an identifier column (GENERATED ... AS IDENTITY), the error will occur during data replication. If the generated column is a computable one, the error will occur regardless of the transfer type. For more information on generated columns, see the relevant PostgreSQL documentation.

Solution: In the source endpoint parameters, exclude from the transfer the tables containing generated columns.

Transferring data from a PostgreSQL source endpoint

Scenarios for transferring data from PostgreSQLScenarios for transferring data from PostgreSQL

Preparing the source databasePreparing the source database

Configuring the PostgreSQL source endpointConfiguring the PostgreSQL source endpoint

Managed Service for PostgreSQL clusterManaged Service for PostgreSQL cluster

Custom installationCustom installation

Additional settingsAdditional settings

Settings for transferring a DB schema when enabling and disabling a transferSettings for transferring a DB schema when enabling and disabling a transfer

Configuring the data targetConfiguring the data target

Operations with the database during transferOperations with the database during transfer

Troubleshooting data transfer issuesTroubleshooting data transfer issues

Stopping a transfer's master transaction sessionStopping a transfer's master transaction session

Exceeding the connection time-to-live quotaExceeding the connection time-to-live quota

VIEW transfer errorVIEW transfer error

Error when adding a table entry by constraintError when adding a table entry by constraint

Error when transferring all schema tablesError when transferring all schema tables

Unable to create objects involving extension functionsUnable to create objects involving extension functions

Low transfer speedLow transfer speed

Unable to transfer child tablesUnable to transfer child tables

Insufficient replication slots in a source databaseInsufficient replication slots in a source database

No data transfer after changing a source endpointNo data transfer after changing a source endpoint

Transfer error when changing a master hostTransfer error when changing a master host

Error when transferring nested transactionsError when transferring nested transactions

Error transferring tables with deferred constraintsError transferring tables with deferred constraints

Could not create a replication slot at the activation stepCould not create a replication slot at the activation step

Excessive WAL size increaseExcessive WAL size increase

Error when replicating data from an external sourceError when replicating data from an external source

Error when transfering tables without primary keysError when transfering tables without primary keys

Duplicate key violates a unique constraintDuplicate key violates a unique constraint

Error when dropping a table under the Drop cleanup policyError when dropping a table under the Drop cleanup policy

Error when transferring tables with generated columnsError when transferring tables with generated columns

Was the article helpful?

Scenarios for transferring data from PostgreSQL

Preparing the source database

Configuring the PostgreSQL source endpoint

Managed Service for PostgreSQL cluster

Custom installation

Additional settings

Settings for transferring a DB schema when enabling and disabling a transfer

Configuring the data target

Operations with the database during transfer

Troubleshooting data transfer issues

Stopping a transfer's master transaction session

Exceeding the connection time-to-live quota

VIEW transfer error

Error when adding a table entry by constraint

Error when transferring all schema tables

Unable to create objects involving extension functions

Low transfer speed

Unable to transfer child tables

Insufficient replication slots in a source database

No data transfer after changing a source endpoint

Transfer error when changing a master host

Error when transferring nested transactions

Error transferring tables with deferred constraints

Could not create a replication slot at the activation step

Excessive WAL size increase

Error when replicating data from an external source

Error when transfering tables without primary keys

Duplicate key violates a unique constraint

Error when dropping a table under the Drop cleanup policy

Error when transferring tables with generated columns