Transferring data from a PostgreSQL source endpoint
- Scenarios for transferring data from PostgreSQL
- Preparing the source database
- Configuring the PostgreSQL source endpoint
- Configuring the data target
- Operations with the database during transfer
- Troubleshooting data transfer issues
- Stopping a transfer's master transaction session
- Exceeding the connection time-to-live quota
- VIEW transfer error
- Error when adding a table entry by constraint
- Error when transferring all schema tables
- Low transfer speed
- Unable to transfer child tables
- Insufficient replication slots in a source database
- No data transfer after changing a source endpoint
- Transfer error when changing a master host
- Error when transferring nested transactions
- Error transferring tables with deferred constraints
- Could not create a replication slot at the activation step
- Excessive WAL size increase
- Error when replicating data from an external source
- Error when transfering tables without primary keys
- Error when dropping a table under the Drop cleanup policy
Yandex Data Transfer enables you to migrate data from a PostgreSQL database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Prepare the PostgreSQL database for the transfer.
- Set up an endpoint source in Yandex Data Transfer.
- Set up one of the supported data targets.
- Create a transfer and start it.
- Perform required operations with the database and control the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data from PostgreSQL
-
Migration: Moving data from one repository to another; it often suggests transferring a database to the cloud, from outdated local databases to managed cloud ones.
-
Data change capture means tracking changes to a database and delivering those changes to consumers. It is used for applications that are sensitive to real-time data changes.
- Capturing changes from PostgreSQL and delivering to YDS.
- Capturing changes from PostgreSQL and delivering to Apache Kafka®.
-
Uploading data to data marts is a process of transferring prepared data to storage for subsequent visualization.
-
Uploading data to scalable Object Storage storage allows you to save on data storage and simplifies the exchange with contractors.
For a detailed description of possible Yandex Data Transfer data transfer scenarios, see Tutorials.
Preparing the source database
Note
When performing a transfer from PostgreSQL to a target of any type, objects of the large object
When transferring data of the TIMESTAMP WITHOUT TIME ZONE
type, the time zone set in the timezone
parameter of the PostgreSQL source database applies.
The column with the TIMESTAMP WITHOUT TIME ZONE
data type contains the 1970-01-01 00:00:00
value. The way the value will be transferred depends on the timezone
parameter in the database:
- If the parameter is set to
Etc/UTC
, the time will be transferred as1970-01-01 00:00:00+00
. - If the parameter is set to
Europe/Moscow
, the time will be transferred as1970-01-01 00:00:00+03
.
Data stored in a MATERIALIZED VIEW
is not transferred. To transfer MATERIALIZED VIEW
data, create an ordinary VIEW
that refers to the MATERIALIZED VIEW
to be transferred.
If the definition of the VIEW
to be transferred contains an invocation of the VOLATILE
functionVIEW
with the READ UNCOMMITTED
isolation level. No consistency between the VIEW
data and the data of other objects being transferred is guaranteed. Reading data from a MATERIALIZED VIEW
in the VIEW
definition are equivalent to invoking the VOLATILE
function.
Large objects in the TOAST storage system
-
Configure the user the transfer will use to connect to the source:
-
For the Replication and Snapshot and increment transfer types, assign the
mdb_replication
role to this user. -
Connect to the database you want to migrate as the database owner and configure privileges:
SELECT
for all the database tables to be transferred.SELECT
for all the database sequences to be transferred.USAGE
for the schemas of these tables and sequences.ALL PRIVILEGES
(CREATE
andUSAGE
) to the__consumer_keeper
and__data_transfer_mole_finder
housekeeping table schema defined by the endpoint parameter if the endpoint is going to be used for the Replication or Snapshot and increment transfer types.
-
Configure the number of user connections to the database.
-
If the replication source is a cluster, enable the
pg_tm_aux
extension for it. This will allow replication to continue even after changing the master host. In some cases, a transfer may end in an error after you replace a master in your cluster. For more information, see Troubleshooting. -
To transfer tables without primary keys for the Replication and Snapshot and increment transfer types, you must add the
REPLICA IDENTITY
:- Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
- Add the replica ID to tables without
primary keys
:-
For tables with an index, set
REPLICA IDENTITY
byunique key
:ALTER TABLE MY_TBL REPLICA IDENTITY USING INDEX MY_IDX;
-
For tables without an index, change
REPLICA IDENTITY
:ALTER TABLE MY_TBL REPLICA IDENTITY FULL;
In this case, Data Transfer will treat such tables as tables where the primary key is a composite key that includes all columns of the table.
-
If there are no primary keys in a table, logical replication will not include any changes in the rows
(UPDATE
orDELETE
).-
If, while trying to transfer tables from PostgreSQL to PostgreSQL, you do not exclude a table without primary keys on the transfer source, the following error message will be returned:
failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed: "public"."MY_TBL": no key columns found
-
If, while trying to transfer tables from PostgreSQL to a different database, you add a table without primary keys, the following error message will be returned:
failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed: "public"."MY_TBL": no key columns found
-
Disable the transfer of external keys when creating a source endpoint. Recreate them once the transfer is completed.
-
Find and terminate DDL queries that are running for too long. To do this, make a selection from the PostgreSQL
pg_stat_activity
housekeeping table:SELECT NOW() - query_start AS duration, query, state FROM pg_stat_activity WHERE state != 'idle' ORDER BY 1 DESC;
This will return a list of queries running on the server. Pay attention to queries with a high
duration
value. -
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and the Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the PostgreSQL source.
-
To enable parallel data reads from the table, set its primary key to serial mode
.Then specify the number of jobs and threads in the transfer parameters under Runtime environment.
-
Configure WAL monitoring.
For Replication and Snapshot and increment transfers, logical replication
is used. To perform it, the transfer creates a replication slot with theslot_name
equal to the transfer ID, which you can get by selecting the transfer from the list of your transfers. Your WAL may grow due to different reasons: a long-running transaction or a transfer issue. Therefore, we recommend you to configure WAL monitoring on the source side.-
To monitor storage or disk space usage, use the monitoring tools to configure an alert (see the
disk.used_bytes
description). -
Set the maximum WAL size for replication in the
Max slot wal keep size
setting. The value of this setting can be edited as of PostgreSQL version 13. To urgently disable a transfer to perform data reads, delete the replication slot.Warning
If set to
-1
(unlimited size), you will not be able to delete WAL files due to open logical replication slots that information is not read from. As a result, the WAL files will take up the entire disk space and you will not be able to connect to the cluster. -
Set up an alert with the Yandex Monitoring tools for the metric used for
Total size of WAL files
. Make sure the threshold values are less than those specified for thedisk.used_bytes
metric because, apart from the data, the disk stores temporary files, the WAL, and other types of data. You can monitor the current slot size by making a DB request with the correctslot_name
equal to the transfer ID:SELECT slot_name, pg_size_pretty(pg_current_wal_lsn() - restart_lsn), active_pid, catalog_xmin, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = '<transfer_ID>'
-
-
Make sure the settings for the network hosting the cluster allow public connections from IP addresses used by Data Transfer
. -
Create a user account the transfer will use to connect to the source:
-
For the Snapshot transfer type, create a user with the following command:
CREATE ROLE <username> LOGIN ENCRYPTED PASSWORD '<password>';
-
For the Replication and Snapshot and increment transfer types, create a user with the
REPLICATION
privilege using this command:CREATE ROLE <username> WITH REPLICATION LOGIN ENCRYPTED PASSWORD '<password>';
-
-
Grant the created user the privilege to perform
SELECT
on all the database tables to be transferred and theUSAGE
privilege for the schemas of these tables:GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <username>; GRANT USAGE ON SCHEMA <schema_name> TO <username>;
-
Grant the created user the privileges to the
__consumer_keeper
and__data_transfer_mole_finder
housekeeping table schema defined by the endpoint parameter if the endpoint is going to be used for the Replication or Snapshot and increment transfer types.GRANT ALL PRIVILEGES ON SCHEMA <schema_name> TO <username>;
-
Configure the number of user connections to the database.
-
Install and enable the wal2json
extension.Installation
-
Linux
- Add the PostgreSQL official repository
for your distribution. - Update the list of available packages and install the
wal2json
package for the active version of PostgreSQL:
- Add the PostgreSQL official repository
-
Windows 10, 11
-
If you do not have Microsoft Visual Studio installed yet, download and install it. To build the wal2json extension, the Community Edition
is sufficient. During installation, select the following components:- MSBuild
- MSVC v141 x86/x64 build tools
- C++\CLI support for v141 build tools
- MSVC v141 - VS 2017 C++ x64\x86 build tools
- MSVC v141 - VS 2017 C++ x64\x86 Spectre-mitigated libs
- The latest version of the Windows SDK for the active OS version
- Other dependencies that are installed automatically for selected components
Take note of the version number of the installed Windows SDK. You will need it when you are asked to specify the wal2json build parameters.
-
Download the wal2json source code from the project page
. -
Unpack the archive with the source code to the
C:\wal2json\
folder. -
Go to
C:\wal2json
. -
In one PowerShell session, make changes to the
wal2json.vcxproj
file:-
Replace the lines
C:\postgres\pg103
with the path to the installed PostgreSQL version, for example:(Get-Content .\wal2json.vcxproj).replace('C:\postgres\pg103', 'C:\PostgreSQL\14') | ` Set-Content .\wal2json.vcxproj
-
Replace the
/MP
extension parameter to/MT
, for example:(Get-Content .\wal2json.vcxproj).replace('/MP', '/MT') | Set-Content .\wal2json.vcxproj
-
Specify the version number of the installed Windows SDK in
<WindowsTargetPlatformVersion>
:(Get-Content .\wal2json.vcxproj).replace('<WindowsTargetPlatformVersion>8.1', '<WindowsTargetPlatformVersion><installed_version_of_Windows_SDK>') | ` Set-Content .\wal2json.vcxproj
-
Enter the value of the extension variable required for building wal2json. For example, for Visual Studio Community Edition 2022:
$VCTargetsPath='C:\Program Files\Microsoft Visual Studio\2022\Comminuty\MSBuild\Microsoft\VC\v150'
-
Run the build:
& 'C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Current\Bin\MSBuild.exe' /p:Configuration=Release /p:Platform=x64
-
Copy
wal2json.dll
from thebuild/release
folder to thelib
folder of the installed PostgreSQL version.
-
-
Configuration
-
In the
postgresql.conf
file, change the value of thewal_level
parameter tological
:wal_level = logical
-
Restart PostgreSQL.
-
-
If the replication source is a cluster, install and enable the pg_tm_aux
extension on its hosts. This will allow replication to continue even after changing the master host. In some cases, a transfer may end in an error after you replace a master in your cluster. For more information, see Troubleshooting. -
To transfer tables without primary keys for the Replication and Snapshot and increment transfer types, you must add the
REPLICA IDENTITY
:- Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
- Add the replica ID to tables without
primary keys
:-
For tables with an index, set
REPLICA IDENTITY
byunique key
:ALTER TABLE MY_TBL REPLICA IDENTITY USING INDEX MY_IDX;
-
For tables without an index, change
REPLICA IDENTITY
:ALTER TABLE MY_TBL REPLICA IDENTITY FULL;
In this case, Data Transfer will treat such tables as tables where the primary key is a composite key that includes all columns of the table.
-
If there are no primary keys in a table, logical replication will not include any changes in the rows
(UPDATE
orDELETE
).-
If, while trying to transfer tables from PostgreSQL to PostgreSQL, you do not exclude a table without primary keys on the transfer source, the following error message will be returned:
failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed: "public"."MY_TBL": no key columns found
-
If, while trying to transfer tables from PostgreSQL to a different database, you add a table without primary keys, the following error message will be returned:
failed to run (abstract1 source): Cannot parse logical replication message: failed to reload schema: primary key check failed: Tables: n / N check failed: "public"."MY_TBL": no key columns found
-
Disable the transfer of external keys when creating a source endpoint. Recreate them once the transfer is completed.
-
Find and terminate DDL queries that are running for too long. To do this, make a selection from the PostgreSQL
pg_stat_activity
housekeeping table:SELECT NOW() - query_start AS duration, query, state FROM pg_stat_activity WHERE state != 'idle' ORDER BY 1 DESC;
This will return a list of queries running on the server. Pay attention to queries with a high
duration
value. -
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and the Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the PostgreSQL source.
-
To enable parallel data reads from the table, set its primary key to serial mode
.Then specify the number of workers and threads in the transfer parameters under Runtime environment.
-
If replication via Patroni
is configured on the source, add an ignore_slots section to the source configuration:ignore_slots: - database: <database> name: <replication_slot> plugin: wal2json type: logical
Where:
database
: Name of the database the transfer is configured for.name
: Replication slot name.
The database and the replication slot names must match the values specified in the source endpoint settings. By default, the
replication slot name
is the same as thetransfer ID
.Otherwise, the start of the replication phase will fail:
Warn(Termination): unable to create new pg source: Replication slotID <replication_slot_name> does not exist.
-
Configure WAL monitoring. For Replication and Snapshot and increment transfers, logical replication
is used. To perform it, the transfer creates a replication slot with theslot_name
equal to the transfer ID that you can get by selecting the transfer from the list of your transfers. The WAL size may increase for different reasons: due to a long-running transaction or a transfer issue. Therefore, we recommend setting up WAL monitoring on the source side.-
Set up alerts following the disk usage recommendations
. -
Set the maximum WAL size
. This feature is available starting with PostgreSQL version 13. -
You can track the current slot size by making a DB request with the correct
slot_name
equal to the transfer ID:SELECT slot_name, pg_size_pretty(pg_current_wal_lsn() - restart_lsn), active_pid, catalog_xmin, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = '<transfer_ID>'
-
Note
For things to note about data transfer from PostgreSQL to ClickHouse® using Replication and Snapshot and increment transfers, see Asynchronously replicating data from PostgreSQL to ClickHouse®.
Configuring the PostgreSQL source endpoint
When creating or updating an endpoint, you can define:
- Yandex Managed Service for PostgreSQL cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
- Additional parameters.
Before you get started, check the Service specifics for PostgreSQL sources and targets.
Managed Service for PostgreSQL cluster
Warning
To create or edit an endpoint of a managed database, you need to have the managed-postgresql.viewer
role or the viewer
primitive role assigned for the folder where this managed database cluster resides.
Connecting to the database with the cluster ID specified in Yandex Cloud.
-
Managed Service for PostgreSQL cluster: Specify ID of the cluster to connect to.
-
Database: Specify the name of the database in the selected cluster.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user's password to the database.
-
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type:
postgres-source
.
-
--cluster-id
: ID of the cluster you need to connect to. -
--database
: Database name. -
--user
: Username that Data Transfer will use to connect to the database. -
To set a user password to access the DB, use one of the following parameters:
-
--raw-password
: Password as text. -
--password-file
: The path to the password file.
-
- Endpoint type:
postgres_source
.
-
connection.mdb_cluster_id
: ID of cluster to connect to. -
database
— Database name. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
postgres_source {
security_groups = ["<list_of_security_group_IDs>"]
connection {
mdb_cluster_id = "<cluster_ID>"
}
database = "<migrated_database_name>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation
-
mdbClusterId
: ID of the cluster you need to connect to. -
database
: Database name. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Database user password (in text form).
Custom installation
For OnPremise, all fields are filled in manually.
-
Host: Specify the IP address or FQDN of the master host. If the hosts are on different ports, you can specify multiple host values in the
host:port
format. If you choose this format, the value of the Port field will be disregarded. -
Port: Set the number of the port that Data Transfer will use for the connection.
-
CA certificate: Upload the certificate file or add its contents as text if transmitted data must be encrypted, for example, to meet PCI DSS
requirements. -
Subnet ID: Select or create a subnet in the required availability zone.
If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
-
Database: Specify the name of the database in the selected cluster.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user's password to the database.
-
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type:
postgres-source
.
-
--host
: IP address or FQDN of the master host you want to connect to. -
--port
: Number of the port that Data Transfer will use for the connection. -
--ca-certificate
— If the transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS . -
--subnet-id
: ID of the subnet the host resides in. -
--database
: Database name. -
--user
: Username that Data Transfer will use to connect to the database. -
To set a user password to access the DB, use one of the following parameters:
-
--raw-password
: Password as text. -
--password-file
: The path to the password file.
-
- Endpoint type:
postgres_source
.
-
security_groups
: Security groups for network traffic.Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.
Security groups must belong to the same network as the
subnet_id
subnet, if the latter is specified.Note
In Terraform, it is not required to specify a network for security groups.
-
on_premise.hosts
: List of IPs or FQDNs of hosts to connect to. Since only single-item lists are supported, specify the master host address. -
on_premise.port
: Port number that Data Transfer will use for connections. -
on_premise.tls_mode.enabled.ca_certificate
: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements. -
on_premise.subnet_id
: ID of the subnet where the host is located. -
database
— Database name. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
postgres_source {
security_groups = ["<list_of_security_group_IDs>"]
connection {
on_premise {
hosts = ["<list_of_hosts>"]
port = <port_for_connection>
}
}
database = "<migrated_database_name>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation
onPremise
: Database connection parameters:-
hosts
— IP address or FQDN of the master host to connect to. -
port
: The number of the port that Data Transfer will use for the connection. -
tlsMode
: Parameters of encryption of transmitted data if it is required, for example, to meet the requirements of PCI DSS . -
subnetId
: ID of the subnet the host resides in.
-
-
database
: Database name. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Database user password (in text form).
Additional settings
-
Table filter:
-
Included tables: Only data from the tables listed here will be transferred.
When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
Excluded tables: Data from the listed tables is not transferred.
The lists include the name of the schema
(description of DB contents, structure, and integrity constraints) and the table name. Both lists support expressions in the following format:<schema_name>.<table_name>
: Fully qualified table name.<schema_name>.*
: All tables in the specified schema.
Table names must match this regular expression:
^"?[-_a-zA-Z0-9.]+"?\\."?[$-_a-zA-Z0-9.*]+"?$
Double quotes within a table name are not supported. Outer quotes are only used as delimiters and will be deleted when processing paths.
Warning
The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
-
-
Schema migration: If required, select the DB schema elements to transfer when activating or deactivating a transfer.
-
Advanced settings:
-
Maximum WAL size for a replication slot: Maximum size of Write-Ahead Log kept in replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB. This setting does not prevent disk overflow in the source database. You can only use it for PostgreSQL version below 13, and we recommend monitoring the WAL slot value in the source database.
Warning
The Maximum WAL size for a replication slot setting does not guarantee replication slot deletion when its threshold value is reached, particularly if there are issues with the transfer or connectivity between the transfer and the source cluster. See Preparing the source database for tips.
-
Database schema for auxiliary tables: Specify the name of the schema to store service tables (
__consumer_keeper
and__data_transfer_mole_finder
).
The schema name must match this regular expression:
^[-_a-zA-Z0-9]*$
Double quotes are not supported in schema names.
-
Merge inherited tables: Select to merge the contents of tables. For more information, see Specifics of working with endpoints.
-
Parallel table copying settings: If required, specify detailed settings for parallel copying of tables (if parallel copying parameters are set in the transfer).
-
-
--include-table
: List of included tables. Only data from the tables listed here will be transferred.When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
--exclude-table
: List of excluded tables. Data from the listed tables will not be transferred.The lists include the name of the schema (description of DB contents, structure, and integrity constraints) and the table name. Both lists support expressions in the following format:
<schema_name>.<table_name>
: Fully qualified table name.<schema_name>.*
: All tables in the specified schema.
Warning
The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
-
--slot-lag-limit
: Maximum size of the write-ahead log kept in the replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB. -
--service-schema
: Name of the DB schema for service tables. -
Schema transfer settings:
--transfer-before-data
: At the initial stage of the transfer.--transfer-after-data
: At the final stage of the transfer.
-
include_tables
: List of included tables. Only data from the tables listed here will be transferred.When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
exclude_tables
: List of excluded tables. Data from the listed tables will not be transferred.The lists include the name of the schema (description of DB contents, structure, and integrity constraints) and the table name. Both lists support expressions in the following format:
<schema_name>.<table_name>
: Fully qualified table name.<schema_name>.*
: All tables in the specified schema.
Warning
The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
-
slot_gigabyte_lag_limit
: Maximum size of Write-Ahead Log kept in replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB. -
service_schema
: DB schema name for housekeeping tables. -
object_transfer_settings
: Schema transfer settings:sequence
: Sequences.sequence_owned_by
: User sequences.table
: Tables.primary_key
: Primary keys.fk_constraint
: Foreign keys.default_values
: Default values.constraint
: Constraints.index
: Indexes.view
: Views.function
: Functions.trigger
: Triggers.type
: Types.rule
: Rules.collation
: Collation rules.policy
: Policies.cast
: Type casts.
You can specify one of the following values for each entity:
BEFORE_DATA
: Move when activating the transfer.AFTER_DATA
: Move when deactivating the transfer.NEVER
: Do not move.
For more information, see the Terraform provider documentation
-
includeTables
: List of included tables. Only data from the tables listed here will be transferred.When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
excludeTables
: List of excluded tables. Data from the listed tables will not be transferred.The lists include the name of the schema (description of DB contents, structure, and integrity constraints) and the table name. Both lists support expressions in the following format:
<schema_name>.<table_name>
: Fully qualified table name.<schema_name>.*
: All tables in the specified schema.
Warning
The service does not transfer custom data types if the endpoint specifies a list of included or excluded tables for the source. If this is the case, please transfer your custom data types manually.
-
slotByteLagLimit
: Maximum size of the write-ahead log kept in the replication slot. If exceeded, the replication process is stopped and the replication slot is deleted. The default value is 50 GB. -
serviceSchema
: Name of the DB schema for service tables. -
objectTransferSettings
: Settings for transferring the schema at the initial and final stages of the transfer (BEFORE_DATA
andAFTER_DATA
values, respectively).
Settings for transferring a DB schema when enabling and disabling a transfer
Note
The default settings of the source endpoint allow you to successfully perform a transfer for most databases. Change the settings of the initial and final stages of the transfer only if it is necessary.
During a transfer, the database schema is transferred from the source to the target. The transfer is performed in two stages:
-
At the activation stage.
This step is performed when the transfer is activated before copying or replicating data to create a schema on the target. You can choose which parts of the schema will be migrated. By default, this is
TABLE
,VIEW
,PRIMARY KEY
,SEQUENCE
,SEQUENCE OWNED BY
,RULE
,TYPE
,FUNCTION
, andDEFAULT
. -
At the deactivation stage.
This step is performed at the end of the transfer operation when it is deactivated. If the transfer keeps running in replication mode, the final stage of the transfer will be performed only when replication stops. You can choose which parts of the schema will be migrated.
At the final stage, it is assumed that when the transfer is deactivated, there is no writing load on the source. You can ensure this by switching to
read-only
mode. At this stage, the database schema on the target is brought to a state where it will be consistent with the schema on the source.It's recommended to include resource-intensive operations like index migration in the final stage of migration. Migrating indexes at the beginning of the transfer can slow it down.
The transfer of the schema at both the initial and final stages is performed using the pg_dump
utility
Note
When editing the settings of an endpoint of the transfer in the Replicating status, the table schemas on the target are preserved. In this case, only the table schemas that are missing on the target at the transfer restart will be transferred to the target.
Replication cannot guarantee that sequence values are preserved, so we recommend updating the sequences
on the target.
Configuring the data target
Configure one of the supported data targets:
- PostgreSQL
- MySQL
- ClickHouse®
- Greenplum®
- Yandex Managed Service for YDB
- Yandex Object Storage
- Apache Kafka®
- YDS
- Elasticsearch
- OpenSearch
For a complete list of supported sources and targets in Yandex Data Transfer, see Available Transfers.
After configuring the data source and target, create and start the transfer.
Operations with the database during transfer
Tip
The PostgreSQL replication protocol does not support transferring schema changes. Avoid changing the data schema in the source and target databases during the transfer. If this cannot be avoided, perform explicit checks on the target.
For Snapshot and Snapshot and increment transfers:
-
In the Copying status, changes to the data schema on the source and target are not allowed.
-
In the Replicating status, any changes to the data schema on the source should be manually applied to the target, otherwise the transfer will not be able to continue.
For example, suppose we added a new column to the
test_table
table on the source:ALTER TABLE test_table ADD COLUMN val2 TEXT;
If writing to this table continues, the transfer will not be able to insert data on the target. For replication to continue, run a similar query to change the data schema on the target:
ALTER TABLE test_table ADD COLUMN val2 TEXT;
After that, the transfer can continue.
Troubleshooting data transfer issues
Known issues when using a PostgreSQL endpoint:
- Stopping a transfer's master transaction session.
- Exceeding the connection time-to-live quota.
- VIEW transfer error.
- Error when adding a table entry by constraint.
- Error when migrating all schema tables.
- Low transfer speed.
- Unable to transfer child tables.
- Insufficient replication slots in a source database.
- No data transfer after changing a source endpoint.
- Transfer error when changing a master host.
- Error when transferring nested transactions.
- Error transferring tables with deferred constraints.
- Cannot create a replication slot at the activation step.
- Excessive WAL size increase.
- Error when replicating from an external source.
- Error when transfering tables without primary keys.
- Error when dropping a table under the Drop cleanup policy.
See a full list of recommendations in the Troubleshooting section.
Stopping a transfer's master transaction session
Error message:
Cannot set transaction snapshot:
ERROR: invalid snapshot identifier: "<snapshot_ID>" (SQLSTATE 22023).
Possible causes include:
- The source is running a cron job or other application that periodically terminates sessions that are too long.
- Someone manually terminated the master transaction.
- The source's CPU resources are insufficient to execute a query.
Solution: Disable this cron job and add more CPU resources to the source. After you make the changes, re-activate the transfer.
Exceeding the connection time-to-live quota
Yandex Managed Service for PostgreSQL has a connection time-to-live quota of 12 hours.
Solution: If transferring a database requires more time, [change the Session duration timeout setting of the Yandex Managed Service for PostgreSQL cluster](../../../../managed-postgresql/operations/update.md#change-postgresql-config).
VIEW transfer error
Error message:
[ERROR] "... _view": no key columns found
Can't replicate new data from Views. Under PostgreSQL — PostgreSQL transfers, only those views are transferred which are listed in the Table filter → Included tables source endpoint parameter.
Solution: manually exclude all views from the transfer, list them in the Table filter → Included tables source endpoint parameter, and then reactivate the transfer.
Error when adding a table entry by constraint
Solution: prepare the source as described in Preparing for the transfer.
Error when transferring all schema tables
Error message:
Unable to apply DDL of type 'TABLE', name '<schema>'.'<table>', error:
ERROR: schema "<schema name>" does not exist (SQLSTATE 3F000)
A transfer fails if tables of a specific schema are listed as <schema>.*
. This occurs due to the specifics of pg_dump
that is used for a schema transfer. When specifying tables of the entire schema in <schema>.*
format in the source endpoint parameter Table filter → Included tables, the PostgreSQL types from this schema aren't extracted even if it contains tables that depend on these types.
Solution: Create PostgreSQL types in the target database manually.
Low transfer speed
Snapshot or Snapshot and increment transfers from PostgreSQL to PostgreSQL may slow down.
Possible cause:
-
Write protocol.
Under normal operation, a transfer is performed via the fast
copy
protocol. However, if there's a batch write conflict, it switches to slow line-by-line writes. The more write conflicts, the lower the transfer speed.Solution: set the target endpoint's cleanup policy type to
Drop
and exclude other writing processes. -
Parallel table reading.
Parallel reading is only available for tables that contain a primary key in serial mode
.Solution: Set up parallel copy and reactivate the transfer.
Unable to transfer child tables
Child tables are either not transferred or transferred without data if a table is partitioned.
Solution: set the following source endpoint parameters:
- Select Merge inherited tables in advanced settings.
- In the Included tables field, specify all child tables to transfer data from.
- Make sure the user has access to the child tables.
To speed up the transfer of child tables, set up parallel copy.
Insufficient replication slots in a source database
Error message:
Warn(Activate): failed to create a replication slot "<transfer_ID>" at source:
failed to create a replication slot:
failed to create a replication slot:
ERROR: all replication slots are in use
(SQLSTATE 53400)
Solution: Increase the number of replication slots10
).
No data transfer after changing a source endpoint
After adding tables to the List of included tables in the source endpoint parameters, a data transfer restarted and stopped.
Solution:
-
Create tables in the target database manually.
1. Create new tables with a
Primary key
and without aForeign key
in the target database.
2. Add the new tables to the List of included tables in the source endpoint parameters.
3. Transfer a dump with historical data to the target database.
4. If there are errors in the logs, fix them depending on a specific error.
5. If there are no errors but the logs are empty, contact technical support or your account manager to dump goroutines. This may help resume replication without restarting the transfer. -
Deactivate and reactivate the transfer.
-
Create a separate transfer of the Snapshot type for the new tables. Deactivating the original transfer is not required.
Transfer error when changing a master host
The error occurs in Replication or Snapshot and increment transfers due to required Write-Ahead Log (WAL) parts missing. This occurs when the Write Ahead Log (WAL) logical replication lag between the current master and a replica is greater than the allowable WAL size on other hosts, so, when switching from the master to the replica, the replication slot can't sync to the WAL on the new master.
Solution: set a limit in the additional target endpoint parameter Maximum WAL size for the replication slot and reactivate the transfer.
Error when transferring nested transactions
Transfers of PostgreSQL below version 14 don't support a transfer of tables with committed transactions that are nested more than 1024 times with replication changes at each nesting level. The degree of nesting depends on the number of nested begin; .. commit;
statements.
Solution:
- Use PostgreSQL 14 or higher.
- Exclude transactions with a nesting level like this from a transfer.
Error transferring tables with deferred constraints
The error arises in the transfers of type Replication or Snapshot and increment because updating tables and transactions with DEFERRABLE
constraints isn't supported by Data Transfer. To learn more about deferred constraints, see the PostgreSQL documentation
Solution: Change the constraint type in such tables to IMMEDIATE
and activate the transfer again.
Could not create a replication slot at the activation step
In the beginning of the transfer, one or more replication slots
Solution:
-
Get the
PID
of the process that competes for locks with the transfer:/* Get PID of the transfer */ SELECT active_pid FROM pg_replication_slots WHERE slot_name = '<transfer_ID>'; /* search the PID of the locking process */ SELECT pid, pg_blocking_pids(pid) as blocked_by FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;
pid | blocked_by -----------------+------------------- <transfer_PID> | {<locking_transaction_PID>} (1 row)
-
Look up the locking query:
SELECT query, username FROM pg_stat_activity WHERE pid = <locking_transaction_PID>;
-
(Optional) Stop the transaction by the command:
SELECT pg_terminate_backend(<locking_transaction_PID>);
Excessive WAL size increase
In the PostgreSQL source database, the size of the Write-Ahead Log (WAL) may reach up to dozens of GB due to the queries running for over five minutes. These queries block the WAL and do not allow it to move forward and reset.
You can see the WAL size has gone up when:
- The space used by the source database has increased.
- The Read buffer size chart in Data Transfer monitoring is rising.
Solution:
-
Find the query sessions running for more than five minutes:
SELECT now()-xact_start, query, pid FROM pg_stat_activity WHERE (now()-xact_start)>'5minute'::interval AND STATE != 'idle' ORDER BY 1 DESC;
-
Terminate the sessions found. Make sure to avoid such queries moving forward.
Error when replicating data from an external source
Error message:
[XX000] ERROR: could not connect to the publisher:
SSL error: certificate verify failed FATAL:
no pg_hba.conf entry for replication connection
from host "<PostgreSQL_host_IP_address>", user "postgres", SSL off
Solution: Prepare the source as described in Preparing for the transfer.
Error when transfering tables without primary keys
Error message:
Primary key check failed: 14 Tables errors: Table no key columns found
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred.
Solution: Prepare the source as described in Preparing for the transfer.
Error when dropping a table under the Drop cleanup policy
Error message:
ERROR: cannot drop table <table_name> because other objects depend on it (SQLSTATE 2BP01)
With the Drop
cleanup policy, a transfer drops tables in multiple iterations:
-
The transfer tries to drop all tables one by one. Cascade delete is not used, as it may delete tables that are not included in the transfer. If a table cannot be dropped, for example, due to its links to external keys, an error occurs, but the transfer will continue dropping tables.
-
During the next iteration, the transfer will try to drop the remaining tables. If the blocking child tables were dropped in the previous iteration, the table linked to external keys will also be dropped. In this case, the error is fixed while Data Transfer is running and no other actions are required.
-
If the transfer does not drop any table during another iteration, table dropping stops. In which case:
- The transfer will continue running if all tables were dropped.
- The transfer will be aborted with an error if there are any tables left.
Solution:
-
If child tables are not involved in other transfers and their transfer is not in conflict with the transfer goals, add these tables:
- To the list of included tables in the source endpoint parameters.
- To the list of objects to be transferred in the transfer parameters.
-
Delete the blocking child tables in the target database manually.
-
Use the
Truncate
cleanup policy. -
Recreate the target database.
Warning
This will lead to losing all data in the database.