Migrating a database from a third-party PostgreSQL cluster to Managed Service for PostgreSQL
- Transferring data using Yandex Data Transfer
- Migrating data using logical replication
- Getting started
- Set up the source cluster
- Export the database schema in the source cluster
- Restore the database schema in the target cluster
- Create a publication and subscription
- Transfer PostgreSQL sequences after replication
- Delete the subscription and switch over the load
- Delete the resources you created
- Transferring data by creating and restoring a logical dump
There are three ways to migrate data from a third-party source cluster to a Managed Service for PostgreSQL target cluster:
- Transferring data using Yandex Data Transfer.
This method allows you to:
- Do without an intermediate VM and without granting internet access to your Managed Service for PostgreSQL target cluster.
- Migrate the database in full without interrupting user service.
- Migrate from older PostgreSQL versions to newer ones, including upgrading your cluster from PostgreSQL version 15 to 16.
To use this method, allow connections to the source cluster from the internet.
For more information, see What tasks Yandex Data Transfer is used for.
- Migrating data using logical replication.
Logical replication uses the subscription mechanism, which allows you to migrate data to the target cluster with minimal downtime. Use this method only if, for some reason, you cannot migrate the data using Yandex Data Transfer.
- Transferring data by creating and restoring a logical dump.
A logical dump is a file with a set of commands that, run one by one, restore the state of a database. It is created with the pg_dump utility. To get a complete logical dump, switch the source cluster to read-only mode before creating it. Use this method only if, for some reason, you cannot transfer the data using either of the methods above.
Transferring data using Yandex Data Transfer
-
Prepare the infrastructure:
Manually
-
Create a Managed Service for PostgreSQL target cluster in any suitable configuration. When doing so:
- The PostgreSQL version must be the same as or higher than the version in the source cluster. Migration with a PostgreSQL version downgrade is not possible.
- When creating a cluster, specify the same database name as in the source cluster.
- Enable the same PostgreSQL extensions as in the source cluster.
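To check which extensions are enabled in the source database, you can run a query like this on the source cluster (a sketch with placeholder connection parameters, not part of the official steps):
# List installed extensions and their versions.
psql -h <IP_address_or_FQDN_for_master_host_of_source_cluster> \
     -U <username> \
     -d <DB_name> \
     -c 'SELECT extname, extversion FROM pg_extension ORDER BY extname;'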
-
Create a source endpoint with the following parameters:
- Database type: PostgreSQL.
- Endpoint parameters → Connection settings: Custom installation.
Specify the parameters for connecting to the source cluster.
-
Create a target endpoint with the following parameters:
- Database type: PostgreSQL.
- Endpoint parameters → Connection settings: Managed Service for PostgreSQL cluster.
Specify the ID of the target cluster.
-
Create a transfer of the Snapshot and increment type that will use the created endpoints.
-
Warning
Do not make any changes to the data schema in the source and target clusters while the data transfer is running. For more information, see Working with databases during transfer.
Terraform
-
If you do not have Terraform yet, install it.
-
Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
-
Configure and initialize a provider. There is no need to create the provider configuration file manually: you can download it.
-
Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
-
Download the data-transfer-pgsql-mpg.tf configuration file to the same working directory.
This file describes:
- Network.
- Subnet.
- Security group and the rule required to connect to a cluster.
- Managed Service for PostgreSQL target cluster.
- Source endpoint.
- Target endpoint.
- Transfer.
-
Specify the following in the data-transfer-pgsql-mpg.tf file:
- pg-extensions: List of PostgreSQL extensions in the source cluster.
- Target cluster parameters, also used as target endpoint parameters:
  - target_pgsql_version: PostgreSQL version. It must be the same as or higher than the version in the source cluster.
  - target_user and target_password: Name and password of the database owner.
-
Check that the Terraform configuration files are correct using this command:
terraform validate
If there are any errors in the configuration files, Terraform will point them out.
-
Create the required infrastructure:
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to create and their parameters. This is a test step: no resources are created.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the creation of the resources.
-
Wait for the operation to complete.
-
All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
Once created, the transfer will be activated automatically.
-
Wait for the transfer status to change to Replicating.
-
Switch the source cluster to read-only.
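How you switch the cluster to read-only depends on your installation. One possible approach, shown here as an assumption rather than the official procedure, is to make new transactions read-only by default:
# Run on the source master. Superusers and sessions that explicitly
# set default_transaction_read_only can still write.
psql -h <IP_address_or_FQDN_for_master_host_of_source_cluster> -U <username> -d <DB_name> <<'SQL'
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();
SQL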
-
On the transfer monitoring page, wait for the Maximum data transfer delay metric to decrease to zero. This means that all changes that occurred in the source cluster after data copying was completed are transferred to the target cluster.
-
Deactivate the transfer and wait for its status to change to Stopped.
For more information about transfer statuses, see Transfer lifecycle.
-
Transfer the load to the target cluster.
-
Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:
Manually created resources
- Delete the Managed Service for PostgreSQL cluster.
- Delete the stopped transfer.
- Delete endpoints for both the source and target.
Resources created with Terraform
-
In the terminal window, go to the directory containing the infrastructure plan.
Warning
Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
-
Delete resources:
-
Run this command:
terraform destroy
-
Confirm deleting the resources and wait for the operation to complete.
All the resources described in the Terraform manifests will be deleted.
Migrating data using logical replication
Logical replication is supported as of PostgreSQL version 10. Besides migrating data between the same PostgreSQL versions, logical replication allows you to migrate to newer PostgreSQL versions.
In Managed Service for PostgreSQL clusters, subscriptions can be used by the database owner (a user created together with the cluster) and users with the mdb_admin
role for the cluster.
Migration stages:
- Set up the source cluster.
- Export the database schema in the source cluster.
- Restore the database schema in the target cluster.
- Create a PostgreSQL publication and subscription.
- Transfer the PostgreSQL sequences after replication.
- Disable replication and transfer the load.
If you no longer need the resources you created, delete them.
Getting started
Create the required resources:
Manually
Create a Managed Service for PostgreSQL target cluster in any suitable configuration. When doing so:
- The PostgreSQL version must be the same as or higher than the version in the source cluster. Migration with a PostgreSQL version downgrade is not possible.
- When creating a cluster, specify the same database name as in the source cluster.
- Enable the same PostgreSQL extensions as in the source cluster.
Terraform
-
If you do not have Terraform yet, install it.
-
Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
-
Configure and initialize a provider. There is no need to create the provider configuration file manually: you can download it.
-
Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
-
Download the data-migration-pgsql-mpg.tf configuration file to the same working directory.
This file describes:
- Network.
- Subnet.
- Security group and the rule required to connect to a cluster.
- Managed Service for PostgreSQL cluster with public internet access.
-
Specify the following in the data-migration-pgsql-mpg.tf file:
- source_db_name: Database name.
- pg-extensions: List of PostgreSQL extensions in the source cluster.
- Target cluster parameters:
  - target_pgsql_version: PostgreSQL version. It must be the same as or higher than the version in the source cluster.
  - target_user and target_password: Name and password of the database owner.
-
Check that the Terraform configuration files are correct using this command:
terraform validate
If there are any errors in the configuration files, Terraform will point them out.
-
Create the required infrastructure:
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to create and their parameters. This is a test step: no resources are created.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the creation of the resources.
-
Wait for the operation to complete.
-
All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
Set up the source cluster
-
Specify the required SSL and WAL settings in the postgresql.conf file. In Debian and Ubuntu, the default path to this file is /etc/postgresql/<PostgreSQL_version>/main/postgresql.conf.
-
We recommend using SSL for migrating data: it helps not only encrypt the data but also compress it. For more information, see SSL Support and Database Connection Control Functions in the PostgreSQL documentation.
To enable SSL, set the appropriate value in the configuration:
ssl = on # on, off
-
Change the logging level for the write-ahead log (WAL) to add the information needed for logical replication. To do this, set wal_level to logical.
You can change this setting in postgresql.conf: find the line with the wal_level setting, uncomment it if needed, and set the value to logical:
wal_level = logical # minimal, replica, or logical
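The new wal_level value takes effect only after PostgreSQL is restarted (a later step). After the restart, you can verify it with a quick check (our addition, assuming psql access to the source host):
psql -h <IP_address_or_FQDN_for_master_host_of_source_cluster> \
     -U <username> -d <DB_name> -c 'SHOW wal_level;'
# Expected output: logical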
-
Configure authentication of hosts in the source cluster. To do this, add the Yandex Cloud cluster hosts to the pg_hba.conf file (in Debian and Ubuntu distributions, its default path is /etc/postgresql/<PostgreSQL_version>/main/pg_hba.conf).
Add lines to allow connections to the database from the specified hosts:
-
If you use SSL:
hostssl all all <host_address> md5
hostssl replication all <host_address> md5
-
If you do not use SSL:
host all all <host_address> md5
host replication all <host_address> md5
-
If a firewall is enabled in the source cluster, allow incoming connections from the Managed Service for PostgreSQL cluster hosts. For example, for Ubuntu 18:
sudo ufw allow from <target_cluster_host_address> to any port <port>
-
Restart the PostgreSQL service to apply all your settings:
sudo systemctl restart postgresql
-
Check the PostgreSQL status after restarting:
sudo systemctl status postgresql
Export the database schema in the source cluster
Use the pg_dump
utility to create a file with the database schema to apply in the target cluster.
pg_dump -h <IP_address_or_FQDN_for_master_host_of_source_cluster> \
-U <username> \
-p <port> \
--schema-only \
--no-privileges \
--no-subscriptions \
-d <DB_name> \
-Fd -f /tmp/db_dump
This export command skips all data associated with privileges and roles to avoid conflicts with the database settings in Yandex Cloud. If your database requires additional users, create them.
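For example, you can create an additional user in the target cluster with the yc CLI. This is a sketch based on assumed flag names; check yc managed-postgresql user create --help for the exact syntax:
# Create an additional database user in the target cluster (hypothetical values).
yc managed-postgresql user create <username> \
   --cluster-id <target_cluster_ID> \
   --password <password>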
Restore the database schema in the target cluster
Use the pg_restore
utility to restore the database schema in the target cluster:
pg_restore -h <IP_address_or_FQDN_for_master_host_of_target_cluster> \
-U <username> \
-p 6432 \
-Fd -v \
--single-transaction \
-s --no-privileges \
-d <DB_name> /tmp/db_dump
Create a publication and subscription
For logical replication to work, create a publication (a group of logically replicated tables) in the source cluster and a subscription (a description of the connection to another database) in the target cluster.
-
On the source cluster, create a publication for all the database tables. When migrating multiple databases, you need to create a separate publication for each of them.
Note
You need superuser rights to create a publication for all tables, but not for a publication covering only selected tables. For more information about creating publications, see the PostgreSQL documentation.
Query:
CREATE PUBLICATION p_data_migration FOR ALL TABLES;
-
On the Managed Service for PostgreSQL cluster host, create a subscription with the publication connection string. For more information about creating subscriptions, see the PostgreSQL documentation.
Query with SSL enabled:
CREATE SUBSCRIPTION s_data_migration CONNECTION 'host=<source_cluster_address> port=<port> user=<username> sslmode=verify-full dbname=<DB_name>' PUBLICATION p_data_migration;
Without SSL:
CREATE SUBSCRIPTION s_data_migration CONNECTION 'host=<source_cluster_address> port=<port> user=<username> sslmode=disable dbname=<DB_name>' PUBLICATION p_data_migration;
-
To get the replication status, check the pg_subscription_rel catalog. You can get the overall replication status via pg_stat_subscription on the target cluster and via pg_stat_replication on the source cluster.
SELECT * FROM pg_subscription_rel;
First of all, check the srsubstate field: r means the synchronization is complete and the databases are ready for replication.
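For the overall status views mentioned above, queries along these lines can be used (a sketch; pick whichever columns you need):
-- On the target cluster: state of the logical replication worker.
SELECT subname, received_lsn, latest_end_lsn, latest_end_time FROM pg_stat_subscription;
-- On the source cluster: progress of each connected subscriber.
SELECT application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;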
Transfer PostgreSQL sequences after replication
To complete synchronization of the source cluster and the target cluster:
-
Switch the source cluster to read-only.
-
Create a dump with the PostgreSQL sequences in the source cluster:
pg_dump -h <IP_address_or_FQDN_for_master_host_of_source_cluster> \
        -U <username> \
        -p <port> \
        -d <DB_name> \
        --data-only -t '*.*_seq' > /tmp/seq-data.sql
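To see which sequences the database actually contains, and thus whether the pattern below matches them all, you can query the pg_sequences view, available as of PostgreSQL 10 (this check is our addition):
-- Run on the source cluster before creating the sequence dump.
SELECT schemaname, sequencename FROM pg_sequences ORDER BY 1, 2;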
Pay attention to the *.*_seq pattern used here. If the database you are migrating has sequences that do not match this pattern, specify a different pattern to export them. For more information about patterns, see the PostgreSQL documentation.
-
Restore the dump with sequences in the target cluster:
psql -h <IP_address_or_FQDN_for_master_host_of_target_cluster> \
     -U <username> \
     -p 6432 \
     -d <DB_name> \
     < /tmp/seq-data.sql
Delete the subscription and switch over the load
-
Delete the subscription in the target cluster:
DROP SUBSCRIPTION s_data_migration;
-
Transfer the load to the target cluster.
Delete the resources you created
Delete the resources you no longer need to avoid paying for them:
-
In the terminal window, go to the directory containing the infrastructure plan.
Warning
Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
-
Delete resources:
-
Run this command:
terraform destroy
-
Confirm deleting the resources and wait for the operation to complete.
All the resources described in the Terraform manifests will be deleted.
Transferring data by creating and restoring a logical dump
Use pg_dump to create a dump of the database in the source cluster. To restore the dump in the target cluster, use pg_restore.
Note
This may require the pg_repack
database extension.
Migration stages:
- Create a dump of the database you want to migrate.
- (Optional) Create a virtual machine in Yandex Cloud and upload the DB dump to it.
- Restore data from the dump to the target cluster.
If you no longer need the resources you created, delete them.
Getting started
Create the required resources:
Manually
-
Create a Managed Service for PostgreSQL target cluster in any suitable configuration. The following parameters must be the same as in the source cluster:
-
PostgreSQL version.
-
Username.
Note
You may use different usernames for the source and the target. This, however, may result in an error when restoring the dump. For more information, see Moving and restoring a PostgreSQL cluster.
-
(Optional step) Create a VM based on Ubuntu 20.04 LTS with the following parameters:
-
Disks and file storages → Size: Sufficient to store both the archived and unarchived dumps.
The recommended size is two or more times the total size of the dump and dump archive. You can estimate the source database size in advance; see the sketch after this list.
-
Network settings:
- Subnet: Select a subnet on the cloud network hosting the target cluster.
- Public IP: Select Auto or one address from the list of reserved IPs.
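To size the disk, you can estimate the source database size in advance. A minimal sketch, assuming psql access to the source master:
# Reports the on-disk size of the database to be migrated.
psql -h <IP_address_or_FQDN_for_master_host_of_source_cluster> \
     -U <username> -d <DB_name> \
     -c "SELECT pg_size_pretty(pg_database_size('<DB_name>'));"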
-
If you use security groups for the intermediate VM and the Managed Service for PostgreSQL cluster, configure them.
Terraform
-
If you do not have Terraform yet, install it.
-
Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
-
Configure and initialize a provider. There is no need to create the provider configuration file manually: you can download it.
-
Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
-
Download the data-restore-pgsql-mpg.tf configuration file to the same working directory.
This file describes:
- Network.
- Subnet.
- Security group and the rule required to connect to a cluster.
- Managed Service for PostgreSQL cluster with public internet access.
- (Optional) Virtual machine with public internet access.
-
Specify the following in the data-restore-pgsql-mpg.tf file:
- pg-extensions: List of PostgreSQL extensions in the source cluster.
- Target cluster parameters:
  - target_pgsql_version: PostgreSQL version. It must be the same as or higher than the version in the source cluster.
  - target_db_name: Database name.
  - target_user: Username of the database owner. It must be the same as the username in the source cluster.
    Note
    You may use different usernames for the source and the target. This, however, may result in an error when restoring the dump. For more information, see Moving and restoring a PostgreSQL cluster.
  - target_password: Password of the database owner.
- (Optional) Virtual machine parameters:
  - vm_image_id: ID of the public image with Ubuntu and no GPU, e.g., for Ubuntu 20.04 LTS.
  - vm_username and vm_public_key: Username and absolute path to the public key for access to the VM. By default, the specified username is ignored in the Ubuntu 20.04 LTS image; a user named ubuntu is created instead. Use it to connect to the VM.
-
Check that the Terraform configuration files are correct using this command:
terraform validate
If there are any errors in the configuration files, Terraform will point them out.
-
Create the required infrastructure:
-
Run the command to view planned changes:
terraform plan
If the resource configuration descriptions are correct, the terminal will display a list of the resources to create and their parameters. This is a test step: no resources are created.
-
If you are happy with the planned changes, apply them:
-
Run the command:
terraform apply
-
Confirm the creation of the resources.
-
Wait for the operation to complete.
-
All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
Create a database dump
-
Switch the database to read-only.
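One possible way to do this, given as an assumption (it affects new sessions only, and superusers can still override it):
# Make all new transactions in the database read-only by default.
psql -h <IP_address_or_FQDN_for_master_host_of_source_cluster> \
     -U <username> -d <DB_name> \
     -c 'ALTER DATABASE <DB_name> SET default_transaction_read_only = on;'
-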
Create a dump using the pg_dump utility. To speed it up, run the utility in multithreaded mode by passing the number of available CPU cores in the --jobs argument. Multithreaded dumps require the directory output format (--format=d), which is used here:
pg_dump --host=<IP_address_or_FQDN_for_master_host_of_source_cluster> \
        --port=<port> \
        --username=<username> \
        --jobs=<number_of_CPU_cores> \
        --format=d \
        --dbname=<DB_name> \
        --file=db_dump
(Optional) Upload the dump to a virtual machine in Yandex Cloud
Transfer your data to an intermediate VM in Compute Cloud if:
- Your Managed Service for PostgreSQL cluster is not accessible from the internet.
- Your hardware or connection to the cluster in Yandex Cloud is not very reliable.
The required amount of RAM and processor cores depends on the amount of data to migrate and the required migration speed.
To prepare the virtual machine to restore the dump:
-
In the management console, create a new VM from an Ubuntu 20.04 image on Marketplace. The VM parameters depend on the size of the database you want to migrate. The minimum configuration (1 core, 2 GB RAM, 10 GB disk space) should be sufficient to migrate a database up to 1 GB in size. The larger the database, the more RAM and storage space you need for migration (at least twice the size of the database).
The VM must be in the same network and availability zone as the PostgreSQL cluster. Additionally, the VM must be assigned a public IP address so that you can load the dump from outside Yandex Cloud.
-
Set up the PostgreSQL apt repository.
-
Install the PostgreSQL client and additional utilities for working with the DBMS:
sudo apt install postgresql-client-common && \
sudo apt install postgresql-client-<PostgreSQL_version>
-
Archive the dump:
tar -cvzf db_dump.tar.gz db_dump
-
Move the archive containing the dump to the VM, e.g., using the
scp
utility:scp db_dump.tar.gz <VM_user_name>@<VM_public_address>:/db_dump.tar.gz
-
Unpack the archive with the dump:
tar -xzf db_dump.tar.gz
Restore data from the dump to the target cluster
Restore the database dump using the pg_restore utility.
The pg_restore version must match the pg_dump version, and its major version must be at least as high as that of the database the dump will be restored on.
That is, to restore a dump of PostgreSQL 10, 11, 12, 13, or 14, use pg_restore 10, 11, 12, 13, or 14, respectively.
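To check the versions of the client utilities installed on the host you restore from (a quick sanity check, our addition):
pg_dump --version
pg_restore --version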
pg_restore --host=<IP_address_or_FQDN_for_master_host_of_target_cluster> \
--username=<username> \
--dbname=<DB_name> \
--port=6432 \
--format=d \
--verbose \
db_dump \
--single-transaction \
--no-privileges
If you only need to restore a single schema, add the --schema=<schema_name>
parameter. Without this parameter, the command will only run on behalf of the database owner.
If the restoration fails due to errors related to lack of required permissions for creating and updating extensions, remove the --single-transaction
parameter from the command. The errors will be ignored in this case:
pg_restore: warning: errors ignored on restore: 3
Make sure the errors only apply to the extensions and check the integrity of your restored data.
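One rough way to check the restored data, suggested here as an addition: compare approximate row counts between the source and target. n_live_tup is an estimate maintained by the statistics collector, so run ANALYZE first for fresher numbers:
-- Run on both clusters and compare the output.
SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname;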
Delete the resources you created
Delete the resources you no longer need to avoid paying for them:
Manually created resources
- Delete the Yandex Managed Service for PostgreSQL cluster.
- If you created an intermediate virtual machine, delete it.
- If you reserved public static IP addresses, release and delete them.
Resources created with Terraform
-
In the terminal window, go to the directory containing the infrastructure plan.
Warning
Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
-
Delete resources:
-
Run this command:
terraform destroy
-
Confirm deleting the resources and wait for the operation to complete.
All the resources described in the Terraform manifests will be deleted.