Using pg_repack in Managed Service for PostgreSQL
PostgreSQL tables and indexes may get bloated. When running transactions that update data in tables and indexes, the older data is retained in case you need to roll back the transactions. This causes tables and indexes to bloat during bulk data updates. You can use the pgstattuple extension or the pgsql-bloat-estimation
Older data versions are not deleted automatically. To free up storage space and remove bloat, you can delete the data you no longer need using the VACUUM FULL
The pg_repackpg_repack works without holding an exclusive lock on the tables. For more information, see the extension page
Installing pg_repack in a PostgreSQL cluster
-
Add the
pg_repackextension to your database. -
Assign the
mdb_adminrole to the owner of this database, if not assigned yet.You can get the owner's name with the list of databases in the cluster.
-
Set the following PostgreSQL cluster parameters to zero:
Warning
Other values for these settings may cause long-running commands to interrupt or idle transactions to terminate. If this happens,
pg_repackmay fail.
Installing the pg_repack client
To use the extension, you need a client with the same name. The client must be installed on a host that can connect to the PostgreSQL cluster.
To install the client:
-
Check the
pg_repackversion installed in the PostgreSQL cluster. -
Install
thepg_repackclient.Warning
The client and extension versions must match; otherwise, the connection will return this error:
ERROR: pg_repack failed with error: program 'pg_repack ...' does not match database library 'pg_repack ...'Depending on your operating system and connected repositories, you can install the client using a package manager, e.g.,
aptoryum. If the required client version is not available in the repositories, build it from the source files yourself.How to build the pg_repack client on Ubuntu 22.04 LTS
-
Connect the PostgreSQL Apt repository
, which contains some of the required dependencies:sudo apt install -y postgresql-common && \ sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y $(lsb_release -cs) -
Install the dependencies.
Select the
postgresql-server-dev-*package version matching the version of the PostgreSQL cluster you plan to connect to using the client. This will make your client more stable.This example command installs such a package for PostgreSQL 16 together with other packages:
sudo apt install git build-essential \ zlib1g-dev libzstd-dev liblz4-dev \ libreadline-dev \ postgresql-server-dev-16 -
Clone the
pg_repackGit repository by selecting the relevant version tag and open the local directory containing the repository.Example command for the
ver_1.4.8tag:git clone https://github.com/reorg/pg_repack.git \ --branch ver_1.4.8 --depth 1 && \ cd pg_repack -
Run the build:
makeWait for the build process to complete.
-
Place the
pg_repackexecutable file in any directory listed in thePATHenvironment variable for the current user.Example command to move the file to
/usr/local/bin:sudo install bin/pg_repack /usr/local/bin -
Check the client version:
pg_repack --versionThis should output the version matching the tag you selected earlier, e.g.:
pg_repack 1.4.8
-
Running pg_repack
Tip
Run pg_repack when the PostgreSQL cluster load is minimal: repacking database objects will put extra load on your cluster.
Cluster and host state data is available in the management console.
To run pg_repack and repack database objects:
-
Make sure the cluster has enough free storage space: at least twice the total size of the tables and indexes to repack.
During a repack,
pg_repackworks with copies of tables and indexes, which requires additional storage space. -
Run the
pg_repackclient with the parameters you need.Below are example commands to run the client with common parameter combinations. For a description of all supported parameters, see the extension page
.Tip
Add the
--dry-runparameter to dry runpg_repackand review the planned updates.This will output the list of objects to repack when running
pg_repackin regular mode.Run the command you need:
-
Command to repack the specified database tables:
Repacking with connection without SSLRepacking with connection via SSLpg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -t <table_name>PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -t <table_name>If you need to repack multiple tables, provide the required number of
-tparameters, one for each table. -
Command to repack the specified database indexes:
Repacking with connection without SSLRepacking with connection via SSLpg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -i <index_name>PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -i <index_name>If you need to repack multiple indexes, provide the required number of
-iparameters, one for each index. -
Command to repack all tables and indexes in the database:
Repacking with connection without SSLRepacking with connection via SSLpg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name>PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name>
-
Usage example
Note
The example was tested in the following environment:
-
PostgreSQL
16cluster where: -
Yandex Cloud VM running Ubuntu 22.04 LTS where:
pg_repackclient version1.4.8built from source files is installed.- SSL connections to the cluster are enabled.
This example uses pgbench
- Creating test tables and indexes and populating the tables with test data.
- Running multiple queries to test tables, including
INSERTandDELETEqueries that update the table contents.
When a pgbench run is over, test tables and their relevant indexes become bloated as a result of a large number of queries. Also, pgbench runs the VACUUM command, which removes table and index bloat, at the beginning rather than the end of its operation. So we will use pgbench to create bloated tables and indexes to demonstrate how pg_repack works.
To test pg_repack on the pgbench test tables and indexes:
-
Add the
pgstattupleextension to thedb1database.This extension allows you to estimate table and index bloat to see how
pg_repackworks. -
Install
pgbenchon a VM:sudo apt install postgresql-contribNote
The PostgreSQL Apt repository contains the
postgresql-contribpackage.If you deleted this repository from the virtual machine after building the
pg_repackclient on Ubuntu 22.04 LTS, reconnect the repository. -
Create test tables and indexes in the
db1database by connecting to it as its owner (user1):PGSSLMODE='verify-full' \ pgbench -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 -i -s 1 db1This will create the following tables and indexes:
Table Index pgbench_accountspgbench_accounts_pkeypgbench_branchespgbench_branches_pkeypgbench_tellerspgbench_tellers_pkeypgbench_history— -
View the statistics for the
pgbench_*tables and indexes:-
Connect to the
db1database as the owner (user1). Usepsqlto connect. -
Request table statistics:
SELECT * FROM pgstattuple('pgbench_accounts'); SELECT * FROM pgstattuple('pgbench_branches'); SELECT * FROM pgstattuple('pgbench_tellers'); SELECT * FROM pgstattuple('pgbench_history'); -
Request index statistics:
SELECT * FROM pgstattuple('pgbench_accounts_pkey'); SELECT * FROM pgstattuple('pgbench_branches_pkey'); SELECT * FROM pgstattuple('pgbench_tellers_pkey');
The
dead_tuple_countcolumns must show zero for all query results. This means there are no bloated tables and indexes.Here is an example of partial output for the
pgbench_accountstable:table_len | tuple_count | ... | dead_tuple_count | ... | free_space | free_percent -----------+-------------+-...-+------------------+-...-+------------+-------------- 13434880 | 100000 | ... | 0 | ... | 188960 | 1.41 -
-
Run
pgbenchonce:PGSSLMODE='verify-full' \ pgbench -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 -c 5 -j 2 -t 1000 db1Wait for the
pgbenchrun to complete. This may take several minutes. -
Once again, view the statistics for the
pgbench_*tables and indexes:Non-zero values in the
dead_tuple_countcolumns indicate table and index bloat. For test tables and indexes created bypgbench, thepgbench_tellerstable will show the greatest bloating. -
Dry run
pg_repackto review the planned updates:PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 \ -d db1 \ --dry-runSince some indexes and tables are not bloated, the output will not include all database objects.
Output example:
INFO: repacking table "public.pgbench_accounts" INFO: repacking table "public.pgbench_branches" INFO: repacking table "public.pgbench_tellers" -
Run
pg_repackin regular mode to repack tables and indexes:PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 \ -d db1Wait for the
pg_repackrun to complete. This may take several minutes. -
Once again, view the statistics for the
pgbench_*tables and indexes:The
dead_tuple_countcolumns must show zero for all query results. This means there are no bloated tables or indexes, andpg_repackran correctly.