Using pg_repack in Managed Service for PostgreSQL
PostgreSQL tables and indexes may be given to bloating. During transactions that update data in tables and indexes, the old data is kept to allow for rollback if needed. This causes tables and indexes to bloat during bulk data updates. You can use the pgstattuple extension or the pgsql-bloat-estimation
The system does not automatically purge old data versions. To free up storage space and eliminate the bloat, you can delete the data you no longer need using the VACUUM FULL
The pg_repackpg_repack does not require an exclusive table lock, unlike other methods. For more information, see the extension page
Installing pg_repack in a PostgreSQL cluster
-
Enable the
pg_repackextension in your database. -
Assign the
mdb_adminrole to the owner of this database, if not assigned yet.You can get the owner's name from the cluster’s database list.
-
Set the following PostgreSQL cluster settings to zero:
Warning
Assigning other values to these settings may cause termination of long-running commands and idle transactions. This may result in
pg_repackfailure.
Installing the pg_repack client
To manage the extension you need the pg_repack client. You need to install the client on a host with connectivity to the PostgreSQL cluster.
To install the client:
-
Check the
pg_repackversion installed in the PostgreSQL cluster. -
Install
thepg_repackclient.Warning
The client and extension versions must match; otherwise, the connection will terminate with this error:
ERROR: pg_repack failed with error: program 'pg_repack ...' does not match database library 'pg_repack ...'Depending on your operating system and enabled repositories, you can install the client via a package manager, e.g.,
aptoryum. If the required client version is missing from your repositories, you can build it from source.How to build the pg_repack client on Ubuntu 22.04 LTS
-
Add the PostgreSQL Apt repository
, which contains some of the required dependencies:sudo apt install -y postgresql-common && \ sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y $(lsb_release -cs) -
Install the dependencies.
Select the
postgresql-server-dev-*package version matching the version of your target PostgreSQL cluster for the client connection. This will improve the stability of the compiled client.Example command installing this package for PostgreSQL 16 alongside other packages:
sudo apt install git build-essential \ zlib1g-dev libzstd-dev liblz4-dev \ libreadline-dev \ postgresql-server-dev-16 -
Clone the
pg_repackGit repository using the relevant version tag , then navigate to your local repository directory.Example command using the
ver_1.4.8tag:git clone https://github.com/reorg/pg_repack.git \ --branch ver_1.4.8 --depth 1 && \ cd pg_repack -
Build the project:
makeWait for the build to complete.
-
Place the
pg_repackbinary in any directory listed in the current user’sPATHenvironment variable.Example command for moving the file to
/usr/local/bin:sudo install bin/pg_repack /usr/local/bin -
Check the client version:
pg_repack --versionYou should see the client version matching your previously selected tag, e.g.:
pg_repack 1.4.8
-
Running pg_repack
Tip
Run pg_repack when the PostgreSQL cluster load is minimal: repacking database objects will put extra load on your cluster.
Cluster and host status data is available in the management console.
To run pg_repack and repack database objects:
-
Make sure the cluster has enough free storage space: at least twice the total size of the tables and indexes to repack.
During a repack,
pg_repackworks with copies of tables and indexes, which requires additional storage space. -
Run the
pg_repackclient with your preferred options.Here are example commands with frequently used option combinations for the client. For all supported options, see the extension page
.Tip
Add the
--dry-runflag to dry-runpg_repackin order to review the planned updates.You will see the list of objects that will be repacked when
pg_repackis run in regular mode.Run one of the following commands:
-
Repacking the specified database tables:
Repacking via a non-SSL connectionRepacking via an SSL connectionpg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -t <table_name>PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -t <table_name>To repack multiple tables, provide one
-targument per table. -
Repacking the specified database indexes:
Repacking via a non-SSL connectionRepacking via an SSL connectionpg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -i <index_name>PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -i <index_name>To repack multiple indexes, provide one
-iargument per index. -
Repacking all database tables and indexes:
Repacking via a non-SSL connectionRepacking via an SSL connectionpg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name>PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name>
-
Usage example
Note
This example was tested in the following environment:
-
PostgreSQL
16cluster containing the following resources: -
Yandex Cloud VM running Ubuntu 22.04 LTS, with the following configuration:
- Installed
pg_repackclient1.4.8built from source. - Enabled SSL connections to the cluster.
- Installed
This example uses pgbench
- Creating test tables and indexes, and populating tables with test data.
- Running multiple queries against test tables, including
INSERTandDELETEqueries.
After pgbench finishes, the test tables and indexes remain bloated from the high query load during its run, and because pgbench executes the bloat-removing VACUUM command at the beginning rather than the end of its operation. In our example, to demonstrate the operation of pg_repack, we will first create bloated tables and indexes by running pgbench.
To test pg_repack on the test tables and indexes used by pgbench, do the following:
-
Enable the
pgstattupleextension in thedb1database.This extension will let us assess the table and index bloat, allowing us to see the results of
pg_repackoperation. -
Install
pgbenchon the VM:sudo apt install postgresql-contribNote
postgresql-contribpackage is available in the PostgreSQL Apt repository.If you deleted this repository from the virtual machine after building the
pg_repackclient on Ubuntu 22.04 LTS, please add it again. -
Connect to the database
db1as its owneruser1and create test tables and indexes:PGSSLMODE='verify-full' \ pgbench -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 -i -s 1 db1This command will create the following tables and indexes:
Table Index pgbench_accountspgbench_accounts_pkeypgbench_branchespgbench_branches_pkeypgbench_tellerspgbench_tellers_pkeypgbench_history— -
Check the statistics for the
pgbench_*tables and indexes:-
Connect to the database
db1as its owneruser1. Usepsqlto connect. -
Query the table statistics:
SELECT * FROM pgstattuple('pgbench_accounts'); SELECT * FROM pgstattuple('pgbench_branches'); SELECT * FROM pgstattuple('pgbench_tellers'); SELECT * FROM pgstattuple('pgbench_history'); -
Query the index statistics:
SELECT * FROM pgstattuple('pgbench_accounts_pkey'); SELECT * FROM pgstattuple('pgbench_branches_pkey'); SELECT * FROM pgstattuple('pgbench_tellers_pkey');
The
dead_tuple_countcolumns should contain zeros for all query results, indicating that there are no bloated tables and indexes.Partial output example for the
pgbench_accountstable:table_len | tuple_count | ... | dead_tuple_count | ... | free_space | free_percent -----------+-------------+-...-+------------------+-...-+------------+-------------- 13434880 | 100000 | ... | 0 | ... | 188960 | 1.41 -
-
Run
pgbenchonce:PGSSLMODE='verify-full' \ pgbench -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 -c 5 -j 2 -t 1000 db1Wait for
pgbenchto finish. This may take several minutes. -
Check the
pgbench_*table and index statistics once more.Non-zero values in the
dead_tuple_countcolumns indicate table and index bloat. Among the test tables and indexes created bypgbench, thepgbench_tellerstable will show the greatest bloat. -
Dry run
pg_repackto review the planned updates:PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 \ -d db1 \ --dry-runSince some indexes and tables are not bloated, the output will not include all database objects.
Output example:
INFO: repacking table "public.pgbench_accounts" INFO: repacking table "public.pgbench_branches" INFO: repacking table "public.pgbench_tellers" -
Run
pg_repackin regular mode to repack tables and indexes:PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 \ -d db1Wait for
pg_repackto finish. This may take several minutes. -
Check the
pgbench_*table and index statistics once more.The
dead_tuple_countcolumns should contain zeros for all query results. This means thatpg_repackhas run correctly, debloating tables and indexes.