Using pg_repack in Managed Service for PostgreSQL
PostgreSQL tables and indexes may get bloated. When running transactions that update data in tables and indexes, the older data is retained in case you need to roll back the transactions. This causes tables and indexes to bloat during bulk data updates. You can use the pgstattuple extension or the pgsql-bloat-estimation
Older data versions are not deleted automatically. To free up storage space and remove bloat, you can delete the data you no longer need using the VACUUM FULL
The pg_repackpg_repack
works without holding an exclusive lock on the tables. For more information, see the extension page
Installing pg_repack in a PostgreSQL cluster
-
Add the
pg_repack
extension to your database. -
Assign the
mdb_admin
ormdb_superuser
role to the owner of this database, if not assigned yet.You can get the owner's name with the list of databases in the cluster.
-
Set the following PostgreSQL cluster parameters to zero:
Warning
Other values for these settings may cause long-running commands to interrupt or idle transactions to terminate. If this happens,
pg_repack
may fail.
Installing the pg_repack client
To use the extension, you need a client with the same name. The client must be installed on a host that can connect to the PostgreSQL cluster.
To install the client:
-
Check the
pg_repack
version installed in the PostgreSQL cluster. -
Install
thepg_repack
client.Warning
The client and extension versions must match; otherwise, the connection will return this error:
ERROR: pg_repack failed with error: program 'pg_repack ...' does not match database library 'pg_repack ...'
Depending on your operating system and connected repositories, you can install the client using a package manager, e.g.,
apt
oryum
. If the required client version is not available in the repositories, build it from the source files yourself.How to build the pg_repack client on Ubuntu 22.04 LTS
-
Connect the PostgreSQL Apt repository
, which contains some of the required dependencies:sudo apt install -y postgresql-common && \ sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y $(lsb_release -cs)
-
Install the dependencies.
Select the
postgresql-server-dev-*
package version matching the version of the PostgreSQL cluster you plan to connect to using the client. This will make your client more stable.This example command installs such a package for PostgreSQL 16 together with other packages:
sudo apt install git build-essential \ zlib1g-dev libzstd-dev liblz4-dev \ libreadline-dev \ postgresql-server-dev-16
-
Clone the
pg_repack
Git repository by selecting the relevant version tag and open the local directory containing the repository.Example command for the
ver_1.4.8
tag:git clone https://github.com/reorg/pg_repack.git \ --branch ver_1.4.8 --depth 1 && \ cd pg_repack
-
Run the build:
make
Wait for the build process to complete.
-
Place the
pg_repack
executable file in any directory listed in thePATH
environment variable for the current user.Example command to move the file to
/usr/local/bin
:sudo install bin/pg_repack /usr/local/bin
-
Check the client version:
pg_repack --version
This should output the version matching the tag you selected earlier, e.g.:
pg_repack 1.4.8
-
Running pg_repack
Tip
Run pg_repack
when the PostgreSQL cluster load is minimal: repacking database objects will put extra load on your cluster.
Cluster and host state data is available in the management console.
To run pg_repack
and repack database objects:
-
Make sure the cluster has enough free storage space: at least twice the total size of the tables and indexes to repack.
During a repack,
pg_repack
works with copies of tables and indexes, which requires additional storage space. -
Run the
pg_repack
client with the parameters you need.Below are example commands to run the client with common parameter combinations. For a description of all supported parameters, see the extension page
.Tip
Add the
--dry-run
parameter to dry runpg_repack
and review the planned updates.This will output the list of objects to repack when running
pg_repack
in regular mode.Run the command you need:
-
Command to repack the specified database tables:
Repacking with connection without SSLRepacking with connection via SSLpg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -t <table_name>
PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -t <table_name>
If you need to repack multiple tables, provide the required number of
-t
parameters, one for each table. -
Command to repack the specified database indexes:
Repacking with connection without SSLRepacking with connection via SSLpg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -i <index_name>
PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name> \ -i <index_name>
If you need to repack multiple indexes, provide the required number of
-i
parameters, one for each index. -
Command to repack all tables and indexes in the database:
Repacking with connection without SSLRepacking with connection via SSLpg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name>
PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U <username> \ -d <DB_name>
-
Usage example
Note
The example was tested in the following environment:
-
PostgreSQL
16
cluster where: -
Yandex Cloud VM running Ubuntu 22.04 LTS where:
pg_repack
client version1.4.8
built from source files is installed.- SSL connections to the cluster are enabled.
This example uses pgbench
- Creating test tables and indexes and populating the tables with test data.
- Running multiple queries to test tables, including
INSERT
andDELETE
queries that update the table contents.
When a pgbench
run is over, test tables and their relevant indexes become bloated as a result of a large number of queries. Also, pgbench
runs the VACUUM
command, which removes table and index bloat, at the beginning rather than the end of its operation. So we will use pgbench
to create bloated tables and indexes to demonstrate how pg_repack
works.
To test pg_repack
on the pgbench
test tables and indexes:
-
Add the
pgstattuple
extension to thedb1
database.This extension allows you to estimate table and index bloat to see how
pg_repack
works. -
Install
pgbench
on a VM:sudo apt install postgresql-contrib
Note
The PostgreSQL Apt repository contains the
postgresql-contrib
package.If you deleted this repository from the virtual machine after building the
pg_repack
client on Ubuntu 22.04 LTS, reconnect the repository. -
Create test tables and indexes in the
db1
database by connecting to it as its owner (user1
):PGSSLMODE='verify-full' \ pgbench -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 -i -s 1 db1
This will create the following tables and indexes:
Table Index pgbench_accounts
pgbench_accounts_pkey
pgbench_branches
pgbench_branches_pkey
pgbench_tellers
pgbench_tellers_pkey
pgbench_history
— -
View the statistics for the
pgbench_*
tables and indexes:-
Connect to the
db1
database as the owner (user1
). Usepsql
to connect. -
Request table statistics:
SELECT * FROM pgstattuple('pgbench_accounts'); SELECT * FROM pgstattuple('pgbench_branches'); SELECT * FROM pgstattuple('pgbench_tellers'); SELECT * FROM pgstattuple('pgbench_history');
-
Request index statistics:
SELECT * FROM pgstattuple('pgbench_accounts_pkey'); SELECT * FROM pgstattuple('pgbench_branches_pkey'); SELECT * FROM pgstattuple('pgbench_tellers_pkey');
The
dead_tuple_count
columns must show zero for all query results. This means there are no bloated tables and indexes.Here is an example of partial output for the
pgbench_accounts
table:table_len | tuple_count | ... | dead_tuple_count | ... | free_space | free_percent -----------+-------------+-...-+------------------+-...-+------------+-------------- 13434880 | 100000 | ... | 0 | ... | 188960 | 1.41
-
-
Run
pgbench
once:PGSSLMODE='verify-full' \ pgbench -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 -c 5 -j 2 -t 1000 db1
Wait for the
pgbench
run to complete. This may take several minutes. -
Once again, view the statistics for the
pgbench_*
tables and indexes:Non-zero values in the
dead_tuple_count
columns indicate table and index bloat. For test tables and indexes created bypgbench
, thepgbench_tellers
table will show the greatest bloating. -
Dry run
pg_repack
to review the planned updates:PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 \ -d db1 \ --dry-run
Since some indexes and tables are not bloated, the output will not include all database objects.
Output example:
INFO: repacking table "public.pgbench_accounts" INFO: repacking table "public.pgbench_branches" INFO: repacking table "public.pgbench_tellers"
-
Run
pg_repack
in regular mode to repack tables and indexes:PGSSLMODE='verify-full' \ pg_repack -k -h c-<cluster_ID>.rw.mdb.yandexcloud.net -p 6432 \ -U user1 \ -d db1
Wait for the
pg_repack
run to complete. This may take several minutes. -
Once again, view the statistics for the
pgbench_*
tables and indexes:The
dead_tuple_count
columns must show zero for all query results. This means there are no bloated tables or indexes, andpg_repack
ran correctly.