Resolving the consequences of cluster storage overflow with WAL logs
Issue description
Your Managed Service for PostgreSQL cluster encountered one or multiple issues listed below:
- A Data Transfer caused the database storage overflow.
- The number of replication slots for the cluster is exceeded.
- New data is not written to your cluster's database tables.
Solution
Errors may occur because of issues with replication slots when Data Transfer performs operations with your cluster databases.
Each running transfer creates a replication slot in the database. A replication slot points to a position in the WAL log. If the slot is not in use, the position in the log does not change.
If data from this slot has not been read for some reason, the size of the WAL log continues to increase until the free space in the cluster storage runs out, or the Max slot wal keep size value is exceeded.
You can find out which replication slots are open in the database by connecting to one of the cluster databases using psql and running this query: SELECT * FROM pg_replication_slots WHERE slot_type = 'logical';.
Example of the SELECT * FROM pg_replication_slots WHERE slot_type = 'logical'; command output
```sql
SELECT * FROM pg_replication_slots WHERE slot_type = 'logical';
-[ RECORD 1 ]-------+---------------------
slot_name | dtt*****************
plugin | wal2json
slot_type | logical
```
-
If the
pluginfield in the logical replication slot has thewal2jsonvalue, it means that Data Transfer is currently working with the databases of this cluster. -
If an error occurred during the transfer process for one or more replication slots, delete the slot by using the
SELECT pg_drop_replication_slot('$REPLICATION_SLOT_NAME');command, where$REPLICATION_SLOT_NAMEis the name of the stuck replication slot. In the example above, it isdtt*****************. -
If after deleting the replication slot you see the
replication slot "$REPLICATION_SLOT_NAME" is active for PID $PID_NUMerror, try stopping the transfer on the Data Transfer side or delete the involved endpoint from the transfer parameters.
Example of the select pg_drop_replication_slot('$REPLICATION_SLOT_NAME'); command output
db-NAME=> SELECT pg_drop_replication_slot('$REPLICATION_SLOT_NAME');
ERROR: replication slot "$REPLICATION_SLOT_NAME" IS active FOR PID 12345
You can also specify the maximum size of the WAL log ,after reaching which Data Transfer will stop. To do this, edit the Max slot wal keep size value. If the size of your database's WAL log exceeds this value, the transfer will stop. This will prevent filling up all free space in the cluster storage.
After an error (or successful completion of the transfer), the WAL log is deleted, and the storage space it occupied is freed up. If the transfer does not successfully complete after changing Max slot wal keep size, increase the cluster storage following this guide.
You can learn more about the operation of replication slots in the PostgreSQL developer guide