Managed Service for PostgreSQL performance analysis and tuning
- Getting started
- Diagnosing inefficient query execution
- Troubleshooting issues with inefficient queries
- Diagnosing resource shortages
- Troubleshooting resource shortage issues
- Detecting locks
- Troubleshooting locking issues
- Diagnosing connection errors
- Troubleshooting connection issues
- Diagnosing insufficient storage space
- Troubleshooting problems with insufficient storage space
Managed Service for PostgreSQL cluster performance drops most often due to one of the following:
- Inefficient query execution in PostgreSQL.
- High CPU load, intensive disk reads and writes, and heavy network usage.
- Locks.
- Running out of available connections.
- Running out of free storage space.
Here are some tips for diagnosing and fixing these issues.
To detect potential issues in a cluster, use tools for comprehensive cluster state analysis.
Getting started
- Select databases to troubleshoot.
- Enable the Access from the management console option in the cluster.
- Activate statistics collection about sessions and queries.
- Enable the auto_explain module for extended query execution plan logging.
- If you want more queries to appear in the performance log, reduce the value of the log_min_duration_statement parameter in the DBMS settings.

  Warning

  When log_min_duration_statement is set to 0, all queries are logged, regardless of the time they take to complete. In this case, you can run out of storage space quickly.
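After completing these steps, you can verify the effective values from any SQL session. A minimal sketch: log_min_duration_statement is a standard PostgreSQL parameter, and the auto_explain.* parameters appear in pg_settings only if the module is actually loaded.

  -- Check the current query-logging threshold.
  SHOW log_min_duration_statement;

  -- List the auto_explain parameters (empty result if the module is not loaded).
  SELECT name, setting FROM pg_settings WHERE name LIKE 'auto_explain%';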
Diagnosing inefficient query execution
There are two ways to identify problematic queries:
- Retrieve them from the PostgreSQL pg_stat_activity system table:

  SELECT NOW() - query_start AS duration, query, state
  FROM pg_stat_activity
  WHERE state != 'idle'
  ORDER BY 1 DESC;

  This will return a list of queries running on the server. Pay attention to queries with a high duration value.

  To learn more about the information in the output, see the PostgreSQL documentation.

- Get and analyze query statistics using the diagnostics tool integrated into Managed Service for PostgreSQL.
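In addition to the built-in diagnostics tool, you can query cumulative statistics yourself. A hedged sketch, assuming the pg_stat_statements extension is enabled in the cluster; in PostgreSQL 13 and later the timing columns are total_exec_time and mean_exec_time, while older versions use total_time and mean_time instead.

  -- Top 10 queries by total execution time (PostgreSQL 13+ column names).
  SELECT query,
         calls,
         total_exec_time,
         mean_exec_time,
         rows
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 10;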
Troubleshooting issues with inefficient queries
There are several ways to optimize problematic queries:
- Analyze the query plan using the EXPLAIN command.

  Check queries that do not use indexes (a large number of rows in the Seq Scan nodes). Such queries increase utilization of both I/O (due to more reads from disk) and CPU (as it takes more processor time to process large numbers of rows). Create or update appropriate indexes (see the sketch after this list).

  Tip

  To visualize the execution plans of the available queries, use the SQL tab on the cluster management page. For more information, see SQL queries in Yandex WebSQL.

- Log the query execution plan automatically using the auto_explain module.
- Update statistics using the ANALYZE command.

  The query execution plan is based on the statistics collected by the DBMS. If the data in the DBMS is updated frequently, these statistics quickly become outdated. Use the ANALYZE query for the DBMS to perform a new analysis of the table or entire database:

  ANALYZE <table_or_database_name>;

  If required, increase the value of default_statistics_target in the DBMS settings, then run the ANALYZE query again (see the sketch after this list). For more information about the default_statistics_target parameter, see the PostgreSQL settings.

- Create extended statistics objects.

  PostgreSQL does not collect statistics about the correlation of data between columns of the same table, because the number of possible column combinations can be extremely large. If there is a relationship between columns, create extended statistics objects (see the sketch after this list). The planner will then be able to optimize queries based on information about the correlation of data in the columns.

- Analyze query execution plan statistics in PostgreSQL logs.

  The PostgreSQL auto_explain module outputs information on the query execution plan to the PostgreSQL log. You can collect statistics by searching the lines in the log. For more information, see the PostgreSQL documentation.
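As a sketch of the plan-analysis option above: the table orders and column customer_id are hypothetical names used only for illustration, and EXPLAIN with the ANALYZE option actually executes the query, so run it with care on production data.

  -- Inspect the plan; "Seq Scan on orders" over many rows suggests a missing index.
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM orders WHERE customer_id = 42;

  -- Build the index without blocking writes (cannot run inside a transaction block).
  CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);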
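A sketch of the statistics-related options above, again with hypothetical names (orders, city, zip_code); extended statistics objects require PostgreSQL 10 or later.

  -- Refresh planner statistics for a single table.
  ANALYZE orders;

  -- Declare a dependency between correlated columns so the planner can use it.
  CREATE STATISTICS orders_city_zip (dependencies) ON city, zip_code FROM orders;
  ANALYZE orders;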
If you cannot optimize the identified queries or manage without them, the only option is to change the host class to a higher one.
Diagnosing resource shortages
Resource shortage is a likely cause of cluster performance degradation. Resource shortages become evident from cluster monitoring charts (CPU, disk I/O operations, network connections). If a resource utilization chart was rising steadily and then flattened into a plateau, the load on that resource has reached its limit or is exceeding the guaranteed service level.
In most cases, high CPU utilization and high disk I/O are due to suboptimal indexes or a large load on the hosts. Check the data on sessions and queries collected by the performance diagnostics tool.
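To complement the monitoring charts, a hedged sketch using the standard pg_stat_user_tables view to spot tables that are read mostly by sequential scans, which often points to missing indexes:

  -- Tables with the most rows read via sequential scans; a low idx_scan
  -- next to a high seq_tup_read is a common sign of a missing index.
  SELECT relname, seq_scan, seq_tup_read, idx_scan
  FROM pg_stat_user_tables
  ORDER BY seq_tup_read DESC
  LIMIT 10;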
Troubleshooting resource shortage issues
Try optimizing identified queries that consume a large amount of resources. If the load is still high or there is nothing to optimize, the only option is to upgrade the host class.
Detecting locks
Cluster performance may degrade because of locks obtained when there are multiple simultaneous attempts to access the same DB resource (table, row).
To detect locks using the performance diagnostics tool:
- Go to the folder page and select Managed Service for PostgreSQL.
- Click the name of the cluster you need and select the Performance diagnostics tab.
- On the Sessions tab, in the Slice field, select WAIT_EVENT_TYPE.
- Check the Lock chart. It shows the number of queries that were in the locked state during the selected period.
- To get detailed information about queries running during the selected period, go to the Queries tab.

For more information about what statistics you can get, see the PostgreSQL documentation.
To diagnose the presence of locks using PostgreSQL, run the query:
SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa
ON pl.pid = psa.pid;
For more information about selecting queries with locks, see the PostgreSQL documentation.
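To see directly which session blocks which, a hedged sketch using the built-in pg_blocking_pids() function (available in PostgreSQL 9.6 and later):

  -- Sessions that are currently blocked, together with the PIDs blocking them.
  SELECT pid,
         pg_blocking_pids(pid) AS blocked_by,
         state,
         wait_event_type,
         wait_event,
         query
  FROM pg_stat_activity
  WHERE cardinality(pg_blocking_pids(pid)) > 0;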
Troubleshooting locking issues
Try optimizing the identified queries. If the load is still high or there is nothing to optimize, the only option is to upgrade the host class.
Diagnosing connection errors
The number of DB connections is limited by the max_connections parameter and is calculated by the following formula:

  200 × <vCPU_share_on_host> − 15

Where <vCPU_share_on_host> is the product of the number of vCPUs and their guaranteed share, and 15 is the number of reserved service connections. The resulting number of connections is distributed between the database roles.
If the number of open connections reaches the limit, errors appear in the cluster logs:
- Too many connections for role.
- Server conn crashed.
- Invalid server parameter.
- Query wait timeout.
To get detailed information about the usage of available connections using monitoring tools:
- Go to the folder page and select Managed Service for PostgreSQL.
- Click the name of the cluster you need and select the Monitoring tab.
- Check the Total pooler connections chart.

  Managed Service for PostgreSQL does not allow connections directly to the DBMS; instead, they go through the connection pooler.

  - The Clients parameter reflects the number of client connections to the connection pooler.
  - The Servers parameter reflects the number of connections between the DBMS and the connection pooler.

  Take note of the number of connections: if the values are high, some queries are keeping connections open for too long.
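To cross-check from the database side, a hedged sketch that groups the current backend connections by role and state (client connections held by the pooler itself are not visible here):

  -- Current connections to PostgreSQL grouped by role and state.
  SELECT usename, state, count(*) AS connections
  FROM pg_stat_activity
  GROUP BY usename, state
  ORDER BY connections DESC;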
Troubleshooting connection issues
To fix issues with the number of connections:
- Increase the max_connections value in the DBMS settings.
- Optimize queries so that there are no long transactions (see the sketch below).
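A hedged sketch for finding long transactions, including sessions that sit idle in transaction and keep pooler connections busy:

  -- Sessions holding a transaction open the longest.
  SELECT pid,
         now() - xact_start AS xact_age,
         state,
         usename,
         query
  FROM pg_stat_activity
  WHERE xact_start IS NOT NULL
  ORDER BY xact_age DESC
  LIMIT 10;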
If the load is still high or there is nothing to optimize, the only option is to upgrade the host class.
Diagnosing insufficient storage space
If the cluster shows poor performance and the logs contain ERROR: cannot execute INSERT in a read-only transaction, the cluster storage might have run out of free space and switched to read-only mode.
To check for free space in cluster storage:
- Go to the folder page and select Managed Service for PostgreSQL.
- Click the name of the cluster you need and select the Monitoring tab.
- Check the Disk capacity in primary, [bytes] chart.

  Check the value of the Used parameter, which shows how much of the cluster storage is in use.
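To see what is taking up the space from the database side, a hedged sketch using the built-in size functions (read-only, so it works even if the cluster has switched to read-only mode):

  -- Total size of the current database.
  SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size;

  -- Largest user tables, including indexes and TOAST data.
  SELECT relname,
         pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_stat_user_tables
  ORDER BY pg_total_relation_size(relid) DESC
  LIMIT 10;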
Troubleshooting problems with insufficient storage space
For recommendations on troubleshooting these issues, see Disk space management.