PostgreSQL performance diagnostics
Performance diagnostics is a feature for collecting and visualizing performance statistics for PostgreSQL clusters. It enables analyzing SQL query performance and session activity to identify bottlenecks and streamline database operations.
Getting started
Make sure that:
- Managed PostgreSQL component is installed.
- Managed ClickHouse® component is installed and enabled.
- Monitoring component with Grafana is installed.
- You have permissions to modify the
ManagedPostgresConfigconfiguration and createPostgresqlClusterclusters.
Enabling diagnostics at the platform level
To enable performance diagnostics for all PostgreSQL clusters, run this command:
kubectl patch ManagedPostgresConfigs main --type='merge' -p='{"spec":{"settings": {"performance-diagnostics":{"enabled":true}}}}'
After performance diagnostics is enabled at the platform level, the system automatically deploys components for metric collection and storage (ClickHouse® and Grafana dashboards).
Enabling statistics collection for a cluster
After activating performance diagnostics at the platform level, enable statistics collection for each individual cluster.
To enable statistics collection for a PostgreSQL cluster:
-
Create or update the
PostgresqlClusterspecification:apiVersion: postgresql.stackland.yandex.cloud/v1
kind: PostgresqlCluster
metadata:
name: my-postgres-cluster
namespace: default
spec:
instances: 3
enablePerfDiagStatsCollect: truestorage:
size: 10Gipostgresql:
parameters:
max_connections: "100"
shared_buffers: "256MB"Where
enablePerfDiagStatsCollect: trueenables performance statistics collection for this cluster. -
Apply the configuration:
kubectl apply -f postgresql-cluster.yaml -
Wait for the cluster pods to restart. To check their status, run this command:
kubectl get pods -l postgresql=my-postgres-cluster
After the restart, a perfdiag sidecar container will run in each cluster pod, collecting statistics from PostgreSQL and sending them to ClickHouse®.
Viewing data in Grafana
To view collected performance metrics:
-
Open Grafana.
-
In the left-hand menu, select Dashboards.
-
Find and open the PostgreSQL Cluster Performance Diagnostics dashboard.
-
At the top of the dashboard, select:
- Cluster: Cluster name
- Time range: Time range for analysis
Dashboard panels
The dashboard is organized into the following sections and panels:
Sessions activity statistics
Sessions stats (for the moment of interval end) is a table displaying active PostgreSQL sessions at the end of the selected time interval. It contains the following data:
- Collection time (
collect_time) - Host and host role (
hostandhost_role) - Database and user (
databaseanduser) - Session state (
state) - Active query (
query) - Process ID (
pid) - Application and client details (
application_name,client_addr,client_hostname, andclient_port) - Wait events (
wait_event_typeandwait_event) - Timestamps (
backend_start,xact_start,query_start, andstate_change) - Transaction IDs (
backend_xidandbackend_xmin) - Backend type (
backend_type) - Blocking processes (
blocking_pids) - Query ID (
queryid)
Per-statement statistics
This section contains time-series charts and a table with SQL query statistics.
Time-series charts:
- Top queries by average execution time, in milliseconds
- Top queries by consumed CPU user time, in CPU cores
- Top queries by consumed CPU system time, in CPU cores
- Top queries by avg rows scanned
- Top queries by reads from filesystem, in bytes
- Top queries by writes to filesystem, in bytes
- Top queries by average calls per second
- Shared blocks read
- Shared blocks written
- Query-local blocks read
- Query-local blocks written
- Temp blocks read
- Temp blocks written
- Top queries by generated WALs, in bytes
Table:
Per-query statistics for period (sorted by avg CPU consumption) is a pivot table of per-query statistics for the selected period, sorted by average CPU usage. It includes the following information:
- Host role (
host_role) - User and database (
useranddatabase) - Query ID and text (
queryidandquery) - Total number of calls (
total_calls) - Total number of scanned rows (
total_rows_scanned) - Total execution time (
total_exec_time) - Average execution time (
average_exec_time) - Total CPU time (
total_cpu_time) - Average CPU time (
average_cpu_time) - Total size of filesystem operations (
total_fs_bytes) - Average filesystem read size (
average_fs_reads_bytes) - Average filesystem write size (
average_fs_writes_bytes)
Performance analysis
Identifying slow queries
To find slow-running queries:
- On the Per-Statement Statistics panel, sort queries by Average Execution Time or Total Execution Time.
- Focus on queries with high execution times.
- Analyze their execution plans using
EXPLAIN ANALYZE.
Analyzing resource usage
To identify resource-intensive queries:
- Check the CPU Usage panel for queries with high CPU consumption.
- Review the I/O Operations panel for queries with heavy I/O activity.
- Examine the WAL Generated panel for queries producing large volumes of WAL records.
Identifying blockers
To identify blocking processes:
- In the Sessions Activity Statistics panel, find sessions with a non-empty
blocking_pidsfield. - Identify which queries are causing the block.
- Optimize transaction logic or add appropriate indexes.
Disabling diagnostics
To disable statistics collection for a cluster, run this command:
kubectl patch ManagedPostgresConfigs main --type='merge' -p='{"spec":{"settings": {"performance-diagnostics":{"enabled":false}}}}'
Warning
Disabling the feature at the platform level will delete all collected metrics from ClickHouse®.