MySQL® cluster and host state monitoring
Data on the cluster and host state is available in the management console
Diagnostic information about cluster states is presented as graphs.
Chart update rate:
- Standard hosts and hosts with an increased RAM to vCPU ratio (
memory-optimized): 15 seconds. - Hosts with a guaranteed vCPU share under 100% (
burstable): 150 seconds.
Note
The most appropriate multiple units (MB, GB, and more) are automatically used in charts.
You can configure alerts in Yandex Monitoring to receive notifications about cluster failures. In Yandex Monitoring, there are two alert thresholds: Warning and Alarm. If the specified threshold is exceeded, you will receive alerts via the configured notification channels.
Monitoring the cluster state
To view detailed information on the health state of a Managed Service for MySQL® cluster:
-
Navigate to Managed Service for MySQL.
-
Click the name of your cluster and select the Monitoring tab.
The page that opens will display performance charts for the cluster.
To get started with Yandex Monitoring metrics, dashboards, or alerts, click Open in Monium in the top panel.
The following charts are displayed for a cluster:
- Queries per second: Number of user queries per second for each cluster host.
- Average query time: Average query time for each cluster host.
- Slow queries per second: Number of queries per second that are longer than specified in long_query_time. The information is displayed for each cluster host.
- Connections: Number of connections for each cluster host.
- Threads running: Number of running threads for each cluster host.
- Disk usage: Disk space usage for each host and the entire cluster.
- Is Primary, [boolean]: Shows whether the host is the master.
- Is Alive, [boolean]: Shows the availability of each cluster host.
- Replication lag: Replica lag time behind the master.
- Free space: Free disk space for each host and the entire cluster.
- OOM Count: Number of out-of-memory events on each cluster host.
The Master overview section shows the master details:
-
Query quantiles: Query execution time, percentiles.
-
Threads: Number of threads. The following is displayed separately:
-
Threads cached: Number of cached threads.
-
Threads connected: Number of open DB connections.
The maximum number of open connections is set in max_connections.
-
Threads running: Number of running threads.
-
-
Thread states: Number of threads in each state.
-
Disk usage: Disk space usage. The following is displayed separately:
- data: Amount of space used by data.
- default tablespace: Amount of space used by data in the tablespace.
- innodb logs: Amount of space used by InnoDB logs.
- relaylogs, binlogs: Amount of space used by MySQL® service logs.
- temp tablespace: Amount of space used by data in the temporary tablespace.
- undo tablespace: Amount of space used by data in the InnoDB undo tablespace
.
-
InnoDB rows operation: Number of InnoDB row operations.
-
InnoDB locks: Row lock waits in InnoDB. The following is displayed separately:
- innodb_row_lock_current_waits: Current number of row lock waits.
- innodb_row_lock_waits: Number of operations with row lock waits per second.
-
Temp tables: Number of temporary tables and files created per second.
-
Sorts and joins: Number of sort and join operations per second.
-
Table cache: Table caching. The following is displayed separately:
- open_tables: Number of open tables.
- opened_tables: Number of open tables per second.
- table_open_cache_hits: Number of open table cache hits per second.
- table_open_cache_misses: Number of open table cache misses per second.
Host state monitoring
To view detailed information on the state of individual Managed Service for MySQL® hosts:
-
Navigate to Managed Service for MySQL.
-
Click the name of your cluster and select the Monitoring tab.
-
Navigate to the Hosts tab and select the host.
The page that opens will display performance charts for the cluster hosts.
To get started with Yandex Monitoring metrics, dashboards, or alerts, click Open in Monium in the top panel.
The following charts are displayed for the hosts:
-
CPU usage: CPU usage percentage by consumption type.
-
Memory usage: RAM usage by consumption type.
-
Disk IOPS: Number of read and write operations per second.
-
Network Packets: Number of packets sent and received over the network per second.
-
Network Bytes: Network data transmit and receive rate.
For Replica hosts, the Received value is normally greater than Sent on the Network Bytes and Network Packets charts.
The MySQL overview section shows detailed information about the DBMS state on the host:
-
Query quantiles: Query execution time, percentiles.
-
Queries per second: Number of user queries per second.
-
Slow queries per second: Number of queries per second that are longer than specified in long_query_time.
-
Threads: Number of threads. The following is displayed separately:
-
Threads cached: Number of cached threads.
-
Threads connected: Number of open DB connections.
The maximum number of open connections is set in max_connections.
-
Threads running: Number of running threads.
-
-
Thread states: Number of threads in each state.
-
Handlers: Number of operation handlers.
-
Replication lag: Replica lag time behind the master.
-
File IO write bytes: Data write rate by file type.
-
File IO read bytes: Data read rate by file type.
-
Disk usage: Disk space usage. The following is displayed separately:
- data: Amount of space used by data.
- default tablespace: Amount of space used by data in the tablespace.
- innodb logs: Amount of space used by InnoDB logs.
- relaylogs, binlogs: Amount of space used by MySQL® service logs.
- temp tablespace: Amount of space used by data in the temporary tablespace.
- undo tablespace: Amount of space used by data in the InnoDB undo tablespace
.
-
File IO write operations: Number of write operations per second by file type.
-
File IO read operations: Number of read operations per second by file type.
-
Temp tables: Number of temporary tables and files created per second.
-
Sorts and joins: Number of sort and join operations per second.
-
Table cache: Table caching. The following is displayed separately:
- open_tables: Number of open tables.
- opened_tables: Number of open tables per second.
- table_open_cache_hits: Number of open table cache hits per second.
- table_open_cache_misses: Number of open table cache misses per second.
-
InnoDB rows operation: Number of InnoDB row operations.
-
InnoDB locks: Row lock waits in InnoDB. The following is displayed separately:
- innodb_row_lock_current_waits: Current number of row lock waits.
- innodb_row_lock_waits: Number of operations with row lock waits per second.
-
InnoDB lock time: Maximum InnoDB row lock wait time.
-
InnoDB cache efficiency: Efficiency of the InnoDB buffer. The following is displayed separately:
- innodb_buffer_pool_reads: Number of read operations per second where InnoDB had to access the disk.
- innodb_buffer_pool_read_requests: Total number of read operations per second.
-
InnoDB data operations: Number of InnoDB operations per second. The following is displayed separately:
- innodb data fsyncs: Number of
fsync()operations when flushing data to disk. - innodb data reads: Number of disk reads.
- innodb data writes: Number of disk writes.
- innodb data fsyncs: Number of
-
SemiSync latency: Maximum transaction commit latency in semi-synchronous replication
. -
Inode usage: Inode usage as a percentage of the total.
Under MySQL overview → Disk Metrics Details:
- Disk write latency (percentiles): Disk write latency, percentiles.
- Disk write bytes: Average and maximum disk write rate.
- Disk write operations: Average and maximum number of write operations per second.
- Disk read latency (percentiles): Disk read latency, percentiles.
- Disk read bytes: Average and maximum disk read rate.
- Disk read operations: Average and maximum number of read operations per second.
- Disk write throttler latency (percentiles): Write delay introduced by exceeding disk quota, percentiles.
- Disk read throttler latency (percentiles): Read delay introduced by exceeding disk quota, percentiles.
- Disk used quota: Average and maximum quota used percentage for disk operations.
Setting up alerts in Yandex Monitoring
-
In the management console
, select the folder with the cluster for which you want to set up alerts. -
Navigate to the
Monitoring service. -
Under Service dashboards, select:
- Managed Service for MySQL® — Cluster Overview to set up cluster alerts.
- Managed Service for MySQL® — Host Overview to set up host alerts.
-
In the chart you need, click
and select Create alert. -
If the chart displays multiple metrics, select the data query for the relevant metric and click Continue. To learn more about the query language, see this Yandex Monitoring article.
-
Set the
AlarmandWarningalert thresholds. -
Click Create alert.
To have other cluster health indicators monitored automatically:
- Create an alert.
- Add a status metric.
- In the alert parameters, set the alert thresholds.
Recommended threshold values for selected metrics:
| Metric | Internal metric name | Alarm |
Warning |
|---|---|---|---|
| Replication lag | mysql_replication_lag |
600 |
60 |
| Number of healthy hosts | mysql_is_alive |
<number_of_hosts> - 2 |
<number_of_hosts> - 1 |
| Average query execution time | mysql_latency_query_avg |
— | 2,000 |
| Storage space used | disk.used_bytes |
90% of the storage size | 80% of the storage size |
| CPU usage | cpu.idle |
10 |
20 |
The Alarm and Warning thresholds for the disk.used_bytes metric are specified exclusively in bytes. For example, the recommended values for a 100 GB disk are as follows:
Alarm:96,636,764,160bytes (90%)Warning:85,899,345,920bytes (80%)
You can check the current storage size in the cluster details. For a complete list of supported metrics, see this Monitoring article.
Cluster state and status
The State of a cluster shows the health of its hosts, while the Status shows whether the cluster is started, stopped, or is at an intermediate stage.
To view the health state and status of a cluster:
- Navigate to Managed Service for MySQL.
- In the cluster row, hover over the indicator in the Availability column.
Cluster health states
| State | Description | Suggested actions |
|---|---|---|
| ALIVE | Cluster is operating normally. | No action is required. |
| DEGRADED | Cluster is not running at its full capacity: the state of at least one of the hosts is other than ALIVE. |
Run the diagnostics:
|
| DEAD | The cluster is down: none of its hosts are running. | Make a support request
|
| UNKNOWN | Cluster state is unknown. | Make a support request
|
Cluster statuses
| Status | Description | Suggested actions |
|---|---|---|
| CREATING | Preparing for the first start | Wait a while and get started. The time it takes to create a cluster depends on the host class. |
| RUNNING | The cluster is operating normally | No action is required. |
| STOPPING | The cluster is stopping | After a while, the cluster status will switch to STOPPED and the cluster will be disabled. No action is required. |
| STOPPED | The cluster is stopped | Start the cluster to get it running again. |
| STARTING | Starting the cluster that was stopped earlier | After a while, the cluster status will switch to RUNNING. Wait a while and get started. |
| UPDATING | Updating the cluster's configuration | Once the update is complete, the cluster will get the status it had prior to the update: RUNNING or STOPPED. |
| ERROR | Error when performing an operation with the cluster or during a maintenance window | If the cluster remains in this status for a long time, contact support |
| STATUS_UNKNOWN | The cluster is unable to determine its status | If the cluster remains in this status for a long time, contact support |