Monitoring the state of an Elasticsearch cluster and hosts

Updated on March 6, 2025

Monitoring cluster state
Monitoring the state of hosts
Alert settings in Yandex Monitoring
Cluster state and status
- Cluster states
- Cluster statuses

Warning

Yandex Managed Service for Elasticsearch is unavailable as of April 11, 2024.

You can create an OpenSearch cluster in Yandex Cloud as an alternative to Elasticsearch.

Cluster and host state data is available in the management console. You can view it on the Monitoring tab of the cluster management page or in Yandex Monitoring.

Diagnostic information about cluster states is presented as graphs.

New data for charts is received every 15 seconds.

Note

Charts automatically use the most appropriate unit multiples (MB, GB, and so on).

You can configure alerts in Yandex Monitoring to receive notifications about cluster failures. In Yandex Monitoring, there are two alert thresholds: Warning and Alarm. If the specified threshold is exceeded, you will receive alerts via the configured notification channels.

Monitoring cluster state

To view detailed information about the Managed Service for Elasticsearch cluster state:

Management console

In the management console, go to the folder page and select Managed Service for Elasticsearch.
Click the cluster name and open the Monitoring tab.
To get started with Yandex Monitoring metrics, dashboards, or alerts, click Open in Monitoring in the top panel.

The page displays the following charts:

Active shards: Number of active primary shards and the total number of active shards in the cluster.
Deletion rate: Number of delete operations per second, per host.
Disk space usage percent: Shows how much disk space is used on each host (in %).
Flushes: Number of transaction log flush operations per host.
Health status: Cluster health and technical condition:
- 0 (red): Cluster is unhealthy or partially functional. At least one of the primary shards is unavailable. If the cluster responds to queries, the search results will be incomplete.
- 1 (yellow): Cluster is functional, but at least one replica shard is unavailable. Search results in cluster responses are complete; however, if more shards become unavailable, cluster operation will be disrupted.
- 2 (green): Cluster is healthy. All cluster shards are available.
Indexing rate: Number of indexing operations per second, per host.
JVM heap: The use of JVM heap memory per host (in bytes).
JVM heap pressure: The use of a pool of long-lived JVM objects per host (%).
JVM old collections: Number of garbage collection cycles in the pool of long-lived JVM objects per host.
JVM young collections: Number of garbage collection cycles in the pool of new JVM objects per host.
Merges: Number of index segment merges per host.
Nodes: Number of hosts with the Data node role and total number of hosts in the cluster.
Open file descriptors: Number of open file descriptors per host.
Other shards: Number of inactive shards in each of the following states:
- Delayed unassigned: Host assignment is delayed.
- Unassigned: There is no assigned host.
- Relocating: Moving to another host.
- Initializing: The shard is being initialized.
Process CPU: Processor core usage by the Elasticsearch JVM process on each host.
Query cache: Number of queries in the cache per host.
Read bytes: Disk read rate on each host (bytes per second).
Read operations: Number of read operations per second, per host.
Refreshes: Number of index segment refresh operations per host.
Search queries: Number of search queries per second per host.
Segments: Number of index segments per host.
Store size: The size of index storage on disk (in bytes).
Write bytes: Disk write rate on each host (bytes per second).
Write operations: Number of write operations per second, per host.
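
The numeric values on the Health status chart can be turned into readable labels in your own tooling. The snippet below is a minimal sketch of that mapping; the names `HealthStatus` and `describe_health` are illustrative and are not part of any Yandex Cloud API.

```python
from enum import IntEnum


class HealthStatus(IntEnum):
    """Values plotted on the Health status chart (0, 1, or 2)."""
    RED = 0
    YELLOW = 1
    GREEN = 2


# Short summaries taken from the chart description above.
DESCRIPTIONS = {
    HealthStatus.RED: "at least one primary shard is unavailable",
    HealthStatus.YELLOW: "at least one replica shard is unavailable",
    HealthStatus.GREEN: "all shards are available",
}


def describe_health(value: float) -> str:
    """Return '<color>: <summary>' for a Health status chart value."""
    status = HealthStatus(int(value))  # raises ValueError for unknown values
    return f"{status.name.lower()}: {DESCRIPTIONS[status]}"


print(describe_health(1))  # yellow: at least one replica shard is unavailable
```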

Monitoring the state of hosts

To view detailed information about the state of individual Managed Service for Elasticsearch hosts:

Management console

In the management console, go to the folder page and select Managed Service for Elasticsearch.
Click the cluster name and open the Hosts tab.
Select the Monitoring tab.
Select the host from the drop-down list.

This page displays charts showing the load on an individual host in the cluster:

CPU: Load on processor cores. As the load goes up, the Idle value goes down.
Disk bytes: Speed of disk operations (bytes per second).
Disk IOPS: Number of disk operations per second.
Memory: Use of RAM, in bytes. At high loads, the Free value goes down, while the other values go up.
Network bytes: Speed of data exchange over the network, in bytes per second.
Network packets: Number of packets exchanged over the network, per second.

Alert settings in Yandex Monitoring

Management console

In the management console, select the folder with the cluster you want to configure alerts for.
In the list of services, select Monitoring.
Under Service dashboards, select:
- Managed Service for Elasticsearch to configure cluster alerts.
- Managed Service for Elasticsearch — Host Overview to configure host alerts.
In the chart you need, open the chart menu and select Create alert.
If the chart shows multiple metrics, select a data query to generate a metric and click Continue. You can learn more about the query language in the Yandex Monitoring documentation.
Set the Alarm and Warning threshold values to trigger the alert.
Click Create alert.

To have other cluster health indicators monitored automatically:

Management console

Create an alert.
Add a status metric.
In the alert parameters, set up your alert thresholds.

The recommended thresholds are as follows:

Metric	Parameter	Formula	`Alarm`	`Warning`
Cluster status	`elasticsearch_status`	`bottom_last(1)`	`equal to 0`	`equal to 1`
Number of unassigned shards	`elasticsearch_unassigned_shards`	`top_last(1)`	`greater than 0`	—
Number of relocated shards	`elasticsearch_relocating_shards`	`top_last(1)`	`greater than 0`	—
Number of initialized shards	`elasticsearch_initializing_shards`	`top_last(1)`	`greater than 0`	—
Number of delayed assignment shards	`elasticsearch_delayed_unassigned_shards`	`top_last(1)`	`greater than 0`	—
JVM heap memory used	`elasticsearch_jvm_mem_heap_used_percent`	`top_last(1)`	Over 90% of host RAM	—
Storage space used	`elasticsearch_fs_total_used_percent`	`top_last(1)`	Over 90% of the storage size	Over 85% of the storage size
Using the JVM long-lived object pool	`elasticsearch_jvm_mem_heap_pressure`	`top_last(1)`	Over 90% of host RAM	Over 75% of host RAM
Storage space used	`disk.used_bytes`	—	90% of the storage size	80% of the storage size
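
To see how the recommended thresholds combine, the sketch below evaluates a single metric value against the Alarm and Warning conditions from the table. It is only an illustration of the logic, not how Yandex Monitoring evaluates alerts internally; the `THRESHOLDS` structure, the `evaluate` function, and the `"OK"` label are assumptions made for this example.

```python
from typing import Callable, Dict, Optional, Tuple

Check = Callable[[float], bool]

# (alarm check, warning check or None), mirroring the table above.
# Percent metrics are compared as plain numbers from 0 to 100.
THRESHOLDS: Dict[str, Tuple[Check, Optional[Check]]] = {
    "elasticsearch_status": (lambda v: v == 0, lambda v: v == 1),
    "elasticsearch_unassigned_shards": (lambda v: v > 0, None),
    "elasticsearch_relocating_shards": (lambda v: v > 0, None),
    "elasticsearch_initializing_shards": (lambda v: v > 0, None),
    "elasticsearch_delayed_unassigned_shards": (lambda v: v > 0, None),
    "elasticsearch_jvm_mem_heap_used_percent": (lambda v: v > 90, None),
    "elasticsearch_fs_total_used_percent": (lambda v: v > 90, lambda v: v > 85),
    "elasticsearch_jvm_mem_heap_pressure": (lambda v: v > 90, lambda v: v > 75),
}


def evaluate(metric: str, value: float) -> str:
    """Return 'ALARM', 'WARNING', or 'OK' (hypothetical label) for a value."""
    alarm, warning = THRESHOLDS[metric]
    if alarm(value):
        return "ALARM"
    if warning is not None and warning(value):
        return "WARNING"
    return "OK"


print(evaluate("elasticsearch_fs_total_used_percent", 87))  # WARNING
```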

For the disk.used_bytes metric, the Alarm and Warning thresholds are only set in bytes. For example, the recommended values for a 100 GB disk are as follows:

Alarm: 96,636,764,160 bytes (90%)
Warning: 85,899,345,920 bytes (80%)
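
The byte values above can be reproduced with a short calculation. This sketch assumes that the disk size is counted in binary units (1 GB = 1024³ bytes), which is what the example figures imply; the function name `disk_thresholds` is illustrative.

```python
GIB = 1024 ** 3  # assumption: 1 GB of disk size is 1024**3 bytes


def disk_thresholds(disk_size_gb: int, alarm_pct: int = 90, warning_pct: int = 80) -> dict:
    """Return Alarm and Warning thresholds in bytes for the disk.used_bytes metric."""
    total_bytes = disk_size_gb * GIB
    return {
        "alarm": total_bytes * alarm_pct // 100,
        "warning": total_bytes * warning_pct // 100,
    }


print(disk_thresholds(100))
# {'alarm': 96636764160, 'warning': 85899345920}
```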

You can view the current storage size and RAM of the hosts in the detailed information about the cluster.

Cluster state and status

The State of a cluster shows the health of its hosts, while the Status shows whether the cluster is started, stopped, or at an intermediate stage.

To view a cluster's state and status:

Management console

In the management console, go to the folder page and select Managed Service for Elasticsearch.
Hover over the indicator in the Availability column in the required cluster row.

API

Use the get REST API method for the Cluster resource or the ClusterService/Get gRPC API call, and provide the cluster ID in the clusterId request parameter.

The cluster health and status will be shown in the health and status parameters, respectively.

You can get the cluster ID with a list of clusters in the folder.
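
As a rough illustration of the REST call, the sketch below builds a GET request for the Cluster resource and reads the `health` and `status` fields from the response. The base URL and the Bearer IAM token header are assumptions based on the usual Yandex Cloud API conventions and should be checked against the API reference; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: base URL of the Managed Service for Elasticsearch REST API.
API_BASE = "https://mdb.api.cloud.yandex.net/managed-elasticsearch/v1"


def build_get_cluster_request(cluster_id: str, iam_token: str) -> urllib.request.Request:
    """Build a GET request for the Cluster resource identified by clusterId."""
    return urllib.request.Request(
        f"{API_BASE}/clusters/{cluster_id}",
        headers={"Authorization": f"Bearer {iam_token}"},  # assumption: IAM token auth
        method="GET",
    )


def extract_state(cluster: dict) -> tuple:
    """Return (health, status) from a decoded Cluster resource."""
    return cluster.get("health"), cluster.get("status")


def get_cluster_state(cluster_id: str, iam_token: str) -> tuple:
    """Send the request and return (health, status); performs a network call."""
    request = build_get_cluster_request(cluster_id, iam_token)
    with urllib.request.urlopen(request) as response:
        return extract_state(json.load(response))
```

For example, `get_cluster_state("<cluster ID>", "<IAM token>")` would be expected to return a pair such as `("ALIVE", "RUNNING")` for a healthy running cluster.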

Cluster states

State	Description	Suggested actions
ALIVE	Cluster is operating normally.	No action is required.
DEGRADED	Cluster is not running at its full capacity: the state of at least one of the hosts is other than `ALIVE`.	Run the diagnostics: Go to the Hosts tab and see which hosts are not working. Go to the Operations tab and make sure all operations are completed. Make sure the cluster is not under maintenance. If you cannot find the cause yourself, contact support.
DEAD	The cluster is down: none of its hosts are running.	Make a support request stating the following: Cluster ID. IDs of the last operations performed on it. Time the cluster entered the `DEAD` state according to the availability charts.
UNKNOWN	Cluster state is unknown.	Make a support request stating the following: Cluster ID. IDs of the last operations performed on it. Time the cluster entered the `UNKNOWN` state according to the availability charts.

Cluster statuses

Status	Description	Suggested actions
CREATING	Preparing for the first launch	Wait a while and get started. The time it takes to create a cluster depends on the host class.
RUNNING	Cluster is operating normally	No action is required.
STOPPING	Stopping cluster	After a while, the cluster status will change to `STOPPED` and the cluster will be disabled. No action is required.
STOPPED	Cluster stopped	Start the cluster to get it running again.
STARTING	Starting the cluster that was stopped earlier	After a while, the cluster status will change to `RUNNING`. Wait a while and get started.
UPDATING	Updating the cluster status	After the update is completed, the cluster status will change to `RUNNING`. Wait a while and get started.
ERROR	An error occurred that does not allow the cluster to continue working	Run the initial diagnostics: Analyze the cluster monitoring charts and view the operations performed. Prepare a list of IDs of problem resources. If you cannot find the cause of the error yourself, contact support.
STATUS_UNKNOWN	Cluster is unable to determine its own status	Run the initial diagnostics: Analyze the cluster monitoring charts and view the operations performed. Prepare a list of IDs of problem resources. If you cannot find the cause of the error yourself, contact support.
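
If you poll the cluster status from a script, the suggested actions in the table can be reduced to a small lookup. This is a minimal sketch; the function `next_step` and its return labels (`wait`, `start`, `diagnose`, `none`) are hypothetical and only summarize the table above.

```python
# Statuses that resolve on their own after a while.
TRANSITIONAL = {"CREATING", "STOPPING", "STARTING", "UPDATING"}
# Statuses that call for initial diagnostics and possibly a support request.
NEEDS_DIAGNOSTICS = {"ERROR", "STATUS_UNKNOWN"}


def next_step(status: str) -> str:
    """Map a cluster status to a short action label based on the table above."""
    if status == "RUNNING":
        return "none"
    if status == "STOPPED":
        return "start"
    if status in TRANSITIONAL:
        return "wait"
    if status in NEEDS_DIAGNOSTICS:
        return "diagnose"
    raise ValueError(f"Unexpected cluster status: {status}")


print(next_step("UPDATING"))  # wait
```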
