OpenSearch cluster and host state monitoring
Data on the cluster and host state is available in the management console.
Diagnostic information about cluster states is presented as graphs.
Charts are updated every 15 seconds.
Note
Charts automatically use the most suitable unit multiples (MB, GB, and so on).
You can configure alerts in Yandex Monitoring to receive notifications about cluster failures. In Yandex Monitoring, there are two alert thresholds: Warning and Alarm. If the specified threshold is exceeded, you will receive alerts via the configured notification channels.
Cluster state monitoring
To view detailed information on the state of a Managed Service for OpenSearch cluster:
- In the management console, navigate to the folder page.
- Navigate to the Managed Service for OpenSearch service.
- Click the name of your cluster and open the Monitoring tab.

The page displays the following charts:
- Health status: Cluster health and technical condition:
  - 0 (red): Cluster is unhealthy or partially functional. At least one of the primary shards is not available. If the cluster responds to queries, search results will be incomplete.
  - 1 (yellow): Cluster is functional. There is no access to at least one of the shard replicas. Search results in the cluster responses are complete, but if more shards become unavailable, the cluster performance will be disrupted.
  - 2 (green): Cluster is healthy. All cluster shards are available.
- Active shards: Number of active primary shards and the total number of active shards in the cluster.
- Other shards: Number of inactive shards in each of the following states:
  - Delayed unassigned: Host assignment is delayed.
  - Unassigned: No host is assigned.
  - Relocating: Shards are being moved to another host.
  - Initializing: Shards are initializing.
- Nodes: Number of hosts with the DATA role.
- Segments: Number of index segments per host.
- Pending tasks: Number of enqueued tasks.
- Indexing rate: Number of indexing operations per second, per host.
- Search rate: Number of search queries per second, per host.
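As a minimal sketch of how the numeric Health status values map to cluster colors, the shell function below translates a chart value into its label. The function name `health_label` is hypothetical and used only for illustration; the 0/1/2 values and their meanings come from the chart description above.

```shell
# Hypothetical helper: translate a numeric Health status value into its color label.
# Values follow the chart description: 0 = red, 1 = yellow, 2 = green.
health_label() {
  case "$1" in
    0) echo "red" ;;      # at least one primary shard is unavailable
    1) echo "yellow" ;;   # at least one shard replica is unavailable
    2) echo "green" ;;    # all shards are available
    *) echo "unknown"; return 1 ;;
  esac
}
```

For example, `health_label 1` prints `yellow`, which corresponds to a functional cluster with at least one unavailable replica.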
Note
To get started with Monitoring metrics, dashboards, or alerts, click Open in Monitoring in the top panel.
Monitoring the state of hosts
To view detailed information on the state of individual Managed Service for OpenSearch hosts:
- In the management console, navigate to the folder page.
- Navigate to the Managed Service for OpenSearch service.
- Click the name of your cluster and open the Hosts tab.
- Select the Monitoring tab.
- Select the host from the drop-down list.
This page displays charts showing the workload of an individual cluster host. The set of charts depends on the host type.

Hosts with the MANAGER role:
- Process CPU: Processor core workload generated by the JVM OpenSearch process.
- Memory usage: Amount of RAM used, in bytes.
- JVM heap: Use of JVM heap memory, in bytes.
- Disk space usage percent: Percentage of the disk space used.
- Management thread pool: Number of cluster management requests.
- Generic thread pool: Number of requests for running general operations.
- Thread pool queued: Number of enqueued requests.
- Thread pool rejected: Number of rejected requests.

Hosts with the DATA role:
- Process CPU: Processor core workload generated by the JVM OpenSearch process.
- Memory usage: Amount of RAM used, in bytes.
- JVM heap percent: Percentage of the JVM heap memory used.
- Disk space usage percent: Percentage of the disk space used.
- Indexing rate: Number of indexing operations per second.
- Search queries: Number of search queries per second.
- Open file descriptors: Number of open file descriptors.
- Write bytes: Disk write rate, in bytes per second.
- Read bytes: Disk read rate, in bytes per second.
- Write thread pool: Requests for indexing, deleting, or updating documents.
- Write operations: Number of write operations per second.
- Read operations: Number of read operations per second.
- Query time: Time spent running requests.
- Thread pool queued: Number of enqueued requests.
- Thread pool rejected: Number of rejected requests.
- Indexing time: Time spent indexing the documents.
- Merging time: Time spent merging the documents.
- Is Alive: Indicator of whether the host is available.
- Requests Total: Total number of host requests.

Hosts with the DASHBOARDS role:
- Process CPU: Processor core workload generated by the JVM OpenSearch process.
- Memory usage: Amount of RAM used, in bytes.
- Disk read/write bytes: Speed of disk operations, in bytes per second.
- Disk IOPS: Number of disk operations per second.
- Network Packets: Network packet exchange rate, in packets per second.
- Network bytes: Speed of network data exchange, in bytes per second.
Monitoring the state of host groups
To view detailed information on the state of a Managed Service for OpenSearch host group:
- In the management console, navigate to the folder page.
- Navigate to the Managed Service for OpenSearch service.
- Click the name of your cluster and open the Node groups tab.
- Select the Monitoring tab.
- Select the host group from the drop-down list.
This page displays the charts showing workloads of a cluster host group. The list depends on the type of hosts in the group and matches the charts shown for individual hosts.
Setting up alerts in Yandex Monitoring
- In the management console, select the folder with the cluster for which you want to configure alerts.
- Go to Monitoring.
- Under Service dashboards, select:
  - Managed Service for OpenSearch to configure cluster alerts.
  - Managed Service for OpenSearch — Dashboards to configure alerts for hosts with the DASHBOARDS role.
  - Managed Service for OpenSearch — Data to configure alerts for hosts with the DATA role.
  - Managed Service for OpenSearch — Manager to configure alerts for hosts with the MANAGER role.
- In the chart you need, open the chart's options menu and select Create alert.
- If the chart shows multiple metrics, select the data query to generate a metric and click Continue. You can learn more about the query language in the Yandex Monitoring documentation.
- Set the Alarm and Warning thresholds to trigger the alert.
- Click Create alert.
To have other cluster health indicators monitored automatically:
- Create an alert.
- Add a status metric.
- In the alert parameters, set the alert thresholds.
Below are the recommended thresholds for some metrics:
| Metric | Designation | Formula | Alarm | Warning |
|---|---|---|---|---|
| Cluster status | opensearch_status | bottom_last(1) | equal to 0 | equal to 1 |
| Number of unassigned shards | opensearch_unassigned_shards | top_last(1) | greater than 0 | |
| Number of shards being relocated | opensearch_relocating_shards | top_last(1) | greater than 0 | |
| Number of initializing shards | opensearch_initializing_shards | top_last(1) | greater than 0 | |
| Number of delayed unassigned shards | opensearch_delayed_unassigned_shards | top_last(1) | greater than 0 | |
| JVM heap memory used | opensearch_jvm_mem_heap_used_percent | top_last(1) | Over 90% of host RAM | |
| Storage space used | opensearch_fs_total_used_percent | top_last(1) | Over 90% of the storage size | Over 85% of the storage size |
| Using the JVM long-lived object pool | opensearch_jvm_mem_heap_pressure | top_last(1) | Over 90% of host RAM | Over 75% of host RAM |
| Storage space used | disk.used_bytes | — | 90% of the storage size | 80% of the storage size |
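To illustrate how a pair of Warning and Alarm thresholds from the table applies to a metric reading, here is a minimal sketch. The function `alert_level` is a hypothetical helper, not part of Yandex Monitoring; it only mirrors the "over the threshold" comparison for integer percentage values.

```shell
# Hypothetical helper: classify an integer metric value against two thresholds.
# Usage: alert_level <value> <warning_threshold> <alarm_threshold>
alert_level() {
  if [ "$1" -gt "$3" ]; then
    echo "ALARM"      # value is over the Alarm threshold
  elif [ "$1" -gt "$2" ]; then
    echo "WARNING"    # value is over the Warning threshold only
  else
    echo "OK"
  fi
}
```

With the recommended storage thresholds of 85 and 90 percent, `alert_level 87 85 90` prints `WARNING` and `alert_level 92 85 90` prints `ALARM`.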
For the disk.used_bytes metric, the Alarm and Warning thresholds are only set in bytes. For example, the recommended values for a 100 GB disk are as follows:
- Alarm: 96636764160 bytes (90%).
- Warning: 85899345920 bytes (80%).
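The byte values above can be reproduced with a short calculation. The sketch below assumes, as the example figures imply, that 1 GB is counted as 1024³ bytes; the function name `threshold_bytes` is hypothetical.

```shell
# Hypothetical helper: convert a percentage of the disk size into bytes.
# Usage: threshold_bytes <disk_size_gb> <percent>
# Assumes 1 GB = 1024 * 1024 * 1024 bytes, which matches the figures in the example.
threshold_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 * $2 / 100 ))
}
```

For a 100 GB disk, `threshold_bytes 100 90` prints `96636764160` and `threshold_bytes 100 80` prints `85899345920`.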
You can view the current storage size and RAM of the hosts in the detailed information about the cluster. For a complete list of supported metrics, see this Monitoring guide.
Cluster state and status
The State of a cluster shows the health of its hosts, while the Status shows whether the cluster is started, stopped, or in an intermediate stage.
To check the cluster state and status:
Management console:
- In the management console, navigate to the folder page.
- Navigate to the Managed Service for OpenSearch service.
- Hover over the indicator in the cluster row of the Availability column.
CLI:
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To find out the state and status of a cluster, get information about it:
yc managed-opensearch cluster get <cluster_name_or_ID>
The cluster state is returned in the health parameter, and the cluster status in the status parameter.
You can get the cluster name and ID with the list of clusters in the folder.
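If you only need the two values, you can filter the command output. The sketch below assumes that the default `yc` output is YAML with top-level `health` and `status` keys; the function name `cluster_health_status` is hypothetical. It matches only keys at the start of a line, so nested keys with the same name are ignored.

```shell
# Hypothetical helper: read `yc managed-opensearch cluster get` output from stdin
# and print "<health> <status>". Assumes YAML output with top-level keys.
cluster_health_status() {
  awk '/^health:/ { h = $2 } /^status:/ { s = $2 } END { print h, s }'
}

# Intended usage (not executed here):
#   yc managed-opensearch cluster get <cluster_name_or_ID> | cluster_health_status
```

If the actual output format differs in your CLI version, adjust the patterns accordingly.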
REST API:
- Get an IAM token for API authentication and put it in an environment variable:

      export IAM_TOKEN="<IAM_token>"

- Call the Cluster.Get method, e.g., via the following cURL request:

      curl \
        --request GET \
        --header "Authorization: Bearer $IAM_TOKEN" \
        --url 'https://mdb.api.cloud.yandex.net/managed-opensearch/v1/clusters/<cluster_ID>'

  You can get the cluster ID with the list of clusters in the folder.

- Check the server response to make sure your request was successful.

  You will see the cluster health and status in the health and status parameters, respectively.
gRPC API:
- Get an IAM token for API authentication and put it in an environment variable:

      export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

      cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume that the repository contents reside in the ~/cloudapi/ directory.

- Call the ClusterService.Get method, e.g., via the following gRPCurl request:

      grpcurl \
        -format json \
        -import-path ~/cloudapi/ \
        -import-path ~/cloudapi/third_party/googleapis/ \
        -proto ~/cloudapi/yandex/cloud/mdb/opensearch/v1/cluster_service.proto \
        -rpc-header "Authorization: Bearer $IAM_TOKEN" \
        -d '{ "cluster_id": "<cluster_ID>" }' \
        mdb.api.cloud.yandex.net:443 \
        yandex.cloud.mdb.opensearch.v1.ClusterService.Get

  You can get the cluster ID with the list of clusters in the folder.

- Check the server response to make sure your request was successful.

  You will see the cluster health and status in the health and status parameters, respectively.
Cluster states
| State | Description | Suggested actions |
|---|---|---|
| ALIVE | Cluster is operating normally. | No action is required. |
| DEGRADED | Cluster is not running at its full capacity: the state of at least one of the hosts is other than ALIVE. | Run the diagnostics. |
| DEAD | The cluster is down: none of its hosts are running. | Make a support request. |
| UNKNOWN | Cluster state is unknown. | Make a support request. |
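The table above can be turned into a simple lookup for scripts that react to the cluster state. This is only a sketch: the function name `suggested_action` is hypothetical, and the messages repeat the Suggested actions column.

```shell
# Hypothetical helper: print the suggested action for a cluster state.
suggested_action() {
  case "$1" in
    ALIVE)        echo "No action is required." ;;
    DEGRADED)     echo "Run the diagnostics." ;;
    DEAD|UNKNOWN) echo "Make a support request." ;;
    *)            echo "Unrecognized state: $1" >&2; return 1 ;;
  esac
}
```

For example, `suggested_action DEGRADED` prints `Run the diagnostics.`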
Cluster statuses
| Status | Description | Suggested actions |
|---|---|---|
| CREATING | Preparing for the first start | Wait a while and get started. The time it takes to create a cluster depends on the host class. |
| RUNNING | The cluster is operating normally | No action is required. |
| STOPPING | The cluster is stopping | After a while, the cluster status will switch to STOPPED and the cluster will be disabled. No action is required. |
| STOPPED | The cluster is stopped | Start the cluster to get it running again. |
| STARTING | Starting the cluster that was stopped earlier | After a while, the cluster status will switch to RUNNING. Wait a while and get started. |
| UPDATING | Updating the cluster's configuration | Once the update is complete, the cluster will get the status it had prior to the update: RUNNING or STOPPED. |
| ERROR | Error when performing an operation with the cluster or during a maintenance window | If the cluster remains in this status for a long time, contact support. |
| STATUS_UNKNOWN | The cluster is unable to determine its status | If the cluster remains in this status for a long time, contact support. |
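Several statuses in the table are intermediate and resolve on their own, so a script may want to wait until the cluster settles. The sketch below is one possible approach, not an official tool: the grouping labels (`transitional`, `stable`, `needs-attention`), the function names, and the `POLL_INTERVAL` variable are all hypothetical. The command that returns the current status is passed in as arguments, so no particular CLI call is assumed.

```shell
# Hypothetical helper: group a cluster status by how the table suggests handling it.
status_kind() {
  case "$1" in
    CREATING|STARTING|STOPPING|UPDATING) echo "transitional" ;;   # wait for completion
    RUNNING|STOPPED)                     echo "stable" ;;
    ERROR|STATUS_UNKNOWN)                echo "needs-attention" ;; # contact support if it persists
    *)                                   return 1 ;;
  esac
}

# Hypothetical helper: poll a status command until the status is no longer transitional.
# Usage: wait_until_settled <command that prints the current status> [args...]
wait_until_settled() {
  local status
  while :; do
    status="$("$@")" || return 1
    [ "$(status_kind "$status")" = "transitional" ] || break
    sleep "${POLL_INTERVAL:-15}"   # seconds between checks; the default is an arbitrary choice
  done
  echo "$status"
}
```

The loop stops on any status that is not transitional, including unrecognized values, and prints the last status it saw so the caller can decide what to do next.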