Alert
An alert is a sequence of named queries calculated once a minute. The resulting query value is compared to the preset threshold values. If a threshold is reached, Monitoring changes the alert status to Alarm or Warning and notifies the user via a notification channel.
Alert statuses
An alert can have one of the following statuses:
Status | Description |
---|---|
OK | The metric value is within the specified normal threshold. |
Warning | The metric value has reached the Warning threshold. |
Alarm | The metric value has reached the critical Alarm threshold. |
No data | There is not enough metric data to calculate the alert function. |
Error | The alert value cannot be calculated. |
Alert evaluation history
Alert evaluation history is shown as a chart of columns, each colored according to the alert status at the time of evaluation.
To navigate through history, you can choose one of the preset output scales:
- 1h: 1 hour
- 1d: 1 day
- 1w: 1 week
- 1m: 1 month
The minimum scale is 1h. At this scale, each chart column shows the alert status for a single minute. For larger output scales, the column color is made up of the statuses calculated within the interval.
Clicking a column brings up the alert settings as of the selected evaluation point.
Note
When drawing data from the evaluation history, the alert status is re-evaluated and presented in the Evaluation status field. The alert status in the history may differ from the current evaluation result due to the specifics of historical data decimation or delays in data delivery to Monitoring.
Alert settings
Queries
This is a set of queries that return a line or multiple lines.
You can:
- Disable query calculation by clicking the corresponding icon and selecting Deactivate. Links to queries that are not calculated will return errors.
- Hide query calculation results from the chart by clicking the corresponding icon.
- Display query calculation results on the chart by clicking the corresponding icon.
Alert triggers
Test query
Name of the query whose calculation result the aggregation function is applied to.
Aggregation function
The aggregation function is applied to the test query calculation results.
Aggregation function | Description |
---|---|
At least one value | At least one metric value in the query exceeds the thresholds set in the specified period. |
All values | All metric values in the query exceed the thresholds set in the specified period. |
Average | Calculates an average value for each metric in the specified period. For example, if a query returns two metrics, Monitoring calculates an average value for each of them in the specified window. |
Count | Calculates the number of metric values in the specified period. |
Last value | Uses the latest metric value in the specified period. If Monitoring could not obtain the metric value, it changes the alert status to No data. |
Maximum | Uses the maximum metric value in the specified period. |
Minimum | Uses the minimum metric value in the specified period. |
Sum | Calculates the sum of values for each metric in the specified period. |
For example, to track the latest metric value within the last 15 minutes, select the Last value function and set the evaluation window to 15m.
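The value-producing aggregation functions from the table can be illustrated with a minimal Python sketch. The names `AGGREGATIONS` and `aggregate` are hypothetical and are not part of Monitoring. The sketch also assumes that an empty window yields no value, leaving that case to the no data policies. At least one value and All values are omitted because they apply the threshold check to each point rather than producing a single number.

```python
from statistics import mean

# Hypothetical mapping from the function names in the table to Python reducers.
AGGREGATIONS = {
    "Average": mean,
    "Count": len,
    "Last value": lambda values: values[-1],
    "Maximum": max,
    "Minimum": min,
    "Sum": sum,
}


def aggregate(function_name, values):
    """Reduce the points of one metric in the evaluation window to one value.

    Returns None for an empty window (an assumption of this sketch).
    """
    if not values:
        return None
    return AGGREGATIONS[function_name](values)
```

If a query returns several metrics, `aggregate` would be called once per metric, in line with the Average description above.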
Comparison function
Comparison functions are applied to the aggregation function calculation result and the Warning and Alarm threshold values. If the aggregated value satisfies the comparison with a threshold, Monitoring changes the alert status.
Warning
Threshold value upon which the alert status changes to Warning.
Alarm
Threshold value upon which the alert status changes to Alarm.
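How a comparison function and the two thresholds might combine into a status can be sketched as follows. This is an illustration only: the set of comparison operators, the function name `evaluate_thresholds`, and the rule that Alarm is checked before Warning are assumptions, not documented Monitoring behavior.

```python
import operator

# Assumed set of comparison functions; the actual list in Monitoring may differ.
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def evaluate_thresholds(value, comparison, warning=None, alarm=None):
    """Return "Alarm", "Warning", or "OK" for one aggregated value.

    Alarm is checked first so that the more severe status wins
    when both thresholds are satisfied.
    """
    compare = COMPARISONS[comparison]
    if alarm is not None and compare(value, alarm):
        return "Alarm"
    if warning is not None and compare(value, warning):
        return "Warning"
    return "OK"
```

For instance, with the `>` comparison, a Warning threshold of 80, and an Alarm threshold of 90, an aggregated value of 85 would yield Warning in this sketch.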
Evaluation window
Time interval for which the aggregation function is calculated. The window lets you filter out sudden changes in metric values by responding only to changes over a longer period.
You can select a preset value or specify your own in the following format:
- 1h: 1 hour
- 1m: 1 minute
- 1s: 1 second
For example, 3m 45s sets an evaluation window of 3 minutes 45 seconds.
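The duration format can be made concrete with a short Python sketch that converts such a string into seconds. The function name `parse_window` is hypothetical, and the sketch assumes that components are separated by whitespace and use only the `h`, `m`, and `s` suffixes listed above.

```python
import re

# Seconds per unit suffix, as listed in the format description.
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_window(text):
    """Convert a duration such as "3m 45s" into a number of seconds."""
    parts = text.split()
    if not parts:
        raise ValueError("empty duration")
    total = 0
    for part in parts:
        match = re.fullmatch(r"(\d+)([hms])", part)
        if match is None:
            raise ValueError(f"unrecognized duration component: {part!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return total
```

Under these assumptions, `parse_window("3m 45s")` gives 225 seconds, which matches the 3 minutes 45 seconds in the example.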
Evaluation delay
Backward shift of the evaluation window, in seconds. The default value is 0. The delay helps avoid unexpected alert triggering when a query uses metrics collected at different intervals. You can select a preset value or specify your own, in the same format as for the evaluation window.
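The effect of the delay on the evaluated interval can be sketched in Python. The function name `evaluation_interval` and the exact boundary handling are assumptions made for illustration; the point is only that both ends of the window move back by the delay.

```python
from datetime import datetime, timedelta


def evaluation_interval(now, window_seconds, delay_seconds=0):
    """Return (start, end) of the evaluation window, shifted back by the delay."""
    end = now - timedelta(seconds=delay_seconds)
    start = end - timedelta(seconds=window_seconds)
    return start, end


# A 5-minute window with a 60-second delay, evaluated at 12:00:00,
# covers 11:54:00 to 11:59:00.
start, end = evaluation_interval(datetime(2024, 1, 1, 12, 0, 0), 300, 60)
```

With a delay of 0, the window ends at the evaluation time itself, so metrics that arrive late may not yet have points in it.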
No data policy
Policies set the alert status when there is no data or metrics in the evaluation window for a given criterion. Policies are applied prior to verifying trigger conditions.
When there are no metrics or points in an evaluation window, you can handle this scenario in two ways:
- Change the alert status using the No metrics by selector or No points in evaluation window policies.
- Handle it manually.
Note
To make alerts switch to the No data status when there are no metrics or points in the evaluation window, set the policy value for all alert types to No data. Avoid using the Default and Manual values as they require extra manual handling.
No selector metrics
The policy determines the alert status if no metrics were found for at least one selector. This happens, for example, when the metrics do not exist or were deleted after their TTL expired.
The possible values are:
- Default: No data for all alert types.
- OK: Changes the alert status to OK.
- Warn: Changes the alert status to Warning.
- Alarm: Changes the alert status to Alarm.
- No data: Changes the alert status to No data.
No points in evaluation window
The policy determines the alert status if at least one of the metrics in the evaluation window has no points.
For threshold alerts that monitor several metrics, predicates are checked for each metric independently. The final alert status is the most severe of the per-metric statuses, ranked in the following order: No data < OK < Warning < Error < Alarm. For example, if one line has no points and the No points in evaluation window policy sets its status to Warning, while for another line the predicate that changes the alert to Alarm is true, the resulting alert status is Alarm.
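The ordering above can be expressed as a small Python sketch that picks the highest-ranked status among the per-metric results. The names `SEVERITY_ORDER` and `combine_statuses` are hypothetical.

```python
# Statuses ranked from lowest to highest, as given in the text.
SEVERITY_ORDER = ["No data", "OK", "Warning", "Error", "Alarm"]


def combine_statuses(statuses):
    """Return the highest-ranked status among the per-metric statuses."""
    return max(statuses, key=SEVERITY_ORDER.index)
```

For the case described above, `combine_statuses(["Warning", "Alarm"])` returns `"Alarm"`. The sketch raises `ValueError` for an empty list, since there is nothing to combine.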
The possible values are:
- Default: Default value (No data) for all types of threshold alerts.
- OK: Changes the alert status to OK.
- Warning: Changes the alert status to Warning.
- Alarm: Changes the alert status to Alarm.
- No data: Changes the alert status to No data.
- Manual: Gives control to the predicates or alert program for manual handling.
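The policy values listed above can be read as a lookup that runs before the trigger conditions. The following Python sketch illustrates this reading for a single metric. The names `POLICY_STATUS` and `apply_no_points_policy` are hypothetical, and `evaluate` is a stand-in for whatever predicates or alert program would otherwise compute the status.

```python
# Status that each fixed policy value maps to; Default resolves to No data.
POLICY_STATUS = {
    "Default": "No data",
    "OK": "OK",
    "Warning": "Warning",
    "Alarm": "Alarm",
    "No data": "No data",
}


def apply_no_points_policy(policy, points, evaluate):
    """Return the status for one metric, applying the policy before the predicates.

    `evaluate` is a hypothetical callable that computes the status from points.
    With the Manual policy it is called even when there are no points.
    """
    if points or policy == "Manual":
        return evaluate(points)
    return POLICY_STATUS[policy]
```

In this sketch, the Manual value is the only one that passes an empty list of points to `evaluate`, so that callable would have to handle the empty case itself.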
Manual processing of no data
Setting the Manual value for any policy will give control to the alert predicates or program.
Avoid the Manual value as it complicates the alert program. The No data policy value should cover most cases.