Alerting rules
Yandex Managed Service for Prometheus® lets you add alerting rules and sends notifications when these rules trigger.
To use alerting:
- Add alerting rules.
- Set up Alert Manager to process and deliver notifications.
This section describes some aspects of alerting rules and Alert Manager configuration. For file management, see Recording rules.
You can set up alerting via the management console or the API.
Pre-configuration for using the API
The API consists of REST resources available at https://monitoring.api.cloud.yandex.net/prometheus/workspaces/<workspace_ID>/extensions/v1/rules.
To start running requests:
- Install cURL.
- Authenticate with the API.
- Create a workspace and copy its ID. You will use it in the request URL.
Adding alerting rules
Requirements for alerting rules
In Yandex Managed Service for Prometheus®, you can use PromQL-based alerting rules.
When describing the rules, consider the following:
- All fields of the YAML file specification are supported.
- Annotation templating is supported through the $value and $labels variables.
- Iterations and functions are not supported.
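The templating limits above can be checked locally before uploading a file. The sketch below is a heuristic, not the service's validator: the function name unsupported_actions and the assumption that only bare $value and $labels.<name> references are accepted are illustrative.

```python
import re

# Matches every template action such as {{ $labels.instance }}.
ACTION = re.compile(r"\{\{(.*?)\}\}")
# Assumption: only bare $value and $labels[.name] references are accepted.
ALLOWED = re.compile(r"^\$(value|labels(\.[A-Za-z_][A-Za-z0-9_]*)?)$")


def unsupported_actions(annotation):
    """Return template actions that are not plain $value/$labels references."""
    found = []
    for body in ACTION.findall(annotation):
        # Drop optional "-" trim markers and surrounding whitespace.
        expr = body.strip().strip("-").strip()
        if not ALLOWED.match(expr):
            found.append(expr)
    return found


ok = "High CPU usage detected on {{$labels.instance}}: {{ $value }}%"
bad = "{{ range $labels }}x{{ end }} {{ $value | humanize }}"
print(unsupported_actions(ok))
print(unsupported_actions(bad))
```

An empty list means the annotation uses only variable references; anything returned is worth reviewing by hand, since iterations (range) and functions (such as humanize) are not supported.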
Alerting rule file example
To test alerting, copy the code below into the host-cpu-usage-alert.yml file:
groups:
- name: CPU_Usage_Alerts
  rules:
  - alert: HighCPUUsage
    expr: 100 * (1 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU usage detected on {{$labels.instance}}"
      description: "CPU usage on instance {{$labels.instance}} has been above 80% for the last 5 minutes."
This example defines the CPU_Usage_Alerts rule group with a single alert, HighCPUUsage. The alert fires when CPU usage stays above 80% for more than five minutes. The expr field contains the formula for calculating the CPU usage percentage.
The alert carries the severity: critical label, which is used to route notifications to channels. In the Alert Manager configuration, you can map specific labels to channels so that notifications for different alerts go to different channels.
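To make the formula in the expr field concrete, here is a small worked example of the same arithmetic. The per-core idle fractions are made-up numbers, and the helper names are illustrative; the real evaluation happens in PromQL, not in Python.

```python
def cpu_usage_percent(idle_rates):
    """Mirror 100 * (1 - avg(idle rate)) for one instance.

    idle_rates: per-core fraction of time spent idle (0.0 to 1.0), i.e. what
    rate(node_cpu_seconds_total{mode="idle"}[5m]) yields for each core.
    """
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 * (1 - avg_idle)


def condition_holds(idle_rates, threshold=80):
    """True when the rule expression would return a sample (usage > threshold)."""
    return cpu_usage_percent(idle_rates) > threshold


# Hypothetical 4-core hosts.
busy_host = [0.10, 0.20, 0.15, 0.15]  # average idle 15%, so usage is about 85%
calm_host = [0.70, 0.80, 0.90, 0.60]  # average idle 75%, so usage is about 25%
print(round(cpu_usage_percent(busy_host), 1), condition_holds(busy_host))
print(round(cpu_usage_percent(calm_host), 1), condition_holds(calm_host))
```

Because of for: 5m, the condition has to hold continuously for five minutes before the alert fires; a single evaluation above the threshold only puts it into the pending state.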
Adding or replacing an alerting rule file
- On the Monitoring page, select Prometheus on the left.
- Select or create a workspace.
- Navigate to the Rules tab.
- If you have not uploaded any files yet, click Add file and select a .yml file with rules.
- To add another file, click Add file.
- To replace an existing file, open the menu to its right and select Replace file.
To do the same via the API:
- Create a file named host-cpu-usage-alert.yml and encode its contents in Base64 as per RFC 4648:
# The quoted 'EOF' keeps the shell from expanding $labels inside the heredoc.
cat <<'EOF' > host-cpu-usage-alert.yml
groups:
- name: CPU_Usage_Alerts
  rules:
  - alert: HighCPUUsage
    expr: 100 * (1 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU usage detected on {{$labels.instance}}"
      description: "CPU usage on instance {{$labels.instance}} has been above 80% for the last 5 minutes."
EOF
base64 -iw0 host-cpu-usage-alert.yml
# Output: Z3JvdXBzOgotIG5hbWU6I******
- Save the result as a JSON file named body.json:
{
  "name": "host-cpu-usage-alert.yml",
  "content": "Z3JvdXBzOgotIG5hbWU6I******"
}
- Create or replace the alerting rule file:
export IAM_TOKEN=<IAM_token>
curl -X PUT \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${IAM_TOKEN}" \
  -d "@body.json" \
  "https://monitoring.api.cloud.yandex.net/prometheus/workspaces/<workspace_ID>/extensions/v1/rules"
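The encoding and JSON steps can also be scripted. The following is a minimal sketch using only the Python standard library; build_rules_body is a hypothetical helper name, and the rule text is shortened for brevity.

```python
import base64
import json


def build_rules_body(file_name, yaml_text):
    """Return the JSON request body with the file content in Base64 (RFC 4648)."""
    encoded = base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")
    return json.dumps({"name": file_name, "content": encoded})


# Shortened version of the rule file from the example.
rules_yaml = """groups:
- name: CPU_Usage_Alerts
  rules:
  - alert: HighCPUUsage
    expr: 100 * (1 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80
    for: 5m
"""

body = build_rules_body("host-cpu-usage-alert.yml", rules_yaml)
payload = json.loads(body)
print(payload["name"], payload["content"][:21])
```

The resulting string can be written to body.json and sent with the same PUT request as in the cURL step. Reading the YAML from a file instead of an inline string avoids any shell quoting issues with $labels.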
Once you create the alerting rules, the service starts evaluating them and generating the ALERTS and ALERTS_FOR_STATE metrics. Set up Alert Manager to send notifications.
You can monitor alert statuses in the management console.
For more on file operations and rule computation, see Recording rules.
Setting up Alert Manager
Notification channel settings
- Notification channels from the configuration file are mapped to the Yandex Monitoring notification channels configured in the workspace folder.
- Currently, only the email, Telegram, SMS, and push channels are supported. All other channels are ignored without any error notifications.
- The channel is selected by the routing rules in the routes section of the Alert Manager configuration. The routing rules map channels to the labels set in the labels section of the alerting rules, e.g., severity: critical.
Sample configuration file
This example is configured to send Telegram, email, SMS, and push notifications.
global:
  resolve_timeout: 5m
# Routing and grouping alerts
route:
  # Default receiver
  receiver: 'default-receiver'
  routes:
    # Notifications for alerts with the severity="warning" label are sent to this receiver
    - receiver: 'warning-receiver'
      matchers:
        - severity="warning"
    # Notifications for alerts with the severity="critical" label are sent to this receiver
    - receiver: 'critical-receiver'
      matchers:
        - severity="critical"
receivers:
  # Default receiver: used for alerts that match nothing in the routes section
  - name: 'default-receiver'
    yandex_monitoring_configs:
      # No channels are specified, so alerts that match no route are not sent
      - channel_names: []
  # Receiver for alerts with the severity="warning" label; replace 'email' with the name of a channel configured in the workspace folder
  - name: 'warning-receiver'
    yandex_monitoring_configs:
      - channel_names: [ 'email', 'push-channel' ]
  # Receiver for alerts with the severity="critical" label; replace 'telegram' with the name of a channel configured in the workspace folder
  - name: 'critical-receiver'
    yandex_monitoring_configs:
      - channel_names: [ 'telegram', 'sms-channel' ]
For more information on setting up dynamic notification routing, see this Prometheus guide.
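The way the sample configuration maps labels to channels can be sketched in a few lines. This is a deliberately simplified model: it handles only equality matchers on top-level routes and returns the first match, and it ignores grouping, nested routes, and the continue option. The names pick_receiver, ROUTES, and CHANNELS are illustrative.

```python
def pick_receiver(alert_labels, routes, default_receiver):
    """Return the receiver of the first route whose matchers all hold."""
    for route in routes:
        if all(alert_labels.get(k) == v for k, v in route["matchers"].items()):
            return route["receiver"]
    return default_receiver


# Routes and receivers mirror the sample configuration.
ROUTES = [
    {"receiver": "warning-receiver", "matchers": {"severity": "warning"}},
    {"receiver": "critical-receiver", "matchers": {"severity": "critical"}},
]
CHANNELS = {
    "default-receiver": [],
    "warning-receiver": ["email", "push-channel"],
    "critical-receiver": ["telegram", "sms-channel"],
}

for labels in ({"severity": "critical"}, {"severity": "warning"}, {"team": "db"}):
    receiver = pick_receiver(labels, ROUTES, "default-receiver")
    print(labels, "->", receiver, CHANNELS[receiver])
```

An alert with no matching severity label falls through to default-receiver, which has an empty channel list, so no notification is sent for it.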
Adding or replacing a configuration file
- On the Monitoring page, select Prometheus on the left.
- Select or create a workspace.
- Navigate to the Alert manager configuration tab.
- If you have not uploaded a configuration file yet, click Upload a configuration file and select a .yml file.
- To download a file, click Download.
- To replace a file, click Replace file.
To do the same via the API:
- Save the configuration to the alertmanager.yml file and encode it in Base64 as per RFC 4648:
cat <<EOF > alertmanager.yml
global:
  resolve_timeout: 5m
route:
  receiver: 'default-receiver'
  routes:
    - receiver: 'warning-receiver'
      matchers:
        - severity="warning"
    - receiver: 'critical-receiver'
      matchers:
        - severity="critical"
receivers:
  - name: 'default-receiver'
    yandex_monitoring_configs:
      - channel_names: []
  - name: 'warning-receiver'
    yandex_monitoring_configs:
      - channel_names: [ 'email', 'push-channel' ]
  - name: 'critical-receiver'
    yandex_monitoring_configs:
      - channel_names: [ 'telegram', 'sms-channel' ]
EOF
base64 -iw0 alertmanager.yml
# Output: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91******
- Save the result as a JSON file named alertmanager-body.json:
{
  "content": "Z2xvYmFsOgogIHJlc29sdmVfdGltZW91******"
}
- Create or replace the configuration file:
export IAM_TOKEN=<IAM_token>
curl -X PUT \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${IAM_TOKEN}" \
  -d "@alertmanager-body.json" \
  "https://monitoring.api.cloud.yandex.net/prometheus/workspaces/<workspace_ID>/extensions/v1/alertmanager"
If the request is successful, you will get HTTP code 204; otherwise, you will get the error text. A configuration file is rejected if none of its channels matches an existing notification channel in the folder.
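Since a file with no matching channels is rejected, it can help to compare channel names locally before sending the request. The sketch below assumes the receivers section is already loaded into Python dictionaries and that the folder's channel names are known; matching_channels and folder_channels are hypothetical names, and the exact matching rules on the service side may differ.

```python
def matching_channels(receivers, folder_channels):
    """Return configured channel names that also exist in the folder."""
    configured = {
        name
        for receiver in receivers
        for cfg in receiver.get("yandex_monitoring_configs", [])
        for name in cfg.get("channel_names", [])
    }
    return sorted(configured & set(folder_channels))


# The receivers section of the sample configuration as Python data.
receivers = [
    {"name": "default-receiver", "yandex_monitoring_configs": [{"channel_names": []}]},
    {"name": "warning-receiver", "yandex_monitoring_configs": [{"channel_names": ["email", "push-channel"]}]},
    {"name": "critical-receiver", "yandex_monitoring_configs": [{"channel_names": ["telegram", "sms-channel"]}]},
]

# Hypothetical channel names that exist in the workspace folder.
folder_channels = ["telegram", "oncall-email"]

matches = matching_channels(receivers, folder_channels)
print(matches if matches else "no matching channels: the file would likely be rejected")
```

With these inputs only telegram matches, which is enough for the file to contain at least one known channel; the unmatched names are worth fixing anyway so that the corresponding notifications are delivered.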