Test agent monitoring
Monitoring enables you to collect data (metrics) related to the target and test agent state and visualize this data using charts. You can collect metrics using the Telegraf and YCMonitoring plugins.
You can enable test agent monitoring in the settings when you create a test:
If you are setting up test parameters using a form in the interface, enable the Agent monitoring option. This enables a standard configuration of the agent monitoring that includes the following metrics:
You can also configure the YCMonitoring plugin to collect metrics from the Yandex Monitoring API and display metric charts in the Monitoring section of the load test.
Warning
To use the plugin, the service account needs the monitoring.viewer role in the relevant folder.
Before using the plugin, make sure the test agent is running the latest version. Update the agent, if required.
Note
Yandex Monitoring collects metrics for different cloud services at intervals ranging from 15 to 60 seconds, so the results for short-running tests may be uninformative. We recommend using the plugin for long-running tests (10 minutes or more).
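As a rough illustration of why short tests yield sparse charts, the sketch below estimates how many data points a metric can have for a given test duration. The helper name `expected_points` is hypothetical, and the calculation assumes one data point per collection interval, which is a simplification.

```python
def expected_points(test_seconds, interval_seconds):
    """Upper-bound estimate: one data point per full collection interval."""
    return test_seconds // interval_seconds


# Compare a 2-minute test with a 10-minute test at both ends of the
# 15-60 second collection interval range mentioned above.
for duration in (120, 600):
    for interval in (15, 60):
        points = expected_points(duration, interval)
        print(f"{duration} s test, {interval} s interval: about {points} points")
```

Under this assumption, a 2-minute test of a service with a 60-second interval produces only about two points per chart, while a 10-minute test produces about ten.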
To set up the YCMonitoring plugin:
- Under Test settings, click Chart in the YC Monitoring Metrics section.
- In the Chart name field, specify the chart name.
- In the Query field, enter your query in the Monitoring query language.
- Optionally, add other charts in the same way.
To set up metric collection using the Telegraf plugin, specify the agent's monitoring configuration in the telegraf section of the configuration file. You can set up each metric using all the options that are available for it in Telegraf.
Here is an example of the telegraf section settings in the test configuration file:
telegraf:
  enabled: true
  package: yandextank.plugins.Telegraf
  config:
    metrics:
      cpu:
        percpu: true
        totalcpu: false
        fieldpass:
          - time_user
          - time_steal
          - usage_idle
In the monitoring agent configuration, you can also define custom metrics that are not available in Telegraf.
Here is an example of a custom metric description:
telegraf:
  enabled: true
  package: yandextank.plugins.Telegraf
  config:
    metrics:
      custom:
        diff: 1
        measure: call
        label: test
        cmd: curl --silent 'http://localhost:6100/stat' | python3 -c 'import sys, json; j = json.load(sys.stdin); print("\n".join(repr(c["values"]["accept"]) for c in j["charts"] if c["name"] == "localqueue_wait_time"))'
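The Python one-liner in cmd is dense, so here is a more readable sketch of the same parsing step. The function name `extract_accept_values` and the sample payload are hypothetical. The payload shape (a charts list whose items have name and values.accept) is inferred only from the keys the one-liner reads, not from any documented response format.

```python
import json


def extract_accept_values(stat_json, chart_name="localqueue_wait_time"):
    """Return the 'accept' value of every matching chart, one per line."""
    data = json.loads(stat_json)
    return "\n".join(
        repr(chart["values"]["accept"])
        for chart in data["charts"]
        if chart["name"] == chart_name
    )


# Hypothetical response body, shaped only after the keys the one-liner uses.
sample = json.dumps({
    "charts": [
        {"name": "localqueue_wait_time", "values": {"accept": 12}},
        {"name": "other_chart", "values": {"accept": 99}},
    ]
})

print(extract_accept_values(sample))
```

Keeping the logic in a standalone script like this, and calling that script from cmd, may be easier to maintain than an inline one-liner.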
To set up metric collection using the YCMonitoring plugin, specify the agent's monitoring configuration under ycmonitoring in the configuration file. With this plugin, you can collect monitoring metrics from the Yandex Monitoring API.
Here is the minimum plugin configuration format:
ycmonitoring:
  enabled: true
  package: yandextank.plugins.YCMonitoring
  panels:
    <panel_name_1>:
      group_name: <group_name_1>
      queries:
        - <query1>
        - <query2>
    <panel_name_2>:
      group_name: <group_name_2>
      queries:
        - <query3>
Where:
- panels: Dictionary of panels for monitoring metric collection.
- panel_name: Dictionary key and chart name on the monitoring panel.
- group_name: Panel grouping key. The default value is the address of the API host used to collect metrics.
- query: Query in the Monitoring query language.
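To catch structural mistakes before starting a test, you could check the section against the format described above. The sketch below is a hypothetical helper, not part of the plugin: it takes the ycmonitoring section already parsed into a Python dictionary and reports panels that have no usable queries.

```python
def check_ycmonitoring(section):
    """Return a list of problems found in a parsed ycmonitoring section."""
    panels = section.get("panels")
    if not isinstance(panels, dict) or not panels:
        return ["'panels' must be a non-empty dictionary"]

    problems = []
    for name, panel in panels.items():
        queries = panel.get("queries") if isinstance(panel, dict) else None
        if not isinstance(queries, list) or not queries:
            problems.append(f"panel {name!r}: 'queries' must be a non-empty list")
        elif not all(isinstance(q, str) and q.strip() for q in queries):
            problems.append(f"panel {name!r}: every query must be a non-empty string")
    return problems


# Placeholder panel and query names, mirroring the minimum format above.
good = {"panels": {"panel_1": {"group_name": "group_1", "queries": ["query1", "query2"]}}}
bad = {"panels": {"panel_1": {"group_name": "group_1", "queries": []}}}

print(check_ycmonitoring(good))
print(check_ycmonitoring(bad))
```

This only checks the shape of the section. It does not verify that a query is valid in the Monitoring query language.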
Here is an example of the ycmonitoring section settings in the test configuration file:
ycmonitoring:
  enabled: true
  package: yandextank.plugins.YCMonitoring
  panels:
    target_connections:
      queries:
        - '"network_connections.quota_utilization"{folderId="b1g7j67rou********ne", service="compute", resource_id="agent007"}'
    target_cpu:
      queries:
        - '"cpu_usage"{folderId="b1g7j67rou********ne", service="compute", resource_id="agent007"}'
    received_packets:
      queries:
        - '"network_received_packets"{folderId="b1g7j67rou********ne", service="compute", resource_id="agent007", resource_type="vm", interface_number="*"}'
Where:
- resource_id: Test agent name.
- folderId: ID of the folder containing the test agent.
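The queries in this example all follow the same pattern: a quoted metric name followed by label selectors in braces. If you generate configurations for several agents, a small helper can assemble such strings. The function `build_query` below is a hypothetical sketch, and the folder ID in the usage lines is a placeholder.

```python
def build_query(metric, **labels):
    """Assemble a query string of the form "metric"{key="value", ...}."""
    selector = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return f'"{metric}"{{{selector}}}'


# "<folder_id>" is a placeholder; substitute the ID of your own folder.
query = build_query(
    "cpu_usage",
    folderId="<folder_id>",
    service="compute",
    resource_id="agent007",
)
print(query)
```

The helper does not escape quotation marks inside label values, so it is only suitable for simple identifiers like the ones shown above.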
This plugin also allows you to configure additional parameters:
panels:
  panel_name:
    ...
api_host: monitoring.api.cloud.yandex.net:443
token: LOADTESTING_YC_TOKEN
timeout: 5s
request_timeout: 10s
poll_interval: 60s
ignore_labels: ['service', 'resource_type', 'device', 'interface_number', 'source_metric', 'subcluster_name', 'shard', 'dc']
priority_labels: ['cpu_name', 'label']
Where:
- api_host: Address of the Monitoring API used to collect data. The default value is monitoring.api.cloud.yandex.net:443.
- token: IAM token file path. The default value is taken from the LOADTESTING_YC_TOKEN environment variable.
- timeout: Plugin shutdown timeout after the load test is over. The default value is five seconds.
- request_timeout: Monitoring API request timeout. The default value is ten seconds.
- poll_interval: Interval between Monitoring API requests. The default value is 60 seconds.
- priority_labels: List of labels to build the monitoring metric name.
- ignore_labels: List of labels to ignore when building the monitoring metric name.
The last two parameters are used to generate names for metrics on the charts. Each metric name is based on the dictionary keys of the relevant request and the items in the ignore_labels and priority_labels lists. Non-alphanumeric characters in names are replaced with hyphens.
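The exact naming algorithm is not described here, so the following is only an illustrative sketch of how the two lists could interact. It assumes that ignored labels are dropped, that priority labels come first, and that the remaining label values are appended in alphabetical key order. The function `build_metric_name`, the ordering rule, and the sample labels are all assumptions, and the real plugin may behave differently.

```python
import re


def build_metric_name(labels, ignore_labels, priority_labels):
    """Hypothetical name builder: priority labels first, ignored ones dropped."""
    kept = {k: v for k, v in labels.items() if k not in ignore_labels}
    ordered = [k for k in priority_labels if k in kept]
    ordered += sorted(k for k in kept if k not in priority_labels)
    raw = " ".join(str(kept[k]) for k in ordered)
    # Replace every non-alphanumeric character with a hyphen.
    return re.sub(r"[^0-9A-Za-z]", "-", raw)


# Sample labels are made up for illustration.
name = build_metric_name(
    {"service": "compute", "cpu_name": "cpu0", "name": "cpu_usage"},
    ignore_labels=["service"],
    priority_labels=["cpu_name"],
)
print(name)
```

Whatever the real ordering is, the practical takeaway is the same: add a label to ignore_labels to keep it out of chart legends, and add it to priority_labels to make it prominent.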