Node metric reference
This section describes the metrics that DataSphere nodes deliver to Monitoring. In addition to the metrics described below, nodes can supply user-defined metrics specified at node creation.
The described metrics reflect the resource state of the services deployed in DataSphere nodes.
The metric name is stored in the name label.
All DataSphere metrics share the service=datasphere label.
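For example, a Monitoring query selector that picks the request rate of a single node could look like the following (the node_id value is a placeholder):

```
"node_requests"{service="datasphere", node_id="<node_id>"}
```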
System metrics
System metrics are supplied by the Yandex Cloud proxy and describe requests to nodes.
All system metrics have the node_path label, which contains the node endpoint.
Node system metrics
All node system metrics have the node_id label, which contains the node ID.
| Metric name | Type, units | Description | Labels |
|---|---|---|---|
| node_requests | RATE, requests/s | Frequency of requests to the node. | |
| node_grpc_codes | RATE, requests/s | Frequency of requests to the node, by gRPC response code. | code: gRPC response code. |
| node_http_codes | RATE, requests/s | Frequency of requests to the node, by HTTP response code. | code: HTTP response code. |
| node_request_durations | RATE, seconds | Response time distribution histogram for requests to the node. | |
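As a sketch of how the code label can be used, the share of failed requests can be derived from node_grpc_codes rates grouped by code. The sample values below are hypothetical; real values come from Monitoring.

```python
# Hypothetical node_grpc_codes samples, grouped by the "code" label
# (values are requests/s).
rates_by_code = {
    "OK": 42.0,
    "INTERNAL": 1.5,
    "UNAVAILABLE": 0.5,
}

# Total request rate and the rate of non-OK responses.
total = sum(rates_by_code.values())
errors = sum(v for code, v in rates_by_code.items() if code != "OK")

# Fraction of requests that failed.
error_ratio = errors / total
print(f"error ratio: {error_ratio:.3f}")
```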
System metrics of aliases
All alias system metrics have the alias_name label, which contains the alias name.
| Metric name | Type, units | Description | Labels |
|---|---|---|---|
| alias_requests | RATE, requests/s | Frequency of requests to the alias. | |
| alias_grpc_codes | RATE, requests/s | Frequency of requests to the alias, by gRPC response code. | code: gRPC response code. |
| alias_http_codes | RATE, requests/s | Frequency of requests to the alias, by HTTP response code. | code: HTTP response code. |
| alias_request_durations | RATE, seconds | Response time distribution histogram for requests to the alias. | |
Triton metrics
For more information on Triton metrics, see the vendor documentation.
Inference metrics
Common labels for all inference metrics:
| Label | Data |
|---|---|
| model | Model name. |
| version | Model version. |
| Metric name | Type, units | Description |
|---|---|---|
| nv_inference_request_success | RATE, requests/s | Frequency of successful inference requests. |
| nv_inference_request_failure | RATE, requests/s | Frequency of failed inference requests. |
| nv_inference_count | RATE, requests/s | Frequency of inferences performed. |
| nv_inference_exec_count | RATE, requests/s | Frequency of inference executions. |
| nv_inference_pending_request_count | DGAUGE, requests | Number of pending inference requests. |
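Two derived ratios are often more informative than the raw rates: the failure share and, since several inferences can be batched into a single execution, the average batch size (nv_inference_count divided by nv_inference_exec_count). A minimal sketch with hypothetical sample values:

```python
# Hypothetical metric samples (all rates per second).
request_success = 95.0   # nv_inference_request_success
request_failure = 5.0    # nv_inference_request_failure
inference_count = 400.0  # nv_inference_count
exec_count = 100.0       # nv_inference_exec_count

# Share of requests that failed.
failure_ratio = request_failure / (request_success + request_failure)

# Average number of inferences handled per model execution (batching).
avg_batch_size = inference_count / exec_count

print(failure_ratio, avg_batch_size)
```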
Latency metrics
Common labels for all latency metrics:
| Label | Data |
|---|---|
| model | Model name. |
| version | Model version. |
| Metric name | Type, units | Description |
|---|---|---|
| nv_inference_request_duration_us | RATE, microseconds | Average duration of an inference request. |
| nv_inference_queue_duration_us | RATE, microseconds | Average time spent waiting in the queue before inference. |
| nv_inference_compute_input_duration_us | RATE, microseconds | Average input processing time for an inference. |
| nv_inference_compute_infer_duration_us | RATE, microseconds | Average computation time for an inference. |
| nv_inference_compute_output_duration_us | RATE, microseconds | Average output processing time for an inference. |
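Because these counters accumulate time in microseconds (the _us suffix), dividing their RATE by the request RATE gives an average per-request latency. A sketch with hypothetical sample values:

```python
# Hypothetical metric samples.
duration_us_rate = 250_000.0  # nv_inference_request_duration_us, us accumulated per second
request_rate = 50.0           # nv_inference_request_success, requests/s

# Microseconds of processing per request, converted to milliseconds.
avg_latency_ms = (duration_us_rate / request_rate) / 1000.0
print(avg_latency_ms)
```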
Summary metrics
| Metric name | Type, units | Description |
|---|---|---|
| nv_inference_request_summary_us | RATE, microseconds | Total time to process inference requests from beginning to end (includes cached requests). |
| nv_inference_queue_summary_us | RATE, microseconds | Total time requests spent in the execution queue (includes cached requests). |
| nv_inference_compute_input_summary_us | RATE, microseconds | Total time to process input data for inference requests (in the framework backend; does not include cached requests). |
| nv_inference_compute_infer_summary_us | RATE, microseconds | Total model inference runtime for requests (in the framework backend; does not include cached requests). |
| nv_inference_compute_output_summary_us | RATE, microseconds | Total time to process output data for inference requests (in the framework backend; does not include cached requests). |
GPU metrics
| Metric name | Type, units | Description |
|---|---|---|
| nv_gpu_power_usage | DGAUGE, watts | Instantaneous GPU power consumption. |
| nv_gpu_power_limit | DGAUGE, watts | Maximum GPU power limit. |
| nv_energy_consumption | DGAUGE, joules | GPU energy consumed since Triton started. |
| nv_gpu_utilization | DGAUGE | GPU utilization, from 0.0 to 1.0. |
| nv_gpu_memory_total_bytes | DGAUGE, bytes | Total GPU memory size. |
| nv_gpu_memory_used_bytes | DGAUGE, bytes | Used GPU memory size. |
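These gauges combine naturally into utilization fractions, e.g. power draw relative to the limit and memory in use relative to the total. A sketch with hypothetical values:

```python
# Hypothetical GPU gauge samples.
power_usage = 180.0       # nv_gpu_power_usage, watts
power_limit = 300.0       # nv_gpu_power_limit, watts
mem_used = 12 * 1024**3   # nv_gpu_memory_used_bytes
mem_total = 16 * 1024**3  # nv_gpu_memory_total_bytes

# Fraction of the power limit currently drawn.
power_fraction = power_usage / power_limit
# Fraction of GPU memory in use.
mem_fraction = mem_used / mem_total

print(power_fraction, mem_fraction)
```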
CPU metrics
| Metric name | Type, units | Description |
|---|---|---|
| nv_cpu_utilization | DGAUGE | CPU utilization, from 0.0 to 1.0. |
| nv_cpu_memory_total_bytes | DGAUGE, bytes | Total CPU memory size. |
| nv_cpu_memory_used_bytes | DGAUGE, bytes | Used CPU memory size. |
Pinned memory metrics
| Metric name | Type, units | Description |
|---|---|---|
| nv_pinned_memory_pool_total_bytes | DGAUGE, bytes | Total pinned memory size for all models. |
| nv_pinned_memory_pool_used_bytes | DGAUGE, bytes | Used pinned memory size for all models. |
Response cache metrics
| Metric name | Type, units | Description |
|---|---|---|
| nv_cache_num_hits_per_model | COUNTER, number | Number of cache hits per model. |
| nv_cache_num_misses_per_model | COUNTER, number | Number of cache misses per model. |
| nv_cache_hit_duration_per_model | GAUGE, microseconds | Total time spent retrieving responses from the cache on cache hits, per model. |
| nv_cache_miss_duration_per_model | GAUGE, microseconds | Total time spent looking up and inserting responses into the cache on cache misses, per model. |
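A sketch of deriving per-model cache effectiveness from these counters (the sample values are hypothetical):

```python
# Hypothetical cache counter samples for one model.
hits = 900.0                # nv_cache_num_hits_per_model
misses = 100.0              # nv_cache_num_misses_per_model
hit_duration_us = 45_000.0  # nv_cache_hit_duration_per_model, microseconds

# Share of requests served from the cache.
hit_rate = hits / (hits + misses)
# Average time to retrieve one cached response.
avg_hit_latency_us = hit_duration_us / hits

print(hit_rate, avg_hit_latency_us)
```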