© 2025 Direct Cursus Technology L.L.C.
Node metric reference

Written by
Yandex Cloud
Updated on October 11, 2024
  • System metrics
    • Node system metrics
    • System metrics of aliases
  • Triton metrics
    • Inference metrics
    • Latency metrics
    • Summary metrics
    • GPU metrics
    • CPU metrics
    • Pinned memory metrics
    • Response cache metrics

This section describes the metrics that DataSphere delivers to Monitoring. Beyond these, a node can also deliver custom metrics to Monitoring if the user specifies them when creating the node.

The described metrics reflect the resource state of services deployed in DataSphere nodes.

The metric name is stored in the name label.

All DataSphere metrics share the service=datasphere label.
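
As an illustration of how the name and service labels combine, the sketch below assembles a selector string for one metric. The `"name"{label="value"}` selector syntax is an assumption about the Monitoring query language, `build_selector` is a hypothetical helper, and `<node_ID>` is a placeholder. Check the result against the Monitoring documentation before relying on it.

```python
def build_selector(metric_name: str, **labels: str) -> str:
    """Build a selector string for a DataSphere metric (hypothetical helper)."""
    # Every DataSphere metric shares the service=datasphere label.
    all_labels = {"service": "datasphere", **labels}
    label_part = ", ".join(f'{key}="{value}"' for key, value in all_labels.items())
    # The metric name is the value of the name label; the quoted-name form
    # used here is an assumed shorthand for name="...".
    return f'"{metric_name}"{{{label_part}}}'


print(build_selector("node_requests", node_id="<node_ID>"))
```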

System metrics

System metrics are supplied by the Yandex Cloud proxy and describe requests to nodes and aliases.

All system metrics have the node_path label: node endpoint.

Node system metrics

All node system metrics have the node_id label: node ID.

| Metric name | Type, units | Description | Labels |
| --- | --- | --- | --- |
| node_requests | RATE, requests/s | Frequency of requests to the node. | |
| node_grpc_codes | RATE, requests/s | Frequency of requests to the node by gRPC response codes. | code: gRPC response code. |
| node_http_codes | RATE, requests/s | Frequency of requests to the node by HTTP response codes. | code: HTTP response code. |
| node_request_durations | RATE, seconds | Response time distribution histogram for requests to the node. | |
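
Because node_http_codes is split by the code label, the per-code rates can be combined into a server error share. The following is a minimal sketch: `http_error_share` is a hypothetical helper, and the sample rates are made up for illustration rather than taken from a real node.

```python
def http_error_share(rates_by_code: dict[str, float]) -> float:
    """Return the fraction of requests that ended with a 5xx HTTP code."""
    total = sum(rates_by_code.values())
    if total == 0:
        return 0.0  # no traffic, so no errors to report
    errors = sum(rate for code, rate in rates_by_code.items() if code.startswith("5"))
    return errors / total


# Illustrative node_http_codes values (requests/s), keyed by the code label.
sample = {"200": 47.5, "404": 1.5, "500": 0.75, "503": 0.25}
print(f"5xx share: {http_error_share(sample):.2%}")
```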

System metrics of aliases

All alias system metrics are labeled alias_name: alias name.

| Metric name | Type, units | Description | Labels |
| --- | --- | --- | --- |
| alias_requests | RATE, requests/s | Frequency of requests to the alias. | |
| alias_grpc_codes | RATE, requests/s | Frequency of requests to the alias by gRPC response codes. | code: gRPC response code. |
| alias_http_codes | RATE, requests/s | Frequency of requests to the alias by HTTP response codes. | code: HTTP response code. |
| alias_request_durations | RATE, seconds | Response time distribution histogram for requests to the alias. | |
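
A duration histogram such as alias_request_durations or node_request_durations is typically read as a quantile. The sketch below assumes the histogram has already been exported as (upper bound in seconds, count) pairs with non-cumulative counts. That layout and the sample values are assumptions, since this reference does not describe the bucket format.

```python
def histogram_quantile(buckets: list[tuple[float, float]], q: float) -> float:
    """Return the upper bound of the bucket that contains quantile q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)  # sort by upper bound
    total = sum(count for _, count in ordered)
    if total == 0:
        raise ValueError("histogram is empty")
    threshold = q * total
    running = 0.0
    for upper_bound, count in ordered:
        running += count
        if running >= threshold:
            return upper_bound
    return ordered[-1][0]


# Assumed bucket layout: (upper bound in seconds, requests in that bucket).
sample = [(0.05, 60), (0.1, 25), (0.5, 10), (1.0, 5)]
print(histogram_quantile(sample, 0.5), histogram_quantile(sample, 0.9))
```

The result is an upper estimate: the true quantile lies somewhere inside the returned bucket.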

Triton metrics

For more information on Triton metrics, see the NVIDIA Triton Inference Server documentation.

Inference metrics

Common labels for all inference metrics:

| Label | Data |
| --- | --- |
| model | Model name. |
| version | Model version. |

| Metric name | Type, units | Description |
| --- | --- | --- |
| nv_inference_request_success | RATE, requests/s | Frequency of successful inference requests. |
| nv_inference_request_failure | RATE, requests/s | Frequency of failed inference requests. |
| nv_inference_count | RATE, requests/s | Frequency of inferences performed. |
| nv_inference_exec_count | RATE, requests/s | Frequency of inference batch executions. |
| nv_inference_pending_request_count | DGAUGE, requests | Number of pending inference requests. |
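
Two derived figures are often read from these metrics: the share of successful requests and the average batch size. The sketch below assumes Triton's usual semantics, where a batch of n inferences adds n to nv_inference_count and 1 to nv_inference_exec_count. `inference_summary` is a hypothetical helper and the sample values are illustrative.

```python
from typing import Optional


def inference_summary(
    success_rate: float,
    failure_rate: float,
    inference_rate: float,
    exec_rate: float,
) -> dict[str, Optional[float]]:
    """Derive the success ratio and average batch size from inference rates."""
    requests = success_rate + failure_rate
    return {
        # Share of requests that succeeded; None when there is no traffic.
        "success_ratio": success_rate / requests if requests else None,
        # Inferences per batch execution (assumed Triton counting semantics).
        "avg_batch_size": inference_rate / exec_rate if exec_rate else None,
    }


print(inference_summary(success_rate=98.0, failure_rate=2.0,
                        inference_rate=400.0, exec_rate=100.0))
```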

Latency metrics

Common labels for all latency metrics:

| Label | Data |
| --- | --- |
| model | Model name. |
| version | Model version. |

| Metric name | Type, units | Description |
| --- | --- | --- |
| nv_inference_request_duration_us | RATE, ms | Average duration of an inference request. |
| nv_inference_queue_duration_us | RATE, ms | Average waiting time in a queue to perform inference. |
| nv_inference_compute_input_duration_us | RATE, ms | Average processing time of input data for an inference. |
| nv_inference_compute_infer_duration_us | RATE, ms | Average duration of computation for an inference. |
| nv_inference_compute_output_duration_us | RATE, ms | Average processing time of output data for an inference. |
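
To see where request time goes, the four stage metrics can be compared as shares of their sum. This is a minimal sketch: `latency_breakdown` is a hypothetical helper, the inputs are assumed to be readings of the four stage metrics in the same unit, and the sample values are illustrative.

```python
def latency_breakdown(
    queue: float,
    compute_input: float,
    compute_infer: float,
    compute_output: float,
) -> dict[str, float]:
    """Return each stage's share of the total time across the four stages."""
    stages = {
        "queue": queue,
        "compute_input": compute_input,
        "compute_infer": compute_infer,
        "compute_output": compute_output,
    }
    total = sum(stages.values())
    if total == 0:
        return {name: 0.0 for name in stages}
    return {name: value / total for name, value in stages.items()}


# Illustrative readings, all in the same unit.
for stage, share in latency_breakdown(2.0, 0.5, 7.0, 0.5).items():
    print(f"{stage}: {share:.0%}")
```

A large queue share usually points at too few model instances, while a large compute_infer share points at the model itself.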

Summary metrics

| Metric name | Type, units | Description | Labels |
| --- | --- | --- | --- |
| nv_inference_request_summary_us | RATE, microseconds | Total time to process inference requests from beginning to end (including cached requests). | |
| nv_inference_queue_summary_us | RATE, microseconds | Total time requests spent in the execution queue (including cached requests). | |
| nv_inference_compute_input_summary_us | RATE, microseconds | Total time to process input data for inference requests (in the framework backend, not including cached requests). | |
| nv_inference_compute_infer_summary_us | RATE, microseconds | Total time the inference model ran for requests (in the framework backend, not including cached requests). | |
| nv_inference_compute_output_summary_us | RATE, microseconds | Total time to process output data for inference requests (in the framework backend, not including cached requests). | |

GPU metrics

| Metric name | Type, units | Description | Labels |
| --- | --- | --- | --- |
| nv_gpu_power_usage | DGAUGE, watts | Instantaneous GPU power consumption. | |
| nv_gpu_power_limit | DGAUGE, watts | Maximum GPU power limit. | |
| nv_energy_consumption | DGAUGE, joules | GPU energy consumption since Triton launch. | |
| nv_gpu_utilization | DGAUGE | GPU usage level ([0.0 - 1.0]). | |
| nv_gpu_memory_total_bytes | DGAUGE, bytes | Total GPU memory size. | |
| nv_gpu_memory_used_bytes | DGAUGE, bytes | Used GPU memory size. | |
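
The memory and power gauges are easier to alert on as fractions of their limits. The sketch below shows one way to compute them. `gpu_headroom` is a hypothetical helper and the sample readings are illustrative.

```python
from typing import Optional


def gpu_headroom(
    memory_used_bytes: float,
    memory_total_bytes: float,
    power_usage_watts: float,
    power_limit_watts: float,
) -> dict[str, Optional[float]]:
    """Express GPU memory and power readings as fractions of their limits."""
    return {
        # nv_gpu_memory_used_bytes / nv_gpu_memory_total_bytes
        "memory_used_fraction": (
            memory_used_bytes / memory_total_bytes if memory_total_bytes else None
        ),
        # nv_gpu_power_usage / nv_gpu_power_limit
        "power_used_fraction": (
            power_usage_watts / power_limit_watts if power_limit_watts else None
        ),
    }


GIB = 1024 ** 3
print(gpu_headroom(12 * GIB, 16 * GIB, 150.0, 300.0))
```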

CPU metrics

| Metric name | Type, units | Description |
| --- | --- | --- |
| nv_cpu_utilization | DGAUGE | CPU load level ([0.0 - 1.0]). |
| nv_cpu_memory_total_bytes | DGAUGE, bytes | Total CPU memory size. |
| nv_cpu_memory_used_bytes | DGAUGE, bytes | Used CPU memory size. |

Pinned memory metrics

| Metric name | Type, units | Description | Labels |
| --- | --- | --- | --- |
| nv_pinned_memory_pool_total_bytes | DGAUGE, bytes | Total pinned memory size for all models. | |
| nv_pinned_memory_pool_used_bytes | DGAUGE, bytes | Used pinned memory size for all models. | |

Response cache metrics

| Metric name | Type, units | Description |
| --- | --- | --- |
| nv_cache_num_hits_per_model | COUNTER, number | Number of response cache hits for each model. |
| nv_cache_num_misses_per_model | COUNTER, number | Number of response cache misses for each model. |
| nv_cache_hit_duration_per_model | GAUGE, microseconds | Total time spent retrieving cached responses for each model. |
| nv_cache_miss_duration_per_model | GAUGE, microseconds | Total time spent looking up and inserting responses into the cache on a cache miss for each model. |
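
The four cache metrics combine into a hit ratio and average per-lookup costs. The sketch below assumes all four values were read for the same model over the same period. `cache_stats` is a hypothetical helper and the sample numbers are illustrative.

```python
from typing import Optional


def cache_stats(
    hits: float,
    misses: float,
    hit_duration_us: float,
    miss_duration_us: float,
) -> dict[str, Optional[float]]:
    """Derive the hit ratio and average lookup costs for one model."""
    lookups = hits + misses
    return {
        # Share of lookups served from the response cache.
        "hit_ratio": hits / lookups if lookups else None,
        # Average time to return a cached response, in microseconds.
        "avg_hit_us": hit_duration_us / hits if hits else None,
        # Average lookup-and-insert overhead on a miss, in microseconds.
        "avg_miss_us": miss_duration_us / misses if misses else None,
    }


print(cache_stats(hits=80, misses=20, hit_duration_us=4000, miss_duration_us=10000))
```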
