© 2025 Direct Cursus Technology L.L.C.

Health checks and monitoring

Written by
Yandex Cloud
Updated on March 4, 2024

You can enable health checks for your node instances: the balancer will send check requests to the endpoints at a set interval and wait for a response for up to a set timeout.

Checks can be implemented using HTTP or gRPC. The protocol must match the check implementation inside the node container.

The following health check settings are supported:

  • Timeout: Response waiting time.
  • Interval: Time interval between health check requests.
  • Resource health indicators: Thresholds for successful and failed results. When a threshold is exceeded, the check is considered passed or failed, respectively.
  • HTTP health check settings:
    • Path in the URI of the request to the endpoint.
  • gRPC health check settings:
    • Name of the service to check.
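
As an illustration of the HTTP variant, here is a minimal sketch of a health endpoint that a node container might expose, together with a local probe that waits for a response with a timeout. The path `/health`, the helper names, and the convention that HTTP 200 means "healthy" are assumptions made for this example; they are not taken from the DataSphere documentation.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical path; it would have to match the path set in the health check settings.
HEALTH_PATH = "/health"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == HEALTH_PATH:
            body = b"ok"
            self.send_response(200)  # assumed to count as a successful check
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_health_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    # Bound to localhost for the demo; a container would typically bind 0.0.0.0.
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port, path=HEALTH_PATH, timeout=1.0):
    """Send one GET request and return the HTTP status code."""
    url = f"http://127.0.0.1:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling `start_health_server()` and then `probe(server.server_address[1])` exercises the endpoint locally before the same path is entered in the node's health check settings.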

Monitoring

Nodes supply monitoring metrics to the Yandex Monitoring service directory specified in the node settings. By default, the platform collects the following metrics:

  • For nodes:

    • node_requests: Frequency of requests to a node, requests per second.
    • node_grpc_codes: Frequency of response codes for gRPC endpoints, codes per second for each code.
    • node_http_codes: Frequency of response codes for HTTP endpoints, codes per second for each code.
    • node_requests_durations: Request execution time histogram, in milliseconds.
  • For aliases:

    • alias_requests: Frequency of requests to an alias, requests per second.
    • alias_grpc_codes: Frequency of response codes for gRPC endpoints, codes per second for each code.
    • alias_http_codes: Frequency of response codes for HTTP endpoints, codes per second for each code.
    • alias_requests_durations: Request execution time histogram, in milliseconds.

Node and alias metrics contain additional labels:

  • node_id: Node ID
  • node_path: Path in the URI of the request to the endpoint
  • alias_name: Alias name

You can view the standard metrics by running queries in Monitoring or on the DataSphere service dashboards on the node and alias pages.

Additionally, you can enable the export of custom metrics from nodes to Monitoring. The platform will periodically poll all node instances over HTTP and collect the custom metrics. The charts will also be available in the Monitoring directory specified in the node settings.

The following settings are supported for collecting monitoring metrics:

  • Format: Prometheus text format or Monitoring format
  • HTTP path: GET request path
  • Port: Container port for HTTP requests
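
To make these settings concrete, the sketch below serves custom metrics in the Prometheus text format over HTTP GET using only the Python standard library. The path `/metrics`, port `9100`, the metric name `model_predictions_total`, and the helper `render_prometheus` are hypothetical choices for this example; in practice the path and port would have to match the values entered in the node settings.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

METRICS_PATH = "/metrics"  # hypothetical; set the same value as the HTTP path
METRICS_PORT = 9100        # hypothetical; set the same value as the port

# name -> (type, help text, current value); illustrative data only
METRICS = {
    "model_predictions_total": ("counter", "Predictions served.", 42),
}


def render_prometheus(metrics):
    """Render metrics as Prometheus text exposition format."""
    lines = []
    for name, (metric_type, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != METRICS_PATH:
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = render_prometheus(METRICS).encode("utf-8")
        self.send_response(200)
        # Content type conventionally used for the Prometheus text format
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Listen on all interfaces so the endpoint is reachable from outside the container.
    HTTPServer(("0.0.0.0", METRICS_PORT), MetricsHandler).serve_forever()
```

The rendering is kept separate from the HTTP handler so that the output format can be checked without starting a server.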

The following labels are automatically added to all metrics:

  • node_id: Node ID
  • instance_id: Node instance ID
