© 2025 Direct Cursus Technology L.L.C.

Yandex Managed Service for Apache Spark™ metrics

Written by Yandex Cloud
Updated on September 18, 2025

This section describes Managed Service for Apache Spark™ metrics delivered to Monitoring.

The name label contains the metric name.

Labels shared by all Managed Service for Apache Spark™ metrics:

Label

Value

service

Service ID: managed-spark

cluster_id

Cluster ID

node_name

Host ID

node_role

Host role. The possible values are:

  • spark_cluster.driver for the driver
  • spark_cluster.executor for the executor
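
The shared labels above can be combined into a selector when you query a metric. The following is a minimal sketch, assuming a selector syntax of the form "name"{label="value", ...}; the build_selector helper and the label values passed to it are hypothetical and serve only as an illustration.

```python
from typing import Mapping


def build_selector(metric_name: str, labels: Mapping[str, str]) -> str:
    """Render a metric name and its labels as a selector string.

    Assumption: the target query language accepts selectors of the form
    "name"{key="value", ...}. Verify this against the Monitoring
    documentation before relying on it.
    """
    label_text = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return '"' + metric_name + '"{' + label_text + "}"


# Example: CPU utilization on driver hosts of the managed-spark service.
selector = build_selector(
    "node.cpu_usage.gauge",
    {"service": "managed-spark", "node_role": "spark_cluster.driver"},
)
print(selector)
```

Adding cluster_id or node_name to the label mapping would narrow the selection to one cluster or one host.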

Cluster resource metrics

Name

Type, units

Description

node.allocatable_cpu.gauge

DGAUGE, count

Number of CPUs available to containers.

node.capacity_cpu.gauge

DGAUGE, count

Total CPUs per cluster. Some CPUs may be reserved for system needs.

node.cpu_usage.gauge

DGAUGE, number

CPU utilization on hosts.

node.allocatable_memory.gauge

DGAUGE, bytes

Host RAM available to containers.

node.capacity_memory.gauge

DGAUGE, bytes

Total host RAM. Some RAM may be reserved for system needs.

node.memory_usage.gauge

DGAUGE, bytes

Host RAM in use.

node.fs_available_bytes.gauge

DGAUGE, bytes

Host disk space available to containers.

node.fs_capacity_bytes.gauge

DGAUGE, bytes

Host disk capacity. Some disk space may be reserved for system needs.

node.fs_used_bytes.gauge

DGAUGE, bytes

Used host disk space.

node.network_rx_bytes.gauge

DGAUGE, bytes

Incoming network traffic to the cluster.

node.network_rx_errors.gauge

DGAUGE, count

Number of network traffic receive errors in the cluster.

node.network_tx_bytes.gauge

DGAUGE, bytes

Outgoing network traffic from the cluster.

node.network_tx_errors.gauge

DGAUGE, count

Number of network traffic send errors in the cluster.

pod.running.gauge

DGAUGE, count

Number of running containers.

The additional pod_name label can take the container ID value.

pod.succeeded.gauge

DGAUGE, count

Number of successfully completed containers.

The additional pod_name label can take the container ID value.

pod.pending.gauge

DGAUGE, count

Number of containers waiting to run.

The additional pod_name label can take the container ID value.

pod.failed.gauge

DGAUGE, count

Number of containers which failed to start.

The additional pod_name label can take the container ID value.

pod.unknown.gauge

DGAUGE, count

Number of containers in an unknown state.

The additional pod_name label can take the container ID value.

pod_container.ready.gauge

DGAUGE, count

Number of containers ready to run.

The additional pod_name label can take the container ID value.

pod_container.started.gauge

DGAUGE, count

Number of started containers.

The additional pod_name label can take the container ID value.

pod_container.restart_count.gauge

DGAUGE, count

Number of container restarts.

The additional pod_name label can take the container ID value.

pod_container.cpu_usage.gauge

DGAUGE, number

CPU utilization by the container.

The additional pod_name label can take the container ID value.

pod_container.cpu_limit.gauge

DGAUGE, number

Container CPU limit.

The additional pod_name label can take the container ID value.

pod_container.memory_usage.gauge

DGAUGE, bytes

Memory used by the container.

The additional pod_name label can take the container ID value.

pod_container.memory_limit.gauge

DGAUGE, bytes

Container memory limit.

The additional pod_name label can take the container ID value.

pod_container.logsfs_capacity_bytes.gauge

DGAUGE, bytes

LogFS space allocated to the container.

The additional pod_name label can take the container ID value.

pod_container.logsfs_available_bytes.gauge

DGAUGE, bytes

LogFS space available in the container to run applications.

The additional pod_name label can take the container ID value.

pod_container.rootfs_capacity_bytes.gauge

DGAUGE, bytes

RootFS space allocated to the container.

The additional pod_name label can take the container ID value.

pod_container.rootfs_available_bytes.gauge

DGAUGE, bytes

Available container RootFS space.

The additional pod_name label can take the container ID value.

pod_container.rootfs_used_bytes.gauge

DGAUGE, bytes

Used container RootFS space.

The additional pod_name label can take the container ID value.

pod_network.rx_bytes.gauge

DGAUGE, bytes

Incoming network traffic to the container.

The additional pod_name label can take the container ID value.

pod_network.rx_errors.gauge

DGAUGE, count

Number of network receive errors in the container.

The additional pod_name label can take the container ID value.

pod_network.tx_bytes.gauge

DGAUGE, bytes

Outgoing network traffic from the container.

The additional pod_name label can take the container ID value.

pod_network.tx_errors.gauge

DGAUGE, count

Number of network send errors in the container.

The additional pod_name label can take the container ID value.

pod_volume.capacity_bytes.gauge

DGAUGE, bytes

Total size of the disk attached to the container.

The additional pod_name label can take the container ID value.

pod_volume.available_bytes.gauge

DGAUGE, bytes

Available space on the disk attached to the container.

The additional pod_name label can take the container ID value.

pod_volume.used_bytes.gauge

DGAUGE, bytes

Used space of the disk attached to the container.

The additional pod_name label can take the container ID value.
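
Several metrics in this table come in used/capacity pairs, so a utilization fraction can be derived from them. The following is a minimal sketch; the sample values are made up, and the utilization helper is hypothetical rather than part of any Monitoring API.

```python
from typing import Optional


def utilization(used: float, capacity: float) -> Optional[float]:
    """Return used / capacity as a fraction, or None if capacity is not positive."""
    if capacity <= 0:
        return None
    return used / capacity


# Hypothetical latest values for one host, keyed by metric name.
GIB = 1024 ** 3
sample = {
    "node.memory_usage.gauge": 6 * GIB,
    "node.capacity_memory.gauge": 8 * GIB,
    "node.fs_used_bytes.gauge": 25 * GIB,
    "node.fs_capacity_bytes.gauge": 100 * GIB,
}

memory_fraction = utilization(
    sample["node.memory_usage.gauge"], sample["node.capacity_memory.gauge"]
)
disk_fraction = utilization(
    sample["node.fs_used_bytes.gauge"], sample["node.fs_capacity_bytes.gauge"]
)
print(f"memory: {memory_fraction:.0%}, disk: {disk_fraction:.0%}")
```

The same helper could be applied to the pod_container and pod_volume pairs, for example pod_container.memory_usage.gauge against pod_container.memory_limit.gauge.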

Service metrics

Driver metrics

These are Apache Spark™ native metrics for driver monitoring.

The metrics have the following additional labels:

  • pod_name, which can take the container ID value.
  • job_id, which can take the job ID value.

Name

Type

driver_appstatus_jobduration_number.value

DGAUGE

driver_appstatus_jobduration_value.value

DGAUGE

driver_appstatus_jobs_failedjobs_count.value

DGAUGE

driver_appstatus_jobs_succeededjobs_count.value

DGAUGE

driver_appstatus_stages_completedstages_count.value

DGAUGE

driver_appstatus_stages_failedstages_count.value

DGAUGE

driver_appstatus_stages_skippedstages_count.value

DGAUGE

driver_appstatus_tasks_blacklistedexecutors_count.value

DGAUGE

driver_appstatus_tasks_completedtasks_count.value

DGAUGE

driver_appstatus_tasks_excludedexecutors_count.value

DGAUGE

driver_appstatus_tasks_failedtasks_count.value

DGAUGE

driver_appstatus_tasks_killedtasks_count.value

DGAUGE

driver_appstatus_tasks_skippedtasks_count.value

DGAUGE

driver_appstatus_tasks_unblacklistedexecutors_count.value

DGAUGE

driver_appstatus_tasks_unexcludedexecutors_count.value

DGAUGE

driver_dagscheduler_job_activejobs_number.value

DGAUGE

driver_dagscheduler_job_activejobs_value.value

DGAUGE

driver_dagscheduler_job_alljobs_number.value

DGAUGE

driver_dagscheduler_job_alljobs_value.value

DGAUGE

driver_dagscheduler_stage_failedstages_number.value

DGAUGE

driver_dagscheduler_stage_failedstages_value.value

DGAUGE

driver_dagscheduler_stage_runningstages_number.value

DGAUGE

driver_dagscheduler_stage_runningstages_value.value

DGAUGE

driver_dagscheduler_stage_waitingstages_number.value

DGAUGE

driver_dagscheduler_stage_waitingstages_value.value

DGAUGE
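
As an illustration of how the driver job counters might be used together, the sketch below derives a failed-job ratio. It assumes that driver_appstatus_jobs_failedjobs_count.value and driver_appstatus_jobs_succeededjobs_count.value both count finished jobs of the same application; the failed_job_ratio helper and the input values are hypothetical.

```python
from typing import Optional


def failed_job_ratio(failed: float, succeeded: float) -> Optional[float]:
    """Return the share of finished jobs that failed.

    Assumption: `failed` and `succeeded` are the latest values of
    driver_appstatus_jobs_failedjobs_count.value and
    driver_appstatus_jobs_succeededjobs_count.value for the same job_id.
    Returns None when no job has finished yet.
    """
    finished = failed + succeeded
    if finished <= 0:
        return None
    return failed / finished


# Made-up values: 1 failed job and 3 succeeded jobs.
ratio = failed_job_ratio(failed=1, succeeded=3)
print(ratio)
```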

Executor metrics

These are Apache Spark™ native metrics for executor monitoring.

The metrics have the following additional labels:

  • pod_name, which can take the container ID value.
  • job_id, which can take the job ID value.
  • executor_id, which can take the driver value.
  • application_id, which can take the Spark app ID value.
  • application_name, which can take the Spark app name value.

Name

Type

executor_activetasks.value

DGAUGE

executor_completedtasks_total.value

DGAUGE

executor_failedtasks_total.value

DGAUGE

executor_directpoolmemory_bytes.value

DGAUGE

executor_diskused_bytes.value

DGAUGE

executor_jvmheapmemory_bytes.value

DGAUGE

executor_jvmoffheapmemory_bytes.value

DGAUGE

executor_majorgccount_total.value

DGAUGE

executor_majorgctime_seconds_total.value

DGAUGE

executor_mappedpoolmemory_bytes.value

DGAUGE

executor_maxmemory_bytes.value

DGAUGE

executor_maxtasks.value

DGAUGE

executor_memoryused_bytes.value

DGAUGE

executor_minorgccount_total.value

DGAUGE

executor_minorgctime_seconds_total.value

DGAUGE

executor_offheapexecutionmemory_bytes.value

DGAUGE

executor_offheapstoragememory_bytes.value

DGAUGE

executor_offheapunifiedmemory_bytes.value

DGAUGE

executor_onheapexecutionmemory_bytes.value

DGAUGE

executor_onheapstoragememory_bytes.value

DGAUGE

executor_onheapunifiedmemory_bytes.value

DGAUGE

executor_processtreejvmrssmemory_bytes.value

DGAUGE

executor_processtreejvmvmemory_bytes.value

DGAUGE

executor_processtreeotherrssmemory_bytes.value

DGAUGE

executor_processtreeothervmemory_bytes.value

DGAUGE

executor_processtreepythonrssmemory_bytes.value

DGAUGE

executor_processtreepythonvmemory_bytes.value

DGAUGE

executor_rddblocks.value

DGAUGE

executor_totalcores.value

DGAUGE

executor_totalduration_seconds_total.value

DGAUGE

executor_totalgctime_seconds_total.value

DGAUGE

executor_totalinputbytes_bytes_total.value

DGAUGE

executor_totaloffheapstoragememory_bytes.value

DGAUGE

executor_totalonheapstoragememory_bytes.value

DGAUGE

executor_totalshuffleread_bytes_total.value

DGAUGE

executor_totalshufflewrite_bytes_total.value

DGAUGE

executor_totaltasks_total.value

DGAUGE

executor_usedoffheapstoragememory_bytes.value

DGAUGE

executor_usedonheapstoragememory_bytes.value

DGAUGE
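
One way to combine the executor metrics is to compare garbage collection time with total task time. The sketch below flags executors whose GC share exceeds a threshold. It assumes that executor_totalgctime_seconds_total.value and executor_totalduration_seconds_total.value are cumulative values in the same unit; the function name, the 10% threshold, and the sample data are hypothetical.

```python
from typing import Dict, List

GC_TIME = "executor_totalgctime_seconds_total.value"
DURATION = "executor_totalduration_seconds_total.value"


def executors_with_high_gc(
    samples: Dict[str, Dict[str, float]], threshold: float = 0.1
) -> List[str]:
    """Return pod names whose GC time exceeds `threshold` of total duration.

    `samples` maps a pod_name label value to the latest metric values
    for that executor. Executors with no recorded duration are skipped.
    """
    flagged = []
    for pod_name, metrics in samples.items():
        duration = metrics.get(DURATION, 0.0)
        gc_time = metrics.get(GC_TIME, 0.0)
        if duration > 0 and gc_time / duration > threshold:
            flagged.append(pod_name)
    return flagged


# Made-up values for two executors.
samples = {
    "exec-1": {GC_TIME: 50.0, DURATION: 200.0},  # 25% of time in GC
    "exec-2": {GC_TIME: 5.0, DURATION: 200.0},   # 2.5% of time in GC
}
flagged = executors_with_high_gc(samples)
print(flagged)
```

The threshold is a rule of thumb, not a value taken from this reference; tune it to your workload.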
