Yandex Managed Service for Apache Spark™ metrics
This section describes Managed Service for Apache Spark™ metrics delivered to Monitoring.
The name label contains the metric name.
Labels shared by all Managed Service for Apache Spark™ metrics:
| Label | Value |
|---|---|
| service | Service ID: |
| cluster_id | Cluster ID |
| node_name | Host ID |
| node_role | Host role. The possible values are: |
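As a rough illustration of how the shared labels can narrow down a set of series, the sketch below models metric samples as plain dictionaries and filters them by label. It is not the Monitoring API: the `select` helper, the metric name `example_metric`, the cluster and host IDs, and the `node_role` values are all hypothetical placeholders.

```python
from typing import Iterable

# Hypothetical samples. Each one carries the shared labels described above,
# plus the `name` label that holds the metric name. All values are made up.
SAMPLES = [
    {"name": "example_metric", "cluster_id": "cluster-a",
     "node_name": "host-1", "node_role": "role-x", "value": 4.0},
    {"name": "example_metric", "cluster_id": "cluster-a",
     "node_name": "host-2", "node_role": "role-y", "value": 8.0},
    {"name": "example_metric", "cluster_id": "cluster-b",
     "node_name": "host-3", "node_role": "role-x", "value": 2.0},
]


def select(samples: Iterable[dict], **labels: str) -> list[dict]:
    """Return the samples whose labels match every given key/value pair."""
    return [s for s in samples
            if all(s.get(key) == val for key, val in labels.items())]


# All samples of one metric that belong to a single cluster.
cluster_a = select(SAMPLES, name="example_metric", cluster_id="cluster-a")
```

Filtering on `cluster_id` alone returns every host in the cluster; adding `node_name` or `node_role` narrows the result to one host or one group of hosts.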
Cluster resource metrics
| Name (type, units) | Description |
|---|---|
| | Number of CPUs available to containers. |
| | Total CPUs per cluster. Some CPUs may be reserved for system needs. |
| | CPU utilization on hosts. |
| | Host RAM available to containers. |
| | Total host RAM. Some RAM may be reserved for system needs. |
| | Host RAM in use. |
| | Host disk space available to containers. |
| | Host disk capacity. Some disk space may be reserved for system needs. |
| | Used host disk space. |
| | Incoming network traffic to the cluster. |
| | Number of network traffic receive errors in the cluster. |
| | Outgoing network traffic from the cluster. |
| | Number of network traffic send errors in the cluster. |
| | Number of running containers. The additional |
| | Number of successfully completed containers. The additional |
| | Number of containers waiting to run. The additional |
| | Number of containers which failed to start. The additional |
| | Number of containers in an unknown state. The additional |
| | Number of containers ready to run. The additional |
| | Number of running containers. The additional |
| | Number of container restarts. The additional |
| | CPU utilization by the container. The additional |
| | Container CPU limit. The additional |
| | Memory used by the container. The additional |
| | Container memory limit. The additional |
| | LogFS space allocated to the container. The additional |
| | LogFS space available in the container to run applications. The additional |
| | RootFS space allocated to the container. The additional |
| | Available container RootFS space. The additional |
| | Used container RootFS space. The additional |
| | Incoming network traffic to the container. The additional |
| | Number of network receive errors in the container. The additional |
| | Outgoing network traffic from the container. The additional |
| | Number of network send errors in the container. The additional |
| | Total size of the disk attached to the container. The additional |
| | Available space on the disk attached to the container. The additional |
| | Used space of the disk attached to the container. The additional |
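The table reports used and total values separately for resources such as host RAM and disk. A utilization percentage can be derived from any such pair. The following is a minimal sketch, assuming both values are expressed in the same units; the helper name and the sample numbers are hypothetical and do not come from the service.

```python
def utilization_percent(used: float, total: float) -> float:
    """Return used/total as a percentage, or 0.0 when total is not positive."""
    if total <= 0:
        # Guard against division by zero when the total is missing or zero.
        return 0.0
    return 100.0 * used / total


# Hypothetical readings: 24 GiB of RAM in use out of 32 GiB in total.
GIB = 2 ** 30
ram_pct = utilization_percent(used=24 * GIB, total=32 * GIB)
```

Because some of the total may be reserved for system needs, a ratio computed against the total can stay below 100% even when containers have no capacity left.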
Service metrics
Driver metrics
These are native Apache Spark™ metrics for driver monitoring.
The metrics have the following additional labels:
pod_name, which can take the container ID value.
job_id, which can take the job ID value.
| Name (type) |
Executor metrics
These are native Apache Spark™ metrics for executor monitoring.
The metrics have the following additional labels:
pod_name, which can take the container ID value.
job_id, which can take the job ID value.
executor_id, which can take the driver value.
application_id, which can take the Spark app ID value.
application_name, which can take the Spark app name value.
| Name (type) |
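The additional labels make it possible to break a service metric down per job or per executor. The sketch below shows one way to aggregate samples by a chosen set of labels. It works on in-memory dictionaries rather than any real client; the `sum_by` helper, the job IDs, the numeric executor ID, and the values are hypothetical. Only the `driver` value of `executor_id` comes from the label description above.

```python
from collections import defaultdict

# Hypothetical samples of a single executor metric, labeled as described above.
SAMPLES = [
    {"job_id": "job-1", "executor_id": "driver", "value": 1.0},
    {"job_id": "job-1", "executor_id": "1", "value": 2.0},
    {"job_id": "job-2", "executor_id": "1", "value": 5.0},
    {"job_id": "job-1", "executor_id": "1", "value": 3.0},
]


def sum_by(samples, *label_names):
    """Sum sample values, grouped by the tuple of the given label values."""
    totals = defaultdict(float)
    for sample in samples:
        key = tuple(sample[name] for name in label_names)
        totals[key] += sample["value"]
    return dict(totals)


per_executor = sum_by(SAMPLES, "job_id", "executor_id")
per_job = sum_by(SAMPLES, "job_id")
```

Grouping by `job_id` alone merges the driver and all executors of a job; adding `executor_id` keeps them apart.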