© 2025 Direct Cursus Technology L.L.C.

In this article:

  • Checking the application list
  • Checking application logs
  • Checking the application queue
  • Checking application details
  • Checking resources allocated to the application
  • Checking persisted RDDs
  • Checking the list of SQL queries and their execution plans

Monitoring the state of Spark applications

Written by
Yandex Cloud
Updated on December 26, 2024

To evaluate the performance of Spark applications in a Yandex Data Processing cluster, you can check the following:

  • Application list
  • Application logs
  • Application queue
  • Application details
  • Resources allocated to the application
  • Persisted RDDs
  • List of SQL queries and their execution plans

Note

Make sure the cluster has the component web interfaces enabled. If it does not, enable the ones you need.

Checking the application list

  1. Go to the folder page and select Yandex Data Processing.
  2. Click the cluster name.
  3. Under UI Proxy, select the YARN Resource Manager Web UI interface.

It shows information about all running and completed applications.
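
The same list can also be read by a script. The following is a minimal sketch, assuming the YARN ResourceManager REST API (`/ws/v1/cluster/apps`) is reachable from where the script runs. The address in `RM_URL` is a placeholder, and access through UI Proxy may require authentication that is not shown here.

```python
import json
from urllib.request import urlopen

# Hypothetical address: replace with the ResourceManager host of your cluster.
RM_URL = "http://resource-manager.example:8088"


def fetch_apps(rm_url=RM_URL, states=None):
    """Fetch the application list from the YARN ResourceManager REST API."""
    url = f"{rm_url}/ws/v1/cluster/apps"
    if states:
        # Optional filter, e.g. states=["RUNNING", "FINISHED"].
        url += "?states=" + ",".join(states)
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_apps(payload):
    """Return (id, name, state, finalStatus) tuples from an /apps response."""
    # The API returns {"apps": null} when there are no applications.
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a["id"], a["name"], a["state"], a["finalStatus"]) for a in apps]


# Offline sample shaped like a real response (values are made up).
sample = {"apps": {"app": [
    {"id": "application_1_0001", "name": "etl",
     "state": "RUNNING", "finalStatus": "UNDEFINED"},
    {"id": "application_1_0002", "name": "report",
     "state": "FINISHED", "finalStatus": "SUCCEEDED"},
]}}
for row in summarize_apps(sample):
    print(*row)
```

On a live cluster, `summarize_apps(fetch_apps())` would replace the offline sample.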

Checking application logs

  1. Go to the folder page and select Yandex Data Processing.

  2. Click the cluster name.

  3. Under UI Proxy, select the YARN Resource Manager Web UI interface.

  4. Find the application you need and click its ID in the ID column.

    This will open a window with information about the application's performance and a table with a list of application run attempts.

  5. Click the link next to the attempt in question in the Logs column.
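
Logs can also be pulled from a terminal with the standard `yarn logs -applicationId` command. The sketch below wraps that command; it assumes the script runs on a cluster host where the `yarn` CLI is installed and that YARN log aggregation is enabled. The function names are illustrative.

```python
import subprocess


def yarn_logs_command(app_id):
    """Build the argv for `yarn logs`, which prints aggregated container logs."""
    if not app_id.startswith("application_"):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_logs(app_id):
    """Run `yarn logs` and return its output as text (needs the yarn CLI)."""
    result = subprocess.run(
        yarn_logs_command(app_id),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if yarn exits with an error
    )
    return result.stdout


# Only builds the command; nothing is executed here.
print(" ".join(yarn_logs_command("application_1_0001")))
```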

Checking the application queue

  1. Go to the folder page and select Yandex Data Processing.
  2. Click the cluster name.
  3. Under UI Proxy, select the YARN Resource Manager Web UI interface.
  4. In the left-hand menu, go to Scheduler.

The Application Queues section shows the queue of applications and resources used by them.
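
Queue usage is also exposed by the ResourceManager REST API at `/ws/v1/cluster/scheduler`. The sketch below flattens the queue tree from such a response; it assumes the cluster uses the Capacity Scheduler, whose response nests child queues under `queues.queue`. The sample values are made up.

```python
def flatten_queues(queue, parent=""):
    """Yield (path, capacity %, used capacity %, applications) per queue."""
    name = queue["queueName"]
    path = f"{parent}.{name}" if parent else name
    yield (
        path,
        queue.get("capacity"),
        queue.get("usedCapacity"),
        queue.get("numApplications"),  # may be absent on the root entry
    )
    # Leaf queues have no "queues" key; parent queues nest their children.
    children = (queue.get("queues") or {}).get("queue", [])
    for child in children:
        yield from flatten_queues(child, path)


# Offline sample shaped like payload["scheduler"]["schedulerInfo"].
scheduler_info = {
    "type": "capacityScheduler",
    "queueName": "root",
    "capacity": 100.0,
    "usedCapacity": 25.0,
    "queues": {"queue": [
        {"queueName": "default", "capacity": 100.0,
         "usedCapacity": 25.0, "numApplications": 2},
    ]},
}
for path, capacity, used, apps in flatten_queues(scheduler_info):
    print(f"{path}: capacity={capacity}% used={used}% apps={apps}")
```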

Checking application details

Using the YARN Resource Manager Web UI:

  1. Go to the folder page and select Yandex Data Processing.

  2. Click the cluster name.

  3. Under UI Proxy, select the YARN Resource Manager Web UI interface.

  4. Find the application in question and follow the link in the Tracking UI column. The link name depends on the application status:

    • ApplicationMaster for running applications
    • History for finished applications

Using the Spark History Server Web UI:

  1. Go to the folder page and select Yandex Data Processing.

  2. Click the cluster name.

  3. Under UI Proxy, select the Spark History Server Web UI interface.

    This will open the list of finished applications. To switch to the list of running applications, click Show incomplete applications at the bottom of the table.

  4. Find the application in question and follow the link in the App ID column.

This will open the Spark History Server Web UI window with details of the selected application:

  • Event Timeline: History of job runs with info about added and removed executors.
  • Active Jobs: List of jobs being run or waiting to be run.
  • Completed Jobs: List of finished jobs.

For each job, the table specifies:

  • Start time (Submitted)
  • Duration
  • Stages: Succeeded/Total
  • Tasks: Succeeded/Total
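
The per-job columns can be reproduced from the Spark monitoring REST API, which serves job data at `/api/v1/applications/<app-id>/jobs` (applications with several attempts add the attempt ID to the path). The following sketch only formats one job record from such a response; the sample record is made up, and `job_row` is an illustrative name.

```python
from datetime import datetime

# Timestamp format used in Spark REST API responses.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fGMT"


def job_row(job):
    """Format one /jobs record like the Jobs table in the Spark UI."""
    submitted = datetime.strptime(job["submissionTime"], TIME_FORMAT)
    completed = job.get("completionTime")  # absent while the job is running
    duration = None
    if completed:
        finished = datetime.strptime(completed, TIME_FORMAT)
        duration = (finished - submitted).total_seconds()
    return {
        "job_id": job["jobId"],
        "submitted": submitted,
        "duration_s": duration,
        "stages": f"{job['numCompletedStages']}/{len(job['stageIds'])}",
        "tasks": f"{job['numCompletedTasks']}/{job['numTasks']}",
    }


# Offline sample with the fields used above (values are made up).
sample_job = {
    "jobId": 0,
    "submissionTime": "2024-12-26T10:00:00.000GMT",
    "completionTime": "2024-12-26T10:00:12.500GMT",
    "stageIds": [0, 1],
    "numCompletedStages": 2,
    "numTasks": 8,
    "numCompletedTasks": 8,
}
print(job_row(sample_job))
```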

Checking resources allocated to the application

  1. Go to the folder page and select Yandex Data Processing.
  2. Click the cluster name.
  3. Under UI Proxy, select the Spark History Server Web UI interface.
  4. In the top menu, go to Executors.

The UI will display two tables:

  • Summary: High-level information, such as the number and status of executors and resources used.
  • Executors: Information about each executor.

The tables specify the following:

  • Amount of resources available per executor.
  • Number of running and completed tasks.
  • Task duration (Task Time), including the time spent for garbage collection (GC Time).

Tip

If garbage collection takes a long time:

  • Make sure you have enough memory allocated to the executor.
  • Configure the garbage collector manually. To learn how to do this, see the Apache Spark documentation.
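
To judge whether garbage collection takes too long, compare GC Time with Task Time for each executor. The sketch below does this for records shaped like the Spark REST API response at `/api/v1/applications/<app-id>/executors`, where `totalDuration` and `totalGCTime` are given in milliseconds. The 10% threshold is an assumption chosen for illustration, not a value from this guide.

```python
def gc_report(executors, threshold=0.10):
    """Return (executor id, GC share of task time, flagged) per executor."""
    report = []
    for ex in executors:
        task_ms = ex["totalDuration"]
        # An executor with no finished tasks has no meaningful GC share.
        share = ex["totalGCTime"] / task_ms if task_ms else 0.0
        report.append((ex["id"], round(share, 3), share > threshold))
    return report


# Offline sample with the fields used above (values are made up).
sample_executors = [
    {"id": "driver", "totalDuration": 0, "totalGCTime": 0},
    {"id": "1", "totalDuration": 60000, "totalGCTime": 9000},
    {"id": "2", "totalDuration": 60000, "totalGCTime": 1200},
]
for executor_id, share, flagged in gc_report(sample_executors):
    note = "check memory and GC settings" if flagged else "ok"
    print(f"executor {executor_id}: GC share {share:.1%} ({note})")
```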

Checking persisted RDDs

  1. Go to the folder page and select Yandex Data Processing.
  2. Click the cluster name.
  3. Under UI Proxy, select the Spark History Server Web UI interface.
  4. In the top menu, go to Storage.

The UI displays the list of persisted RDDs. For each RDD, it shows the memory and disk space used, as well as caching progress.

To view detailed statistics, click the RDD name.
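
Caching progress is the share of an RDD's partitions that are currently cached. A minimal sketch, assuming records shaped like the Spark REST API response at `/api/v1/applications/<app-id>/storage/rdd`; the sample values are made up.

```python
def caching_progress(rdds):
    """Return (name, storage level, memory bytes, disk bytes, % cached)."""
    rows = []
    for rdd in rdds:
        total = rdd["numPartitions"]
        cached = rdd["numCachedPartitions"]
        percent = 100 * cached / total if total else 0.0
        rows.append((
            rdd["name"],
            rdd["storageLevel"],
            rdd["memoryUsed"],
            rdd["diskUsed"],
            percent,
        ))
    return rows


# Offline sample with the fields used above (values are made up).
sample_rdds = [{
    "id": 4,
    "name": "events",
    "storageLevel": "Memory Deserialized 1x Replicated",
    "numPartitions": 8,
    "numCachedPartitions": 6,
    "memoryUsed": 1048576,
    "diskUsed": 0,
}]
for name, level, memory, disk, percent in caching_progress(sample_rdds):
    print(f"{name} [{level}]: {percent:.0f}% cached, "
          f"{memory} B in memory, {disk} B on disk")
```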

Checking the list of SQL queries and their execution plans

  1. Go to the folder page and select Yandex Data Processing.
  2. Click the cluster name.
  3. Under UI Proxy, select the Spark History Server Web UI interface.
  4. In the top menu, go to SQL.

The table lists executed SQL queries, including their start time and duration.

To see the query execution plan, click the query text in the Description column. The query execution plan is displayed as a flowchart. To view it as text, click Details at the bottom of the figure.

The query execution plan contains statistics for each operator with the number of completed tasks and their duration. If the query is still running, the current statistics will be shown.
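
To find the slowest queries without scanning the table by eye, the query list can be sorted by duration. The sketch below assumes records shaped like the Spark REST API response at `/api/v1/applications/<app-id>/sql`, an endpoint available in Spark 3.1 and later; whether your cluster image provides it depends on its Spark version. The sample values are made up.

```python
def slowest_queries(executions, top=3):
    """Return (id, status, duration ms, description) for the slowest queries."""
    ranked = sorted(executions, key=lambda q: q["duration"], reverse=True)
    return [
        (q["id"], q["status"], q["duration"], q["description"])
        for q in ranked[:top]
    ]


# Offline sample with the fields used above (values are made up).
sample_sql = [
    {"id": 0, "status": "COMPLETED", "duration": 1200,
     "description": "count at report.py:10"},
    {"id": 1, "status": "COMPLETED", "duration": 45000,
     "description": "save at etl.py:42"},
    {"id": 2, "status": "RUNNING", "duration": 8000,
     "description": "collect at report.py:17"},
]
for query_id, status, duration_ms, description in slowest_queries(sample_sql, top=2):
    print(f"#{query_id} {status} {duration_ms / 1000:.1f}s {description}")
```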
