Managing SparkConnect jobs
Note
This feature is at the Preview stage.
Creating a job
Warning
Once created, the job will run automatically.
- Navigate to the folder dashboard and select Managed Service for Apache Spark™.
- Click the name of your cluster and open the Jobs tab.
- Click Create job.
- Enter the job name.
- In the Job type field, select SparkConnect.
- Optionally, configure advanced settings:
  - Specify paths to required files and archives.
  - Specify paths to your JAR files in the following format:

    | File location | Path format |
    |---|---|
    | Instance file system | `file:///<file_path>` |
    | Object Storage bucket | `s3a://<bucket_name>/<file_path>` |
    | Internet | `http://<path_to_file>` or `https://<path_to_file>` |

    Archives in standard Linux formats, such as zip, gz, xz, and bz2, are supported.

    The cluster service account needs read access to all the files in the bucket. Step-by-step guides on how to set up access to Object Storage are provided in Editing a bucket ACL.
  - In the Properties field, specify component properties as key-value pairs.
  - Specify the coordinates of included and excluded Maven packages, as well as URLs of additional repositories for package search.
- Click Submit job.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To create a SparkConnect job:
- See the description of the CLI command for creating a job:

  ```bash
  yc managed-spark job create-spark-connect --help
  ```

- Create a job by running this command (a filled-in example follows the parameter list below):

  ```bash
  yc managed-spark job create-spark-connect \
    --cluster-id <cluster_ID> \
    --name <job_name> \
    --jar-file-uris <list_of_paths_to_JAR_files> \
    --file-uris <list_of_paths_to_files> \
    --archive-uris <list_of_paths_to_archives> \
    --packages <list_of_package_Maven_coordinates> \
    --repositories <list_of_URLs_of_repositories_for_package_search> \
    --exclude-packages <list_of_Maven_coordinates_of_excluded_packages> \
    --properties <list_of_properties>
  ```

  Where:
  - `--cluster-id`: Cluster ID. You can get the cluster ID with the list of clusters in the folder.
  - `--name` (optional): Job name.
  - `--jar-file-uris`: List of paths to JAR files in the following format:

    | File location | Path format |
    |---|---|
    | Instance file system | `file:///<file_path>` |
    | Object Storage bucket | `s3a://<bucket_name>/<file_path>` |
    | Internet | `http://<path_to_file>` or `https://<path_to_file>` |

    Archives in standard Linux formats, such as zip, gz, xz, and bz2, are supported.

    The cluster service account needs read access to all the files in the bucket. Step-by-step guides on how to set up access to Object Storage are provided in Editing a bucket ACL.
  - `--file-uris`: List of paths to files.
  - `--archive-uris`: List of paths to archives.
  - `--packages`: List of Maven coordinates of JAR files in `groupId:artifactId:version` format.
  - `--repositories`: List of URLs of additional repositories for package search.
  - `--exclude-packages`: List of Maven coordinates of the packages to exclude, in `groupId:artifactId` format.
  - `--properties`: List of component properties in `key=value` format.
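  For illustration, here is a hypothetical invocation with the placeholders filled in. The cluster ID, job name, bucket, and package coordinates below are made up; check `--help` for the exact list syntax the CLI expects:

  ```bash
  # Hypothetical example: submit a SparkConnect job with one extra JAR from
  # an Object Storage bucket, one Maven package, and two Spark properties.
  yc managed-spark job create-spark-connect \
    --cluster-id c9q8ml85r1oh******** \
    --name my-connect-job \
    --jar-file-uris s3a://my-bucket/jars/extra-lib.jar \
    --packages org.apache.spark:spark-avro_2.12:3.5.0 \
    --properties spark.executor.instances=2,spark.executor.memory=4g
  ```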
- Get an IAM token for API authentication and save it as an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```

- Clone the cloudapi repository:

  ```bash
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  ```

  Below, we assume the repository contents are stored in the `~/cloudapi/` directory.
- Use the JobService.Create call and send the following request, e.g., via gRPCurl (a filled-in example follows the parameter list below):

  ```bash
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/spark/v1/job_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
      "cluster_id": "<cluster_ID>",
      "name": "<job_name>",
      "spark_connect_job": {
        "jar_file_uris": [ <list_of_paths_to_JAR_files> ],
        "file_uris": [ <list_of_paths_to_files> ],
        "archive_uris": [ <list_of_paths_to_archives> ],
        "properties": { <list_of_properties> },
        "packages": [ <list_of_package_Maven_coordinates> ],
        "repositories": [ <list_of_URLs_of_repositories_for_package_search> ],
        "exclude_packages": [ <list_of_Maven_coordinates_of_excluded_packages> ]
      }
    }' \
    spark.api.cloud.yandex.net:443 \
    yandex.cloud.spark.v1.JobService.Create
  ```

  Where:
  - `cluster_id`: Cluster ID. You can get the cluster ID with the list of clusters in the folder.
  - `name` (optional): Job name.
  - `spark_connect_job`: SparkConnect job parameters:
    - `jar_file_uris`: List of paths to JAR files in the following format:

      | File location | Path format |
      |---|---|
      | Instance file system | `file:///<file_path>` |
      | Object Storage bucket | `s3a://<bucket_name>/<file_path>` |
      | Internet | `http://<path_to_file>` or `https://<path_to_file>` |

      Archives in standard Linux formats, such as zip, gz, xz, and bz2, are supported.

      The cluster service account needs read access to all the files in the bucket. Step-by-step guides on how to set up access to Object Storage are provided in Editing a bucket ACL.
    - `file_uris`: List of paths to files.
    - `archive_uris`: List of paths to archives.
    - `properties`: List of component properties in `"key": "value"` format.
    - `packages`: List of Maven coordinates of JAR files in `groupId:artifactId:version` format.
    - `repositories`: List of URLs of additional repositories for package search.
    - `exclude_packages`: List of Maven coordinates of the packages to exclude, in `groupId:artifactId` format.
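  For illustration, here is a hypothetical request with the placeholders filled in; the IDs, bucket, and package coordinates are made up. Note that `properties` is a JSON object, while the other `spark_connect_job` fields are JSON arrays of strings:

  ```bash
  # Hypothetical example: create a SparkConnect job with one extra JAR,
  # one Maven package, and two Spark properties.
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/spark/v1/job_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
      "cluster_id": "c9q8ml85r1oh********",
      "name": "my-connect-job",
      "spark_connect_job": {
        "jar_file_uris": ["s3a://my-bucket/jars/extra-lib.jar"],
        "properties": {"spark.executor.instances": "2", "spark.executor.memory": "4g"},
        "packages": ["org.apache.spark:spark-avro_2.12:3.5.0"]
      }
    }' \
    spark.api.cloud.yandex.net:443 \
    yandex.cloud.spark.v1.JobService.Create
  ```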
- View the server response to make sure your request was successful.
A running SparkConnect job exposes an endpoint you can use to connect via PySpark. You can get the endpoint with the job information: its value is specified in the `connect_url` field.

For example: `sc://connect-api-c9q9veov4uql********-c9q8ml85r1oh********.spark.yandexcloud.net:443`.
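As a minimal sketch of such a connection, assuming a local PySpark installation with Spark Connect support (Spark 3.4 or later) and the example endpoint above; depending on the endpoint configuration, additional connection parameters (e.g., authentication options) may be required:

```bash
# Start a local PySpark shell attached to the remote Spark Connect endpoint.
# The endpoint value comes from the connect_url field of the job information.
pyspark --remote "sc://connect-api-c9q9veov4uql********-c9q8ml85r1oh********.spark.yandexcloud.net:443"
```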
Cancel a job
Note
You cannot cancel jobs with the ERROR, DONE, or CANCELLED status. To find out a job's status, retrieve a list of jobs in the cluster.
- Navigate to the folder dashboard and select Managed Service for Apache Spark™.
- Click the name of your cluster and open the Jobs tab.
- Click the job name.
- Click Cancel in the top-right corner of the page.
- In the window that opens, select Cancel job.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also set a different folder for any specific command using the `--folder-name` or `--folder-id` parameter.
To cancel a job, do the following:
- View the description of the CLI command for canceling a job:

  ```bash
  yc managed-spark job cancel --help
  ```

- Cancel a job by running this command (see the example below):

  ```bash
  yc managed-spark job cancel <job_name_or_ID> \
    --cluster-id <cluster_ID>
  ```

  You can get the cluster ID with the list of clusters in the folder.

  You can get the job name and ID with the list of cluster jobs.
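  For illustration, a hypothetical invocation with made-up IDs:

  ```bash
  # Hypothetical example: cancel a job by its ID in the given cluster.
  yc managed-spark job cancel c9q9veov4uql******** \
    --cluster-id c9q8ml85r1oh********
  ```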
- Get an IAM token for API authentication and save it as an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```

- Clone the cloudapi repository:

  ```bash
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  ```

  Below, we assume the repository contents are stored in the `~/cloudapi/` directory.
- Use the JobService.Cancel call and send the following request, e.g., via gRPCurl:

  ```bash
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/spark/v1/job_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
      "cluster_id": "<cluster_ID>",
      "job_id": "<job_ID>"
    }' \
    spark.api.cloud.yandex.net:443 \
    yandex.cloud.spark.v1.JobService.Cancel
  ```

  You can get the cluster ID with the list of folder clusters, and the job ID with the list of cluster jobs.

- View the server response to make sure your request was successful.
Get a list of jobs
- Navigate to the folder dashboard and select Managed Service for Apache Spark™.
- Click the name of your cluster and open the Jobs tab.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also set a different folder for any specific command using the `--folder-name` or `--folder-id` parameter.
To get a list of cluster jobs:
- See the description of the CLI command for getting a list of jobs:

  ```bash
  yc managed-spark job list --help
  ```

- Get the list of jobs by running this command (see the example below):

  ```bash
  yc managed-spark job list \
    --cluster-id <cluster_ID>
  ```

  You can get the cluster ID with the list of clusters in the folder.
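  For illustration, a hypothetical invocation with a made-up cluster ID; the yc CLI's global `--format` flag switches the output to machine-readable JSON, which is convenient for scripting:

  ```bash
  # Hypothetical example: list all jobs in a cluster as JSON.
  yc managed-spark job list \
    --cluster-id c9q8ml85r1oh******** \
    --format json
  ```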
- Get an IAM token for API authentication and save it as an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```

- Clone the cloudapi repository:

  ```bash
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  ```

  Below, we assume the repository contents are stored in the `~/cloudapi/` directory.
- Use the JobService.List call and send the following request, e.g., via gRPCurl:

  ```bash
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/spark/v1/job_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
      "cluster_id": "<cluster_ID>"
    }' \
    spark.api.cloud.yandex.net:443 \
    yandex.cloud.spark.v1.JobService.List
  ```

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
Get general info about a job
- Navigate to the folder dashboard and select Managed Service for Apache Spark™.
- Click the name of your cluster and open the Jobs tab.
- Click the job name.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also set a different folder for any specific command using the `--folder-name` or `--folder-id` parameter.
To get information about a job:
- View the description of the CLI command for getting information about a job:

  ```bash
  yc managed-spark job get --help
  ```

- Get information about the job by running this command (see the example below):

  ```bash
  yc managed-spark job get <job_ID> \
    --cluster-id <cluster_ID>
  ```

  You can get the cluster ID with the list of clusters in the folder.

  You can get the job ID with the list of cluster jobs.
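  For illustration, a hypothetical invocation with made-up IDs; for a running SparkConnect job, the output includes the `connect_url` endpoint described above:

  ```bash
  # Hypothetical example: get job details, including its status and,
  # for a running SparkConnect job, the connect_url endpoint.
  yc managed-spark job get c9q9veov4uql******** \
    --cluster-id c9q8ml85r1oh********
  ```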
- Get an IAM token for API authentication and save it as an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```

- Clone the cloudapi repository:

  ```bash
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  ```

  Below, we assume the repository contents are stored in the `~/cloudapi/` directory.
- Use the JobService.Get call and send the following request, e.g., via gRPCurl:

  ```bash
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/spark/v1/job_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
      "cluster_id": "<cluster_ID>",
      "job_id": "<job_ID>"
    }' \
    spark.api.cloud.yandex.net:443 \
    yandex.cloud.spark.v1.JobService.Get
  ```

  You can get the cluster ID with the list of folder clusters, and the job ID with the list of cluster jobs.

- View the server response to make sure your request was successful.
Get job execution logs
Warning
To get job execution logs, enable logging when creating the cluster.
- Navigate to the folder dashboard and select Managed Service for Apache Spark™.
- Click the name of your cluster and open the Jobs tab.
- Click the job name.
- In the Output logs field, click the link.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also set a different folder for any specific command using the `--folder-name` or `--folder-id` parameter.
To get job execution logs:
- See the description of the CLI command for getting job logs:

  ```bash
  yc managed-spark job log --help
  ```

- Get job logs by running this command:

  ```bash
  yc managed-spark job log <job_ID> \
    --cluster-id <cluster_ID>
  ```

  You can get the cluster ID with the list of clusters in the folder.

  You can get the job ID with the list of cluster jobs.

  To get logs for multiple jobs, list their IDs separated by spaces, e.g.:

  ```bash
  yc managed-spark job log c9q9veov4uql******** c9qu8uftedte******** \
    --cluster-id c9q8ml85r1oh********
  ```
- Get an IAM token for API authentication and save it as an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```

- Clone the cloudapi repository:

  ```bash
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  ```

  Below, we assume the repository contents are stored in the `~/cloudapi/` directory.
- Use the JobService.ListLog call and send the following request, e.g., via gRPCurl:

  ```bash
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/spark/v1/job_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
      "cluster_id": "<cluster_ID>",
      "job_id": "<job_ID>"
    }' \
    spark.api.cloud.yandex.net:443 \
    yandex.cloud.spark.v1.JobService.ListLog
  ```

  You can get the cluster ID with the list of clusters in the folder, and the job ID with the list of cluster jobs.

- View the server response to make sure your request was successful.