Getting started with Yandex Managed Service for Apache Spark™

Updated at October 20, 2025

To get started:

Required paid resources

The cost of infrastructure support includes a fee for a Yandex Object Storage bucket (see Object Storage pricing).

Go to the management console and log in to Yandex Cloud, or sign up if you do not have an account yet.
If you do not have a folder yet, create one:
1. In the management console, select the appropriate cloud from the list on the left.
2. At the top right, click Create folder.
3. Give your folder a name. The naming requirements are as follows:
  - It must be from 2 to 63 characters long.
  - It can only contain lowercase Latin letters, numbers, and hyphens.
  - It must start with a letter and cannot end with a hyphen.
4. Optionally, specify the description for your folder.
5. Select Create a default network. This creates a network with subnets in each availability zone, along with a default security group that allows all network traffic.
6. Click Create.
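The folder naming requirements listed above can be checked before you open the console. The following is a minimal sketch in Python; the helper name `is_valid_folder_name` is hypothetical and not part of any Yandex Cloud tool, and it encodes only the three rules stated in this guide.

```python
import re

# Hypothetical helper (not a Yandex Cloud API): encodes the naming rules above.
# - first character: a lowercase Latin letter
# - middle characters: lowercase letters, digits, or hyphens
# - last character: a letter or digit (so the name cannot end with a hyphen)
# - total length: 2 to 63 characters (1 + up to 61 + 1)
_FOLDER_NAME_RE = re.compile(r"[a-z][a-z0-9-]{0,61}[a-z0-9]")


def is_valid_folder_name(name: str) -> bool:
    """Return True if the name satisfies the folder naming rules."""
    return _FOLDER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["spark-quickstart", "1-folder", "folder-", "a"]:
        print(candidate, is_valid_folder_name(candidate))
```

The console performs its own validation, so this check is only a convenience for scripts that generate folder names.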
Assign the following roles to your Yandex Cloud account:
- managed-spark.admin: To create a cluster.
- vpc.user: To use the cluster network.
- iam.serviceAccounts.user: To assign a service account to a cluster.
Note

If you are unable to manage roles, contact your cloud or organization administrator.

Create a service account and assign the following roles to it:
- managed-spark.integrationProvider: For Yandex Managed Service for Apache Spark™ to interact with other system components, e.g., for sending logs and metrics.
- storage.editor: For accessing PySpark job files in an Object Storage bucket.
Create an Object Storage bucket.
Grant the service account access to the Object Storage bucket that will store your code and data for cluster-specific jobs:
1. In the management console, select the folder you need.
2. In the list of services, select Object Storage.
  1. Open the bucket you created earlier.
  2. Navigate to Objects.
  3. Click and select Configure ACL.
  4. In the ACL editing window that opens:
    1. Start typing the service account name you created earlier and select it from the drop-down list.
    2. Select the READ and WRITE access permissions.
    3. Click Add.
    4. Click Save.

Management console

In the management console, select the folder where you want to create a cluster.
Select Managed Service for Apache Spark™.
Click Create cluster.
Give the cluster a name.
In the Service account field, select the previously created service account.
Under Network settings, select a network, subnet, and security group for the cluster.
Set up computing resources for hosts to run drivers and workers.
Under Advanced settings, configure logging:
1. Enable the Write logs setting.
2. In the Destination field, select the log destination: Folder.
3. In the Folder field, select your folder from the list.
4. Select Min. logging level: INFO.
Click Create.
Wait until the cluster is ready for use: its status on the Yandex Managed Service for Apache Spark™ dashboard changes to Running and its state changes to Alive. This may take some time.

Download the pi.py file, which contains the job code, from the Apache Spark™ repository to your local computer. This code calculates an approximate value of pi using the Monte Carlo method.
Upload the file to the Object Storage bucket you created earlier.
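To illustrate the Monte Carlo method mentioned above, here is a minimal sketch in plain Python. It is not the pi.py file from the Apache Spark™ repository and does not use Spark: it only shows the underlying idea of sampling random points in a square and counting how many fall inside the inscribed unit circle. The function name `estimate_pi` and its parameters are assumptions made for this example.

```python
import random
from typing import Optional


def estimate_pi(samples: int, seed: Optional[int] = None) -> float:
    """Estimate pi by sampling points uniformly in the square [-1, 1] x [-1, 1].

    The fraction of points that land inside the unit circle approximates
    (circle area) / (square area) = pi / 4.
    """
    if samples <= 0:
        raise ValueError("samples must be positive")
    rng = random.Random(seed)  # a local generator keeps runs reproducible
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(100_000))
```

The estimate varies from run to run and becomes more accurate as the number of samples grows. A Spark job applies the same idea but distributes the sampling across workers.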

In the management console, open the cluster you created earlier.
Navigate to Jobs.
Click Create job.
Select the Job type: PySpark.
In the Main python file field, specify the path to pi.py in the following format: s3a://<Object_Storage_bucket_name>/<file_name>.
Click Submit job.
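The path format for the main Python file can be assembled programmatically if you keep bucket and file names in a script. The sketch below is an illustration only; `build_s3a_path` is a hypothetical helper, and the bucket name in the usage line is a placeholder.

```python
def build_s3a_path(bucket_name: str, file_name: str) -> str:
    """Build a path in the s3a://<Object_Storage_bucket_name>/<file_name> format."""
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")
    if not file_name.strip("/"):
        raise ValueError("file_name must not be empty")
    # Drop leading slashes so the result never contains "//" after the bucket.
    return f"s3a://{bucket_name}/{file_name.lstrip('/')}"


if __name__ == "__main__":
    # "my-spark-bucket" is a placeholder, not a real bucket.
    print(build_s3a_path("my-spark-bucket", "pi.py"))
```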

Wait for the job status to change to Running.
Navigate to the Logs tab.
In the logs, look for a line with the job results, such as the following:
```
Pi is roughly 3.144720
```
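If you export the job logs as text, the result line can be extracted with a short script. This is a minimal sketch that assumes the line has the same shape as the sample above; `find_pi_result` is a hypothetical helper, and the other log lines in the usage example are invented for illustration.

```python
import re
from typing import Optional

# Matches a result line such as "Pi is roughly 3.144720".
_PI_LINE_RE = re.compile(r"Pi is roughly (\d+\.\d+)")


def find_pi_result(log_text: str) -> Optional[float]:
    """Return the first reported pi estimate in the log text, or None if absent."""
    match = _PI_LINE_RE.search(log_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # The surrounding lines are made up; only the result line follows the guide.
    sample_log = "job started\nPi is roughly 3.144720\njob finished\n"
    print(find_pi_result(sample_log))
```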