Getting started with Yandex Managed Service for Apache Spark™

Written by Yandex Cloud. Improved by Danila N. Updated on October 20, 2025.

In this article:

  • Required paid resources
  • Get your cloud ready
  • Set up the infrastructure
  • Create a cluster
  • Prepare a PySpark job
  • Run your PySpark job
  • Check the job completion

To get started:

  • Get your cloud ready.
  • Set up your infrastructure.
  • Create a cluster.
  • Prepare a PySpark job.
  • Run the job in the cluster.
  • Check the job completion.

Required paid resources

The cost of infrastructure support includes a fee for a Yandex Object Storage bucket (see Object Storage pricing).

Get your cloud ready

  1. Navigate to the management console and log in to Yandex Cloud, or sign up if you do not have an account yet.

  2. If you do not have a folder yet, create one:

    1. In the management console, select the appropriate cloud from the list on the left.

    2. At the top right, click Create folder.

    3. Give your folder a name. The naming requirements are as follows:

      • It must be from 2 to 63 characters long.
      • It can only contain lowercase Latin letters, numbers, and hyphens.
      • It must start with a letter and cannot end with a hyphen.
    4. Optionally, add a description for your folder.

    5. Select Create a default network. This will create a network with a subnet in each availability zone, along with a default security group that allows all network traffic within the network.

    6. Click Create.

  3. Assign the following roles to your Yandex Cloud account:

    • managed-spark.admin: To create a cluster.
    • vpc.user: To use the cluster network.
    • iam.serviceAccounts.user: To assign a service account to a cluster.

    Note

    If you are unable to manage roles, contact your cloud or organization administrator.

Set up the infrastructure

  1. Create a service account and assign the following roles to it:

    • managed-spark.integrationProvider: For Yandex Managed Service for Apache Spark™ to interact with other system components, e.g., for sending logs and metrics.
    • storage.editor: For accessing PySpark job files in an Object Storage bucket.
  2. Create an Object Storage bucket (a scripted sketch of this step is shown after this list).

  3. Grant the service account access to the Object Storage bucket that will store the code and data for your cluster jobs:

    1. In the management console, select the folder you need.
    2. In the list of services, select Object Storage.
    3. Open the bucket you created earlier.
    4. Navigate to Objects.
    5. Open the bucket menu and select Configure ACL.
    6. In the ACL editing window that opens:
      1. Start typing the name of the service account you created earlier and select it from the drop-down list.
      2. Select the READ and WRITE permissions.
      3. Click Add.
      4. Click Save.
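
If you prefer to script step 2 instead of using the management console, below is a minimal sketch that creates the bucket through the Object Storage S3-compatible API with boto3. It assumes you have issued a static access key for an account with sufficient Object Storage permissions; the bucket name and key values are placeholders, while storage.yandexcloud.net and ru-central1 are the standard Object Storage endpoint and region.

    # A minimal sketch: create the bucket via the S3-compatible API.
    # <static_key_ID> and <secret_key> are placeholders for a static access key.
    import boto3

    session = boto3.session.Session(
        aws_access_key_id="<static_key_ID>",
        aws_secret_access_key="<secret_key>",
        region_name="ru-central1",
    )
    s3 = session.client("s3", endpoint_url="https://storage.yandexcloud.net")
    s3.create_bucket(Bucket="<Object_Storage_bucket_name>")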

Create a cluster

Management console
  1. In the management console, select the folder where you want to create a cluster.

  2. Select Managed Service for Apache Spark™.

  3. Click Create cluster.

  4. Give the cluster a name.

  5. In the Service account field, select the previously created service account.

  6. Under Network settings, select a network, subnet, and security group for the cluster.

  7. Configure the computing resources for the hosts that will run drivers and workers.

  8. Under Advanced settings, configure logging:

    1. Enable the Write logs setting.
    2. In the Destination field, select the log destination: Folder.
    3. In the Folder field, select your folder from the list.
    4. Select Min. logging level: INFO.
  9. Click Create.

  10. Wait until the cluster is ready for use: its status on the Yandex Managed Service for Apache Spark™ dashboard changes to Running and its state to Alive. This may take some time.

Prepare a PySpark job

  1. Download the pi.py file with the job code from the Apache Spark™ repository and save it to your local computer. This code calculates an approximate value of pi using the Monte Carlo method (a sketch of the idea is shown after this list).

  2. Upload the file to the Object Storage bucket you created earlier.
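
For reference, here is an illustrative sketch of the Monte Carlo approach the job takes; the actual pi.py in the Apache Spark™ repository may differ in details:

    # Illustrative sketch of a Monte Carlo pi estimation job in PySpark.
    import sys
    from operator import add
    from random import random

    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("PythonPi").getOrCreate()

        partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
        n = 100000 * partitions

        def inside(_):
            # Sample a random point in the [-1, 1] x [-1, 1] square and check
            # whether it lands inside the unit circle.
            x = random() * 2 - 1
            y = random() * 2 - 1
            return 1 if x * x + y * y <= 1 else 0

        # The fraction of points inside the circle approximates pi / 4.
        count = spark.sparkContext.parallelize(range(n), partitions).map(inside).reduce(add)
        print("Pi is roughly %f" % (4.0 * count / n))

        spark.stop()

The upload in step 2 can also be scripted with boto3, reusing the same endpoint and credentials as in the earlier sketch (the bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3", endpoint_url="https://storage.yandexcloud.net")
    s3.upload_file("pi.py", "<Object_Storage_bucket_name>", "pi.py")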

Run your PySpark job

  1. In the management console, open the cluster you created earlier.
  2. Navigate to Jobs.
  3. Click Create job.
  4. Select the Job type: PySpark.
  5. In the Main python file field, specify the path to pi.py in the following format: s3a://<Object_Storage_bucket_name>/<file_name> (see the example after this list).
  6. Click Submit job.
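
For example, if the bucket were named pyspark-demo (a hypothetical name) and pi.py were uploaded to the bucket root, the path would be s3a://pyspark-demo/pi.py.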

Check the job completion

  1. Wait for the job status to change to Running.

  2. Navigate to the Logs tab.

  3. In the logs, look for a line with the job results, such as the following:

    Pi is roughly 3.144720
    

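The printed value only approximates pi: the standard error of a Monte Carlo estimate scales as 1/√n, where n is the number of sampled points, so with a few hundred thousand samples the result typically matches pi to about two decimal places.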