Ways to use Apache Spark™ clusters in DataSphere

Written by Yandex Cloud
Updated at October 24, 2025
  • Cluster deployment options
  • Setting up a DataSphere project to work with Yandex Data Processing clusters

Yandex Data Processing allows you to deploy Apache Spark™ clusters. You can use Yandex Data Processing clusters to run distributed training.

Cluster deployment options

To work with Yandex Data Processing clusters in DataSphere, you can use the following:

  • Spark connector
  • Livy session

If you have no existing Yandex Data Processing clusters or you need a cluster for a short time, use temporary Yandex Data Processing clusters. You can create them using the following:

  • Spark connector (preferred)
  • Yandex Data Processing template
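
To check whether the folder already has Yandex Data Processing clusters you can attach to, you can list them with the Yandex Cloud CLI, for example:

    # List Yandex Data Processing clusters in the folder
    yc dataproc cluster list --folder-id <folder_ID>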

Regardless of the deployment option, all Yandex Data Processing clusters are charged based on the Yandex Data Processing pricing policy.

Setting up a DataSphere project to work with Yandex Data Processing clusters

To work with Yandex Data Processing clusters:

  1. In the project settings, specify these parameters:

    • Default folder for integrating with other Yandex Cloud services. The Yandex Data Processing cluster will be deployed in this folder, within the current cloud quotas, and the fee for using the cluster will be debited from your cloud billing account.
    • Service account with the vpc.user role. DataSphere will use this account to work with the Yandex Data Processing cluster network.
    • Subnet for DataSphere to communicate with the Yandex Data Processing cluster. Since the Yandex Data Processing cluster needs internet access, make sure a NAT gateway is configured in this subnet (a CLI sketch follows this procedure). After you specify a subnet, the time for computing resource allocation may increase.
  2. Create a service agent:

    1. To allow a service agent to operate in DataSphere, ask your cloud admin or owner to run the following command in the Yandex Cloud CLI:

      yc iam service-control enable datasphere --cloud-id <cloud_ID>
      

      Where --cloud-id is the ID of the cloud you are going to use in the DataSphere community.

    2. Create a service account with the following roles (a CLI sketch for this step also follows the procedure):

      • dataproc.agent to use Yandex Data Processing clusters.
      • dataproc.admin to create clusters from Yandex Data Processing templates.
      • vpc.user to use the Yandex Data Processing cluster network.
      • iam.serviceAccounts.user to create resources in the folder on behalf of the service account.
    3. Under Spark clusters in the community settings, click Add service account and select the service account you created.
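
The subnet requirement from step 1 can be met with the Yandex Cloud CLI. Below is a minimal sketch that creates a NAT gateway and routes the subnet's outbound traffic through it; the resource names are placeholders, and the IDs come from your own cloud.

    # Create a NAT gateway
    yc vpc gateway create --name datasphere-nat

    # Create a route table that sends all outbound traffic through the gateway
    yc vpc route-table create \
      --name datasphere-routes \
      --network-id <network_ID> \
      --route destination=0.0.0.0/0,gateway-id=<gateway_ID>

    # Attach the route table to the subnet selected in the project settings
    yc vpc subnet update <subnet_name> --route-table-id <route_table_ID>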

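The service account from step 2.2 and its role bindings can also be created with the CLI. This is a sketch with a placeholder account name; grant the roles on the folder you selected for the project.

    # Create the service account
    yc iam service-account create --name datasphere-dataproc-sa

    # Assign the required roles on the folder
    for role in dataproc.agent dataproc.admin vpc.user iam.serviceAccounts.user; do
      yc resource-manager folder add-access-binding <folder_ID> \
        --role "$role" \
        --subject serviceAccount:<service_account_ID>
    done
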
Warning

The persistent Yandex Data Processing cluster must have the livy:livy.spark.deploy-mode: client setting.
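
If you create the persistent cluster yourself, this property can be set at creation time. The sketch below is illustrative only: the subcluster layout, resource presets, and the --property flag syntax are assumptions and may need adjusting for your folder and CLI version.

    # Create a persistent cluster with Spark, YARN, and Livy;
    # the --property value sets the Livy deploy mode required by DataSphere (assumed flag syntax)
    yc dataproc cluster create datasphere-spark-cluster \
      --zone ru-central1-a \
      --services spark,yarn,livy \
      --service-account-name datasphere-dataproc-sa \
      --ssh-public-keys-file ~/.ssh/id_ed25519.pub \
      --subcluster name=master,role=masternode,resource-preset=s2.small,disk-size=40,subnet-name=<subnet_name>,hosts-count=1 \
      --subcluster name=compute,role=computenode,resource-preset=s2.small,disk-size=40,subnet-name=<subnet_name>,hosts-count=2 \
      --property livy:livy.spark.deploy-mode=client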

See also

  • Yandex Data Processing templates
  • Integration with Yandex Data Processing
  • Spark connector
