

Ways to use Apache Spark™ clusters in DataSphere

Written by Yandex Cloud
Updated at December 27, 2024

In this article:

  • Cluster deployment options
  • Setting up a DataSphere project to work with Yandex Data Processing clusters

Yandex Data Processing allows you to deploy Apache Spark™ clusters, which you can use to run distributed training.
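
For illustration, the sketch below shows roughly what a small distributed training job could look like from a DataSphere notebook. It is a minimal example rather than a prescribed workflow: it assumes the notebook is already attached to a Yandex Data Processing cluster by one of the options described in the next section, so that SparkSession.builder.getOrCreate() returns the cluster-backed session, and it uses a tiny inline dataset purely for demonstration.

    # Minimal PySpark training sketch (assumes the notebook is already
    # attached to a Yandex Data Processing cluster, so getOrCreate()
    # returns the cluster-backed session rather than a local one).
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.getOrCreate()

    # Tiny inline dataset used only to illustrate the API.
    train_df = spark.createDataFrame(
        [
            (Vectors.dense([0.0, 1.1]), 0.0),
            (Vectors.dense([2.0, 1.0]), 1.0),
            (Vectors.dense([2.0, 1.3]), 1.0),
            (Vectors.dense([0.0, 1.2]), 0.0),
        ],
        ["features", "label"],
    )

    # fit() runs as a distributed Spark job on the cluster's executors.
    model = LogisticRegression(maxIter=10).fit(train_df)
    print(model.coefficients)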

Cluster deployment options

To work with Yandex Data Processing clusters in DataSphere, you can use the following:

  • Spark connector
  • Livy session

If you have no existing Yandex Data Processing clusters, or you only need a cluster for a short time, use temporary Yandex Data Processing clusters. You can create them using either of the following:

  • Spark connector (preferred)
  • Yandex Data Processing template

Regardless of the deployment option, all Yandex Data Processing clusters are billed according to the Yandex Data Processing pricing policy.
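
Whichever option you use, once a cluster is attached you interact with it through the standard Spark APIs in the notebook. The snippet below is a quick sanity check, given the assumption that a SparkSession variable named spark is available in the notebook after the Spark connector or Livy session is set up; if it is not pre-created in your setup, obtain it with SparkSession.builder.getOrCreate().

    # Quick sanity check of an attached cluster session.
    # `spark` is assumed to be provided by the Spark connector or Livy session.
    print(spark.version)                          # Spark version on the cluster
    print(spark.sparkContext.defaultParallelism)  # default number of partitions

    # A small distributed job: the aggregation runs on the cluster's executors.
    total = spark.range(1_000_000).groupBy().sum("id").collect()[0][0]
    print(total)  # expected: 499999500000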

Setting up a DataSphere project to work with Yandex Data Processing clusters

To use Yandex Data Processing clusters, specify the following parameters in your project settings:

  • Default folder for integrating with other Yandex Cloud services. A Yandex Data Processing cluster will be deployed in this folder, subject to the current cloud quotas, and the fee for using the cluster will be debited from your cloud's billing account.

  • Service account that DataSphere will use to create and manage clusters. The service account needs the following roles (one way to grant them is shown in the sketch after this list):

    • dataproc.agent to use Yandex Data Processing clusters.
    • dataproc.admin to create clusters from Yandex Data Processing templates.
    • vpc.user to use the Yandex Data Processing cluster network.
    • iam.serviceAccounts.user to create resources in the folder on behalf of the service account.
  • Subnet for DataSphere to communicate with the Yandex Data Processing cluster. Since the Yandex Data Processing cluster needs to access the internet, make sure to configure a NAT gateway in the subnet.

    Note

    If you specify a subnet in the project settings, allocating computing resources may take longer.
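
The roles listed above can be assigned in the management console; if you prefer to script the setup, the sketch below is one possible way to do it with the yc CLI called from Python. The folder and service account IDs are hypothetical placeholders, and the sketch assumes the yc CLI is installed and authorized (see yc resource-manager folder add-access-binding --help for the exact flags in your CLI version).

    # Grant the required roles to the project's service account on the folder.
    # FOLDER_ID and SERVICE_ACCOUNT_ID are hypothetical placeholders:
    # replace them with your own values.
    import subprocess

    FOLDER_ID = "<your-folder-id>"
    SERVICE_ACCOUNT_ID = "<your-service-account-id>"

    ROLES = ["dataproc.agent", "dataproc.admin", "vpc.user", "iam.serviceAccounts.user"]

    for role in ROLES:
        subprocess.run(
            [
                "yc", "resource-manager", "folder", "add-access-binding", FOLDER_ID,
                "--role", role,
                "--subject", f"serviceAccount:{SERVICE_ACCOUNT_ID}",
            ],
            check=True,  # raise if the CLI call fails
        )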

Warning

The persistent Yandex Data Processing cluster must have the livy:livy.spark.deploy-mode property set to client.

See also

  • Yandex Data Processing templates
  • Integration with Yandex Data Processing
  • Spark connector
