Yandex DataSphere

Working with Spark connectors

Written by
Yandex Cloud
Updated on October 24, 2025
  • Getting started
  • Creating a Spark connector
  • Using a Yandex Data Processing cluster in a project
  • Editing a Spark connector
  • Sharing a Spark connector
  • Deleting a Spark connector

In DataSphere, you can use Spark connectors to work with existing or automatically created Yandex Data Processing clusters.

Getting started

To work with Yandex Data Processing clusters:

  1. In the project settings, specify these parameters:

    • Default folder for integration with other Yandex Cloud services. The Yandex Data Processing cluster will be deployed in this folder, subject to the current cloud quotas. The fee for using the cluster will be debited from your cloud billing account.
    • Service account with the vpc.user role. DataSphere will use this account to work with the Yandex Data Processing cluster network.
    • Subnet for DataSphere to communicate with the Yandex Data Processing cluster. Since the cluster needs internet access, make sure to configure a NAT gateway in this subnet. Note that specifying a subnet may increase the time it takes to allocate computing resources.
  2. Create a service agent:

    1. To allow a service agent to operate in DataSphere, ask your cloud admin or owner to run the following command in the Yandex Cloud CLI:

      yc iam service-control enable datasphere --cloud-id <cloud_ID>
      

      Where --cloud-id is the ID of the cloud you are going to use in the DataSphere community.

    2. Create a service account with the following roles:

      • dataproc.agent to use Yandex Data Processing clusters.
      • dataproc.admin to create clusters from Yandex Data Processing templates.
      • vpc.user to use the Yandex Data Processing cluster network.
      • iam.serviceAccounts.user to create resources in the folder on behalf of the service account.
    3. Under Spark clusters in the community settings, click Add service account and select the service account you created.

Warning

The Yandex Data Processing persistent cluster must have the livy:livy.spark.deploy-mode : client setting.

Creating a Spark connector

  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. Under Project resources, click Spark connector.

  3. Click Create connector.

  4. In the Name field, enter a name for your connector. Follow these naming requirements:

    • The name must be from 3 to 63 characters long.
    • It may contain uppercase and lowercase Latin and Cyrillic letters, numbers, hyphens, underscores, and spaces.
    • The first character must be a letter. The last character cannot be a hyphen, underscore, or space.
  5. Under Yandex Data Processing cluster, select the cluster you plan to work with:

    • Select cluster: Select an existing Yandex Data Processing cluster or click Create cluster in Yandex Data Processing to go to Yandex Data Processing and create a new one. The Yandex Data Processing persistent cluster must have the livy:livy.spark.deploy-mode : client setting.
    • Create temporary cluster: Select this option to create a temporary Yandex Data Processing cluster. A temporary cluster will be created the first time you run computations in your project notebook.
  6. Optionally, under S3 settings, specify the static access key ID and the secret that stores the secret part of the static key for the S3 connector.

    The S3 settings section lets you specify data for connecting to an S3 bucket. We recommend providing these settings whenever you work with Object Storage buckets, especially from Yandex Data Processing clusters without the HDFS option enabled. For reference, the sketch after these steps shows the Spark configuration keys these settings map to.

  7. Under Spark settings, select Use default settings to use the default Yandex Data Processing cluster settings or specify the Key and Value parameters to manually add or update the Yandex Data Processing cluster settings.

    Tip

    For Yandex Data Processing clusters without the HDFS option enabled, set spark.hadoop.fs.s3a.fast.upload.buffer = bytebuffer.

    For your own clusters with the Spark Connect connection type, set dataproc:spark-connect = enabled.

  8. Click Create. You will see a page with detailed info on the connector you created.
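
The values entered under S3 settings and Spark settings correspond to standard Spark configuration keys. Below is a minimal PySpark sketch of an equivalent manually configured session; the key ID, secret, and endpoint values are illustrative placeholders, since DataSphere injects the connector's actual settings when it is attached to a notebook:

  from pyspark.sql import SparkSession

  # Minimal sketch only: DataSphere applies the connector's settings automatically.
  # All placeholder values below are hypothetical.
  spark = (
      SparkSession.builder
      .appName("spark-connector-settings-sketch")
      # S3 settings: static access key ID and its secret part
      .config("spark.hadoop.fs.s3a.access.key", "<static_key_ID>")
      .config("spark.hadoop.fs.s3a.secret.key", "<static_key_secret>")
      # Yandex Object Storage endpoint
      .config("spark.hadoop.fs.s3a.endpoint", "storage.yandexcloud.net")
      # Recommended for clusters without the HDFS option (see the tip above)
      .config("spark.hadoop.fs.s3a.fast.upload.buffer", "bytebuffer")
      .getOrCreate()
  )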

Using a Yandex Data Processing cluster in a project

Warning

If you use a Spark connector to work with Yandex Data Processing clusters, the notebook runs in a special project environment rather than the standard one.

Open the DataSphere project:

  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.
  2. Click Open project in JupyterLab and wait for the loading to complete.
  3. Open the notebook tab.

To run computations on the Yandex Data Processing cluster (a usage sketch follows these steps):

  1. Run any cell by selecting Run → Run Selected Cells or pressing Shift + Enter.
  2. In the Notebook VM configurations window that opens, go to the With Yandex Data Processing cluster tab.
  3. Select the required configuration and connector.
  4. Click Select.
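
Once the cell runs on the cluster, a Spark session is available in the notebook. Below is a minimal usage sketch, assuming the session is exposed as spark (as in a standard PySpark notebook environment); the bucket, file, and column names are hypothetical:

  # Hedged sketch: `spark` is assumed to be the session provided by the connector;
  # the bucket, file, and column names are hypothetical placeholders.
  df = spark.read.option("header", True).csv("s3a://<your_bucket>/data/events.csv")
  df.printSchema()
  df.groupBy("event_type").count().show()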

To shut down a temporary Yandex Data Processing cluster created with the Spark connector, stop the notebook VM.

To stop paying for the Yandex Data Processing persistent cluster, delete it using the management console.

Editing a Spark connector

  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. Under Project resources, click Spark connector.
  3. In the list of Spark connectors, select the one you want to edit. Click Edit.
  4. Edit the Spark connector by changing its name or parameters.
  5. Click Save.

Sharing a Spark connector

Note

You can only share resources within a single organization, between communities created in the same availability zone.

To share a Spark connector within a community, you need the Editor role in the project and the Developer role in the community. To learn more about roles in DataSphere, see Access management in DataSphere.

  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. Under Project resources, click Spark connector.
  3. Select the Spark connector from the list.
  4. Go to the Access tab.
  5. Enable the visibility option next to the name of the community you want to share the Spark connector with.

To make a Spark connector available for use in a different project, the project admin needs to add that connector on the Shared tab.

Deleting a Spark connector

You can only delete a connector that is not available to any community.

  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. Under Project resources, click Spark connector.
  3. In the list of Spark connectors, select the one you want to delete. Click Delete.
  4. Click Confirm.

You will see a message saying that the connector has been deleted.

Warning

Actual deletion of resources may take up to 72 hours.
