Creating an Apache Spark™ cluster

Written by
Yandex Cloud
Updated at February 13, 2026
  • Roles for creating a cluster
  • Creating a cluster
  • Examples

Each Apache Spark™ cluster contains computing resources to run Spark applications.

Roles for creating a cluster

To create an Apache Spark™ cluster, your Yandex Cloud account needs the following roles:

  • managed-spark.admin: To create a cluster.
  • vpc.user: To use the cluster network.
  • iam.serviceAccounts.user: To assign a service account to a cluster.

Make sure to assign the managed-spark.integrationProvider and storage.editor roles to the cluster service account. The cluster will thus get the permissions it needs to work with user resources.

For more information about assigning roles, see this Yandex Identity and Access Management guide.
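
For illustration, here is a minimal CLI sketch of granting these roles at the folder level with yc resource-manager folder add-access-binding. The folder, user, and service account IDs are placeholders, and the scope you actually need may differ; follow the IAM guide above for the authoritative procedure.

# Role for the account that creates the cluster (vpc.user and iam.serviceAccounts.user
# are assigned the same way):
yc resource-manager folder add-access-binding <folder_ID> \
   --role managed-spark.admin \
   --subject userAccount:<user_ID>

# Roles for the cluster service account:
yc resource-manager folder add-access-binding <folder_ID> \
   --role managed-spark.integrationProvider \
   --subject serviceAccount:<service_account_ID>

yc resource-manager folder add-access-binding <folder_ID> \
   --role storage.editor \
   --subject serviceAccount:<service_account_ID>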

Creating a cluster

Management console
CLI
Terraform
gRPC API
  1. In the management console, select the folder where you want to create an Apache Spark™ cluster.

  2. Go to Managed Service for Apache Spark™.

  3. Click Create cluster.

  4. Under Basic parameters:

    1. Give the cluster a name. The name must be unique within the folder.

    2. Optionally, enter a description for the cluster.

    3. Optionally, create labels:

      1. Click Add label.
      2. Enter a label in key: value format.
      3. Press Enter.
    4. Select an existing service account or create a new one.

      Make sure to assign the managed-spark.integrationProvider role to this service account.

    5. Select the Apache Spark™ version.

      Note

      After creating a cluster, you can change your Apache Spark™ version. You can only upgrade the version.

  5. Under Network settings, select a network, subnet, and security group for the cluster.

  6. Specify the computing resources to run Spark applications on:

    • Driver configuration: Driver host class and the number of driver hosts. The number can be either fixed or autoscaled.
    • Executor configuration: Executor host class and the number of executor hosts. The number can be either fixed or autoscaled.
  7. If needed, configure advanced cluster settings:

    1. Pip packages and Deb packages: Pip and deb package names for installing additional libraries and applications.

      To specify multiple packages, click Add.

      The package name format and version are defined by the install command: pip install for pip packages and apt install for deb packages.

    2. Maintenance window: Maintenance window settings:

      • To enable maintenance at any time, select arbitrary (default).
      • To specify the preferred maintenance start time, select by schedule and specify the desired day of the week and UTC hour. For example, you can choose a time when the cluster is least loaded.

      Maintenance operations are carried out on both enabled and disabled clusters. They may include updating cluster components, applying patches, and so on.

    3. Metastore: Metastore server connected to your cluster.

    4. Deletion protection: Manages cluster protection against accidental deletion.

    5. Enable the History Server setting to monitor Spark applications with the Apache Spark™ History Server. After the cluster is created, the service will be available via a link.

    6. Configure logging:

      1. Enable the Write logs setting.
      2. Select the log destination:
        • Folder: Select a folder from the list.
        • Group: Select a log group from the list or create a new one.
      3. Select Min. logging level from the list.
  8. Click Create.

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
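
For example, with a placeholder folder ID:

# Set the default folder for subsequent yc commands (the ID is a placeholder).
yc config set folder-id <folder_ID>

# Review the active profile settings.
yc config list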

To create an Apache Spark™ cluster:

  1. Verify that your folder has subnets for cluster host placement:

    yc vpc subnet list
    

    If your folder contains no subnets, create them in VPC.

  2. View the description of the CLI command for creating a cluster:

    yc managed-spark cluster create --help
    
  3. Specify the cluster properties in this command (the example below does not list all available parameters):

    yc managed-spark cluster create \
       --name <cluster_name> \
       --spark-version <Apache_Spark_version> \
       --service-account-id <service_account_ID> \
       --subnet-ids <list_of_subnet_IDs> \
       --security-group-ids <list_of_security_group_IDs> \
       --driver-preset-id=<class_of_driver_computing_resources> \
       --driver-fixed-size <number_of_driver_hosts> \
       --executor-preset-id <class_of_executor_computing_resources> \
       --executor-min-size <minimum_number_of_executor_hosts> \
       --executor-max-size <maximum_number_of_executor_hosts> \
       --pip-packages <list_of_pip_packages> \
       --deb-packages <list_of_deb_packages> \
       --history-server-enabled \
       --metastore-cluster-id <cluster_ID> \
       --deletion-protection
    

    Where:

    • --name: Cluster name. It must be unique within the folder.

    • --spark-version: Apache Spark™ version.

      Note

      After creating a cluster, you can change your Apache Spark™ version. You can only upgrade the version.

    • --service-account-id: Service account ID.

    • --subnet-ids: List of subnet IDs.

    • --security-group-ids: List of security group IDs.

    • --driver-preset-id: Class of driver computing resources.

    • --driver-fixed-size: Fixed number of driver instances.

    • --driver-min-size: Minimum number of driver instances.

    • --driver-max-size: Maximum number of driver instances.

      Specify either a fixed number of drivers or minimum and maximum numbers of drivers for autoscaling.

    • --executor-preset-id: Class of executor computing resources.

    • --executor-fixed-size: Fixed number of executor instances.

    • --executor-min-size: Minimum number of executor instances.

    • --executor-max-size: Maximum number of executor instances.

      Specify either a fixed number of executors or minimum and maximum numbers of executors for autoscaling.

    • --pip-packages: List of pip packages enabling you to install additional libraries and applications in the cluster.

      The package name format and version are determined by the pip install command.

    • --deb-packages: List of deb packages enabling you to install additional libraries and applications in the cluster.

      The package name format and version are determined by the apt install command.

    • --history-server-enabled: Flag to enable the history server. It allows you to monitor Spark applications with the Apache Spark™ History Server.

    • --metastore-cluster-id: ID of the Apache Hive™ Metastore cluster to use as a metadata storage.

    • --deletion-protection: Cluster protection from accidental deletion, true or false.

      Even with deletion protection on, one can still connect to the cluster manually and delete the data.

  4. To enable sending of Apache Spark™ logs to Yandex Cloud Logging, specify logging parameters:

    yc managed-spark cluster create <cluster_name> \
       ...
       --log-enabled \
       --log-folder-id <folder_ID>
    

    Where:

    • --log-enabled: Enables logging.

    • --log-folder-id: Folder ID. Logs will be written to the default log group for this folder.

    • --log-group-id: Custom log group ID. Logs will be written to this group.

      Specify either a folder ID or a custom log group ID.

  5. To set up a maintenance window (including for disabled clusters), provide the required value in the --maintenance-window parameter:

    yc managed-spark cluster create <cluster_name> \
       ...
       --maintenance-window type=<maintenance_type>,`
                           `day=<day_of_week>,`
                           `hour=<hour>
    

    Where type is the maintenance type:

    • anytime: At any time (default).
    • weekly: On a schedule. For this value, also specify the following:
      • day: Day of week, i.e., MON, TUE, WED, THU, FRI, SAT, or SUN.
      • hour: Hour of day (UTC), from 1 to 24.
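
For reference, the options described above can be combined into a single command. Below is a sketch with placeholder values that uses autoscaled executors, logging to a folder, and a weekly maintenance window:

yc managed-spark cluster create \
   --name <cluster_name> \
   --spark-version <Apache_Spark_version> \
   --service-account-id <service_account_ID> \
   --subnet-ids <list_of_subnet_IDs> \
   --security-group-ids <list_of_security_group_IDs> \
   --driver-preset-id <class_of_driver_computing_resources> \
   --driver-fixed-size 1 \
   --executor-preset-id <class_of_executor_computing_resources> \
   --executor-min-size 2 \
   --executor-max-size 4 \
   --log-enabled \
   --log-folder-id <folder_ID> \
   --maintenance-window type=weekly,day=MON,hour=2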

With Terraform, you can quickly create a cloud infrastructure in Yandex Cloud and manage it using configuration files. These files store the infrastructure description written in HashiCorp Configuration Language (HCL). If you change the configuration files, Terraform automatically detects which part of your configuration is already deployed, and what should be added or removed.

Terraform is distributed under the Business Source License. The Yandex Cloud provider for Terraform is distributed under the MPL-2.0 license.

For more information about the provider resources, see the relevant documentation on the Terraform website or its mirror.

If you do not have Terraform yet, install it and configure the Yandex Cloud provider.
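
A minimal sketch of preparing a working directory, assuming the provider is authenticated with an IAM token passed via the YC_TOKEN environment variable (a service account key file is another common option):

# Pass an IAM token to the Yandex Cloud provider (token-based authentication is assumed here).
export YC_TOKEN="$(yc iam create-token)"

# Initialize the directory with your .tf files and download the provider plugins.
terraform init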

To create a Yandex Managed Service for Apache Spark™ cluster:

  1. In the configuration file, describe the resources you are creating:

    • Yandex Managed Service for Apache Spark™ cluster: Cluster description.

    • Network: Description of the cloud network where a cluster will be located. If you already have a suitable network, you don't have to describe it again.

    • Subnets: Description of the subnets to connect the cluster hosts to. If you already have suitable subnets, you don't have to describe them again.

    Here is an example of the configuration file structure:

    resource "yandex_spark_cluster" "<cluster_name>" {
      description         = "<cluster_description>"
      name                = "<cluster_name>"
      folder_id           = "<folder_ID>"
      service_account_id  = "<service_account_ID>"
      deletion_protection = <protect_cluster_from_deletion>
    
      labels = {
        <label_list>
      }
    
      network = {
        subnet_ids         = ["<list_of_subnet_IDs>"]
        security_group_ids = ["<list_of_security_group_IDs>"]
      }
    
      config = {
        resource_pools = {
          driver = {
            resource_preset_id = "<host_class>"
            size               = <fixed_number_of_instances>
          }
          executor = {
            resource_preset_id = "<host_class>"
            size               = <fixed_number_of_instances>
          }
        }
        spark_version = "<Apache_Spark_version>"
      }
    
      logging = {
        enabled      = <enable_logging>
        folder_id    = "<folder_ID>"
      }
    
    }
    
    resource "yandex_vpc_network" "<network_name>" {
      name = "<network_name>"
    }
    
    resource "yandex_vpc_subnet" "<subnet_name>" {
      name           = "<subnet_name>"
      zone           = "<availability_zone>"
      network_id     = "<network_ID>"
      v4_cidr_blocks = ["<range>"]
    }
    

    Where:

    • description: Cluster description. This is an optional parameter.

    • name: Cluster name.

    • folder_id: Folder ID. This is an optional parameter. If the value is missing, the cluster will reside in the folder specified in the provider settings.

    • service_account_id: Service account ID.

    • deletion_protection: Cluster protection from accidental deletion, true or false. This is an optional parameter.

    • labels: List of labels. This is an optional parameter. Provide labels in <key> = "<value>" format.

    • subnet_ids: List of subnet IDs.

    • security_group_ids: List of security group IDs.

    • driver: Host configuration to run Apache Spark™ drivers. In this section, specify:

      • Host class in the resource_preset_id parameter.
      • Number of instances. Specify a fixed number in the size parameter or the minimum and maximum number for autoscaling in the min_size and max_size parameters.
    • executor: Host configuration to run Apache Spark™ executors. In this section, specify:

      • Host class in the resource_preset_id parameter.
      • Number of instances. Specify a fixed number in the size parameter or the minimum and maximum number for autoscaling in the min_size and max_size parameters.
    • spark_version (optional): Apache Spark™ version.

      Note

      After creating a cluster, you can change your Apache Spark™ version. You can only upgrade the version.

    • logging: Logging parameters. Logs generated by Apache Spark™ components will be sent to Yandex Cloud Logging. To enable logging:

      • Set the enabled = true value.

      • Specify one of two log storage locations:

        • folder_id: Folder ID. Logs will be written to the default log group for this folder.
        • log_group_id: Custom log group ID. Logs will be written to this group.
  2. If necessary, configure additional cluster settings:

    • To set up the maintenance window (for disabled clusters as well), add the maintenance_window section to the cluster description:

      resource "yandex_spark_cluster" "<cluster_name>" {
        ...
        maintenance_window {
          type = <maintenance_type>
          day  = <day_of_week>
          hour = <hour>
        }
        ...
      }
      

      Where:

      • type: Maintenance type. The possible values include:
        • ANYTIME: Anytime
        • WEEKLY: On a schedule
      • day: Day of week for the WEEKLY type, i.e., MON, TUE, WED, THU, FRI, SAT, or SUN.
      • hour: UTC hour for the WEEKLY type, from 1 to 24.
    • To enable Apache Spark™ History Server, add a section named history_server to the cluster configuration description:

      resource "yandex_spark_cluster" "<cluster_name>" {
      ...
        config = {
        ...
          history_server = {
            enabled = true
          }
        }
      }
      
    • To connect an Apache Hive™ Metastore server to your cluster, add the metastore section to the cluster configuration description:

      resource "yandex_spark_cluster" "<cluster_name>" {
      ...
        config = {
        ...
          metastore = {
            cluster_id = "<metastore_cluster_ID>"
          }
        }
      }
      
    • To connect additional deb and pip packages for running Apache Spark™ jobs, add the dependencies section to the cluster configuration description:

      resource "yandex_spark_cluster" "<cluster_name>" {
      ...
        config = {
        ...
          dependencies = {
            deb_packages = ["<list_of_deb_packages>"]
            pip_packages = ["<list_of_pip_packages>"]
          }
        }
      }
      

      Where deb_packages and pip_packages are package names. Their format depends on the installation command: apt install for deb packages and pip install for pip packages.

  3. Validate your configuration.

    1. In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.

    2. Run this command:

      terraform validate
      

      Terraform will show any errors found in your configuration files.

  4. Create a Yandex Managed Service for Apache Spark™ cluster.

    1. Run this command to view the planned changes:

      terraform plan
      

      If you described the configuration correctly, the terminal will display a list of the resources to create and their parameters. This is a verification step that does not apply changes to your resources.

    2. If everything looks correct, apply the changes:

      1. Run this command:

        terraform apply
        
      2. Confirm creating the resources.

      3. Wait for the operation to complete.

    This will create all the resources you need in the specified folder. You can check the new resources and their settings in the management console.
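
    You can also inspect the cluster attributes recorded in the Terraform state, for example (a sketch; the resource address must match the one used in your configuration):

      # Show the attributes Terraform stored for the cluster resource (the address is a placeholder).
      terraform state show 'yandex_spark_cluster.<cluster_name>'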

For more information, see this Terraform provider guide.

  1. Get an IAM token for API authentication and place it in an environment variable:

    export IAM_TOKEN="<IAM_token>"
    
  2. Clone the cloudapi repository:

    cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
    

    Below, we assume that the repository contents reside in the ~/cloudapi/ directory.

  3. Create a file named body.json and paste the following code into it:

    {
      "folder_id": "<folder_ID>",
      "name": "<cluster_name>",
      "description": "<cluster_description>",
      "labels": { <label_list> },
      "config": {
        "resource_pools": {
          "driver": {
            "resource_preset_id": "<resource_ID>",
            "scale_policy": {
              "fixed_scale": {
                "size": "<number_of_instances>"
              }
            }
          },
          "executor": {
            "resource_preset_id": "<resource_ID>",
            "scale_policy": {
              "auto_scale": {
                "min_size": "<minimum_number_of_instances>",
                "max_size": "<maximum_number_of_instances>"
              }
            }
          }
        },
        "spark_version": "<Apache_Spark_version>",
        "history_server": {
          "enabled": <use_of_Apache_Spark_History_Server>
        },
        "dependencies": {
          "pip_packages": [ <list_of_pip_packages> ],
          "deb_packages": [ <list_of_deb_packages> ]
        },
        "metastore": {
          "cluster_id": "<cluster_ID>"
        }
      },
      "network": {
        "subnet_ids": [ <list_of_subnet_IDs> ],
        "security_group_ids": [ <list_of_security_group_IDs> ]
      },
      "deletion_protection": <deletion_protection>,
      "service_account_id": "<service_account_ID>",
      "logging": {
        "enabled": <use_of_logging>,
        "folder_id": "<folder_ID>"
      }
    }
    

    Where:

    • folder_id: Folder ID. You can request it with the list of folders in the cloud.

    • name: Cluster name.

    • description: Cluster description.

    • labels: List of labels provided in "<key>": "<value>" format.

    • config: Cluster configuration:

      • resource_pools: Resource pool configuration:

        • driver: Host configuration to run Apache Spark™ drivers.

          • resource_preset_id: Driver host class.

          • scale_policy: Host group scaling policy for the driver:

            • fixed_scale: Fixed scaling policy.

              • size: Number of driver hosts.
            • auto_scale: Automatic scaling policy.

              • min_size: Minimum number of driver hosts.
              • max_size: Maximum number of driver hosts.

              Specify either fixed_scale or auto_scale.

        • executor: Host configuration to run Apache Spark™ executors.

          • resource_preset_id: Executor host class.

          • scale_policy: Host group scaling policy for the executor:

            • fixed_scale: Fixed scaling policy.

              • size: Number of executor hosts.
            • auto_scale: Automatic scaling policy.

              • min_size: Minimum number of executor hosts.
              • max_size: Maximum number of executor hosts.

              Specify either fixed_scale or auto_scale.

      • history_server: History server parameters.

        • enabled: Flag to enable the history server. It allows you to monitor Spark applications with the Apache Spark™ History Server.
      • dependencies: Lists of packages enabling you to install additional libraries and applications on the cluster.

        • pip_packages: List of pip packages.
        • deb_packages: List of deb packages.

        The package name format and version are defined by the install command: pip install for pip packages and apt install for deb packages.

      • metastore: Metastore parameters.

        • cluster_id: Apache Hive™ Metastore cluster ID.
      • spark_version: Apache Spark™ version.

        Note

        After creating a cluster, you can change your Apache Spark™ version. You can only upgrade the version.

    • network: Network settings:

      • subnet_ids: List of subnet IDs.
      • security_group_ids: List of security group IDs.
    • deletion_protection: Enables cluster protection against accidental deletion. The possible values are true or false.

      Even with deletion protection on, one can still connect to the cluster manually and delete the data.

    • service_account_id: Service account ID.

    • logging: Logging parameters:

      • enabled: Enables logging. Logs generated by Spark applications will go to Yandex Cloud Logging. The possible values are true or false.
      • folder_id: Folder ID. Logs will be written to the default log group for this folder.
      • log_group_id: Custom log group ID. Logs will be written to this group.

      Specify either folder_id or log_group_id.

  4. Call the ClusterService/Create method, e.g., via the following grpcurl request:

    grpcurl \
        -format json \
        -import-path ~/cloudapi/ \
        -import-path ~/cloudapi/third_party/googleapis/ \
        -proto ~/cloudapi/yandex/cloud/spark/v1/cluster_service.proto \
        -rpc-header "Authorization: Bearer $IAM_TOKEN" \
        -d @ \
        spark.api.cloud.yandex.net:443 \
        yandex.cloud.spark.v1.ClusterService.Create \
        < body.json
    
  5. Check the server response to make sure your request was successful.
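
    The Create call returns an Operation object. One way to track it is with the yc CLI (a minimal sketch; take the operation ID from the server response):

    # Check the status of the long-running operation returned by ClusterService.Create
    # (the operation ID is a placeholder).
    yc operation get <operation_ID>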

Examples

CLI
Terraform

Create an Apache Spark™ cluster with the following test specifications:

  • Name: myspark.
  • Service account: ajev56jp96ji********.
  • Subnet: b0rcctk2rvtr8efcch64.
  • Security group: enp6saqnq4ie244g67sb.
  • Two drivers with computing resource class: c2-m16.
  • Four executors with computing resource class: c2-m16.
  • History server enabled.
  • Accidental deletion protection enabled.

Run this command:

yc managed-spark cluster create \
   --name myspark \
   --service-account-id ajev56jp96ji******** \
   --subnet-ids b0rcctk2rvtr8efcch64 \
   --security-group-ids enp6saqnq4ie244g67sb \
   --driver-preset-id c2-m16 \
   --driver-fixed-size 2 \
   --executor-preset-id c2-m16 \
   --executor-fixed-size 4 \
   --history-server-enabled \
   --deletion-protection
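
To verify the result, you could then query the new cluster. This is a sketch that assumes the list and get subcommands follow the standard yc CLI verb pattern:

# List Apache Spark™ clusters in the folder and show the new cluster's details
# (subcommand names are assumed to follow the usual yc pattern).
yc managed-spark cluster list
yc managed-spark cluster get myspark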

Create an Apache Spark™ cluster and its supporting network, using the following test specifications:

  • Name: myspark.
  • Service account: ajev56jp96ji********.
  • Network: msp-network.
  • Subnet: msp-subnet. The subnet availability zone is ru-central1-a; the range is 10.1.0.0/16.
  • Two drivers with computing resource class: c2-m16.
  • Four executors with computing resource class: c2-m16.
  • History server enabled.
  • Accidental deletion protection enabled.
  • Logging disabled.

The configuration file for this cluster is as follows:

resource "yandex_spark_cluster" "myspark" {
  name                = "myspark"
  service_account_id  = "ajev56jp96ji********"
  deletion_protection = true

  network = {
    subnet_ids = [yandex_vpc_subnet.msp-subnet.id]
  }

  config = {
    resource_pools = {
      driver = {
        resource_preset_id = "c2-m16"
        size               = 2
      }
      executor = {
        resource_preset_id = "c2-m16"
        size               = 4
      }
    }
    history_server = {
      enabled = true
    }
  }

  logging = {
    enabled = false
  }
}

resource "yandex_vpc_network" "msp-network" {
  name = "msp-network"
}

resource "yandex_vpc_subnet" "msp-subnet" {
  name           = "msp-subnet"
  zone           = "ru-central1-a"
  network_id     = yandex_vpc_network.msp-network.id
  v4_cidr_blocks = ["10.1.0.0/16"]
}
