Updating an Apache Spark™ cluster
After creating a cluster, you can edit its basic and advanced settings.
To change the cluster settings:
- Open the folder dashboard.
- Go to Managed Service for Apache Spark™.
- Select your cluster and click Edit in the top panel.
- Under Basic parameters:
  - Edit the cluster name and description.
  - Delete or add new labels.
  - Select a service account or create a new one with the `managed-spark.integrationProvider` role. The cluster will thus get the permissions it needs to work with other resources.
- Under Network settings, select a security group for cluster network traffic.
- Under Driver configuration and Executor configuration, specify the number of instances and the computing resource configuration. The number of instances can be either fixed or autoscaled.
- Under Advanced settings:
  - Delete or add names of pip and deb packages.

    The package name format and version are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.
  - In the Maintenance window setting, update the cluster maintenance time:
    - To enable maintenance at any time, select arbitrary (default).
    - To specify the preferred maintenance start time, select by schedule and specify the desired day of the week and UTC hour. For example, you can choose a time when the cluster is least loaded.

    Maintenance operations are carried out on both enabled and disabled clusters. They may include updating the DBMS, applying patches, and so on.
  - Select an Apache Hive™ Metastore cluster to connect as a metadata storage.
  - Enable or disable cluster deletion protection.
  - Enable or disable History Server. This option allows using the Spark History Server service to monitor applications.
  - Enable or disable Write logs. This option enables logging of Spark applications in the cluster:
    - Select the log destination:
      - Folder: Select a folder from the list.
      - Group: Select a log group from the list or create a new one.
    - Select Min. logging level from the list.
- Click Save.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also set a different folder for any specific command using the `--folder-name` or `--folder-id` parameter.
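For example, a quick sketch of both options, assuming the CLI is already initialized and the usual `cluster list` subcommand is available for the service; the folder ID below is a made-up placeholder:

```bash
# Make a (placeholder) folder the default for all subsequent yc commands
yc config set folder-id b1gexamplefolderid11

# Or point a single command at a different folder without changing the default
yc managed-spark cluster list --folder-id b1gexamplefolderid11
```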
To change the cluster settings:
- See the description of the CLI command for updating a cluster:

  ```bash
  yc managed-spark cluster update --help
  ```

- Provide a list of settings to update in the update cluster command (a filled-in sample command is shown after the parameter descriptions below):

  ```bash
  yc managed-spark cluster update <cluster_name_or_ID> \
    --new-name <cluster_name> \
    --description <cluster_description> \
    --labels <label_list> \
    --service-account-id <service_account_ID> \
    --security-group-ids <list_of_security_group_IDs> \
    --driver-preset-id <driver_resource_ID> \
    --driver-fixed-size <number_of_driver_instances> \
    --executor-preset-id <executor_resource_ID> \
    --executor-fixed-size <number_of_executor_instances> \
    --history-server-enabled <use_Spark_History_Server> \
    --metastore-cluster-id <Apache_Hive™_Metastore_cluster_ID> \
    --pip-packages <list_of_pip_packages> \
    --deb-packages <list_of_deb_packages> \
    --log-enabled \
    --log-folder-id <folder_ID> \
    --maintenance-window type=<maintenance_type>,`
                         `day=<day_of_week>,`
                         `hour=<hour> \
    --deletion-protection
  ```

  Where:
  - `--new-name`: Cluster name, unique within the cloud.
  - `--description`: Cluster description.
  - `--labels`: List of labels. Provide labels in `<key>=<value>` format.
  - `--service-account-id`: ID of the service account for access to Yandex Cloud services. Make sure to assign the `managed-spark.integrationProvider` role to this service account.
  - `--security-group-ids`: List of security group IDs.
  - Host configuration to run Apache Spark™ drivers:
    - `--driver-preset-id`: Driver host class.
    - `--driver-fixed-size`: Fixed number of driver hosts.
    - `--driver-min-size`: Minimum number of driver hosts for autoscaling.
    - `--driver-max-size`: Maximum number of driver hosts for autoscaling.

    Specify either a fixed number of hosts (`--driver-fixed-size`) or the minimum and maximum number of hosts (`--driver-min-size` and `--driver-max-size`) for autoscaling.
  - Host configuration to run Apache Spark™ executors:
    - `--executor-preset-id`: Executor host class.
    - `--executor-fixed-size`: Fixed number of executor hosts.
    - `--executor-min-size`: Minimum number of executor hosts for autoscaling.
    - `--executor-max-size`: Maximum number of executor hosts for autoscaling.

    Specify either a fixed number of hosts (`--executor-fixed-size`) or the minimum and maximum number of hosts (`--executor-min-size` and `--executor-max-size`) for autoscaling.
  - `--history-server-enabled`: Enables the Spark History Server monitoring service.
  - `--metastore-cluster-id`: Apache Hive™ Metastore cluster ID. This setting connects the Apache Hive™ Metastore metadata storage.
  - Lists of packages enabling you to install additional libraries and applications in the cluster:
    - `--pip-packages`: List of pip packages.
    - `--deb-packages`: List of deb packages.

    You can set version restrictions for the installed packages, e.g.:

    ```bash
    --pip-packages pandas==2.1.1,scikit-learn>=1.0.0,clickhouse-driver~=0.2.0
    ```

    The package name format and version are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.
  - Logging parameters:
    - `--log-enabled`: Enables logging.
    - `--log-folder-id`: Folder ID. Logs will be written to the default log group for this folder.
    - `--log-group-id`: Custom log group ID. Logs will be written to this group.

    Specify either `--log-folder-id` or `--log-group-id`.
  - `--maintenance-window`: Maintenance window settings (including for stopped clusters), where `type` is the maintenance type:
    - `anytime`: At any time (default).
    - `weekly`: On a schedule. For this value, also specify the following:
      - `day`: Day of week: `MON`, `TUE`, `WED`, `THU`, `FRI`, `SAT`, or `SUN`.
      - `hour`: Hour of day (UTC), from `1` to `24`.
  - `--deletion-protection`: Enables cluster protection against accidental deletion.

    Even with deletion protection on, one can still connect to the cluster manually and delete it.
You can get the cluster name and ID with the list of clusters in the folder.
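For illustration, here is a filled-in sketch of an update command using only the flags described above; the cluster name, description, and folder ID are placeholders, not real resources:

```bash
yc managed-spark cluster update my-spark-cluster \
  --new-name my-spark-cluster-prod \
  --description "Spark cluster for nightly ETL jobs" \
  --maintenance-window type=weekly,day=SAT,hour=2 \
  --log-enabled \
  --log-folder-id b1gexamplefolderid11 \
  --deletion-protection
```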
To change the cluster settings:
- Open the current Terraform configuration file describing your infrastructure.

  For more information about creating this file, see Creating clusters.
- To change cluster settings, change the required field values in the configuration file.

  Alert

  Do not change the cluster name using Terraform. This will delete the existing cluster and create a new one.

  Here is an example of the configuration file structure (a filled-in sketch follows the parameter descriptions below):

  ```hcl
  resource "yandex_spark_cluster" "my_spark_cluster" {
    description         = "<cluster_description>"
    name                = "my-spark-cluster"
    folder_id           = "<folder_ID>"
    service_account_id  = "<service_account_ID>"
    deletion_protection = <protect_cluster_from_deletion>

    labels = {
      <label_list>
    }

    network = {
      subnet_ids         = ["<list_of_subnet_IDs>"]
      security_group_ids = ["<list_of_security_group_IDs>"]
    }

    config = {
      resource_pools = {
        driver = {
          resource_preset_id = "<host_class>"
          size               = <fixed_number_of_instances>
        }
        executor = {
          resource_preset_id = "<host_class>"
          size               = <fixed_number_of_instances>
        }
      }
      history_server = {
        enabled = <use_Spark_History_Server>
      }
      metastore = {
        cluster_id = "<Apache_Hive™_Metastore_cluster_ID>"
      }
      dependencies = {
        deb_packages = ["<list_of_deb_packages>"]
        pip_packages = ["<list_of_pip_packages>"]
      }
    }

    maintenance_window = {
      type = "<maintenance_type>"
      day  = "<day_of_week>"
      hour = "<hour>"
    }

    logging = {
      enabled   = <enable_logging>
      folder_id = "<folder_ID>"
    }
  }
  ```

  Where:
  - `description`: Cluster description.
  - `service_account_id`: Service account ID.
  - `deletion_protection`: Cluster deletion protection, `true` or `false`.
  - `labels`: List of labels. Provide labels in `<key> = "<value>"` format.
  - `security_group_ids`: List of security group IDs.
  - `driver`: Host configuration to run Apache Spark™ drivers. In this section, specify:
    - Host class in the `resource_preset_id` parameter.
    - Number of instances. Specify a fixed number in the `size` parameter, or the minimum and maximum number for autoscaling in the `min_size` and `max_size` parameters.
  - `executor`: Host configuration to run Apache Spark™ executors. In this section, specify:
    - Host class in the `resource_preset_id` parameter.
    - Number of instances. Specify a fixed number in the `size` parameter, or the minimum and maximum number for autoscaling in the `min_size` and `max_size` parameters.
  - `maintenance_window`: Maintenance window settings (including for disabled clusters). In this section, specify:
    - Maintenance type in the `type` parameter. The possible values include:
      - `ANYTIME`: Any time.
      - `WEEKLY`: On a schedule.
    - Day of week for the `WEEKLY` maintenance type in the `day` parameter: `MON`, `TUE`, `WED`, `THU`, `FRI`, `SAT`, or `SUN`.
    - UTC hour from `1` to `24` for the `WEEKLY` maintenance type in the `hour` parameter.
  - `history_server`: Connecting the Apache Spark™ History Server. To use the service, set the `enabled` parameter to `true`.
  - `metastore`: Connecting an Apache Hive™ Metastore metadata storage. Specify an Apache Hive™ Metastore cluster ID in the `cluster_id` parameter.
  - `dependencies`: Additional deb and pip packages for running Apache Spark™ jobs. In this section, specify:
    - `deb_packages`: Names of deb packages. Their format depends on the `apt install` installation command.
    - `pip_packages`: Names of pip packages. Their format depends on the `pip install` installation command.
  - `logging`: Logging parameters. Logs generated by Apache Spark™ components will be sent to Yandex Cloud Logging. To enable logging:
    - Set `enabled = true`.
    - Specify one of two log storage locations:
      - `folder_id`: Folder ID. Logs will be written to the default log group for this folder.
      - `log_group_id`: Custom log group ID. Logs will be written to this group.
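  As an illustration, here is a hedged, filled-in sketch of the resource using only the fields described above; the folder, service account, subnet, and security group IDs are made-up placeholders, and `<host_class>` must be replaced with a host class available in your installation:

  ```hcl
  resource "yandex_spark_cluster" "my_spark_cluster" {
    description         = "Spark cluster for nightly ETL jobs"
    name                = "my-spark-cluster"        # keep the existing name: renaming recreates the cluster
    folder_id           = "b1gexamplefolderid11"    # placeholder folder ID
    service_account_id  = "ajeexampleserviceacct1"  # placeholder service account with managed-spark.integrationProvider
    deletion_protection = true

    labels = {
      env = "production"
    }

    network = {
      subnet_ids         = ["e9bexamplesubnetid111"]  # placeholder subnet ID
      security_group_ids = ["enpexamplesecgroupid1"]  # placeholder security group ID
    }

    config = {
      resource_pools = {
        driver = {
          resource_preset_id = "<host_class>"
          size               = 1       # fixed number of driver hosts
        }
        executor = {
          resource_preset_id = "<host_class>"
          min_size           = 2       # autoscaling lower bound
          max_size           = 4       # autoscaling upper bound
        }
      }
      history_server = {
        enabled = true
      }
      dependencies = {
        pip_packages = ["pandas==2.1.1"]
      }
    }

    logging = {
      enabled   = true
      folder_id = "b1gexamplefolderid11"  # logs go to this folder's default log group
    }
  }
  ```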
- Validate your configuration.
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:

    ```bash
    terraform validate
    ```

    Terraform will show any errors found in your configuration files.
- Confirm updating the resources.
  - Run this command to view the planned changes:

    ```bash
    terraform plan
    ```

    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:

      ```bash
      terraform apply
      ```

    - Confirm updating the resources.
    - Wait for the operation to complete.
For more information, see this Terraform provider article.
To change the cluster settings:
- Get an IAM token for API authentication and place it in an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```
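  If the Yandex Cloud CLI is installed and initialized, one common way to obtain the token is sketched below; any other supported method of getting an IAM token works as well:

  ```bash
  # Issue an IAM token for the current CLI profile and export it
  # for the gRPC requests below
  export IAM_TOKEN="$(yc iam create-token)"
  ```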
- Clone the cloudapi repository:

  ```bash
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  ```

  Below, we assume that the repository contents reside in the `~/cloudapi/` directory.
- Create a file named `body.json` and paste the following code into it:

  ```json
  {
    "cluster_id": "<cluster_ID>",
    "update_mask": "<list_of_settings_to_update>",
    "name": "<cluster_name>",
    "description": "<cluster_description>",
    "labels": {
      <label_list>
    },
    "config_spec": {
      "resource_pools": {
        "driver": {
          "resource_preset_id": "<driver_resource_ID>",
          "scale_policy": {
            "fixed_scale": {
              "size": "<number_of_driver_instances>"
            }
          }
        },
        "executor": {
          "resource_preset_id": "<executor_resource_ID>",
          "scale_policy": {
            "auto_scale": {
              "min_size": "<minimum_number_of_executor_instances>",
              "max_size": "<maximum_number_of_executor_instances>"
            }
          }
        }
      },
      "history_server": {
        "enabled": <use_Spark_History_Server>
      },
      "dependencies": {
        "pip_packages": [ <list_of_pip_packages> ],
        "deb_packages": [ <list_of_deb_packages> ]
      },
      "metastore": {
        "cluster_id": "<Apache_Hive™_Metastore_cluster_ID>"
      }
    },
    "network_spec": {
      "security_group_ids": [ <list_of_security_group_IDs> ]
    },
    "deletion_protection": <deletion_protection>,
    "service_account_id": "<service_account_ID>",
    "logging": {
      "enabled": <use_of_logging>,
      "log_group_id": "<log_group_ID>",
      "folder_id": "<folder_ID>"
    }
  }
  ```

  Where:
  - `cluster_id`: Cluster ID. You can get it with the list of clusters in the folder.
  - `update_mask`: List of settings you want to update, as an array of strings (`paths[]`).

    Format for listing settings:

    ```json
    "update_mask": {
      "paths": [
        "<setting_1>",
        "<setting_2>",
        ...
        "<setting_N>"
      ]
    }
    ```

    Warning

    When you update a cluster, all its parameters will reset to their defaults unless explicitly provided in the request. To avoid this, list the settings you want to change in the `update_mask` parameter. A minimal sample request body is shown after this list.
  - `name`: Cluster name.
  - `description`: Cluster description.
  - `labels`: List of labels provided in `"<key>": "<value>"` format.
  - `config_spec`: Cluster configuration:
    - `resource_pools`: Resource pool configuration:
      - `driver`: Host configuration to run Apache Spark™ drivers.
        - `resource_preset_id`: Driver host class.
        - `scale_policy`: Host group scaling policy for the driver:
          - `fixed_scale`: Fixed scaling policy.
            - `size`: Number of driver hosts.
          - `auto_scale`: Automatic scaling policy.
            - `min_size`: Minimum number of driver hosts.
            - `max_size`: Maximum number of driver hosts.

          Specify either `fixed_scale` or `auto_scale`.
      - `executor`: Host configuration to run Apache Spark™ executors.
        - `resource_preset_id`: Executor host class.
        - `scale_policy`: Host group scaling policy for the executor:
          - `fixed_scale`: Fixed scaling policy.
            - `size`: Number of executor hosts.
          - `auto_scale`: Automatic scaling policy.
            - `min_size`: Minimum number of executor hosts.
            - `max_size`: Maximum number of executor hosts.

          Specify either `fixed_scale` or `auto_scale`.
    - `history_server`: History server parameters.
      - `enabled`: Flag to enable the history server, which allows using the Spark History Server service to monitor applications.
    - `dependencies`: Lists of packages enabling you to install additional libraries and applications on the cluster.
      - `pip_packages`: List of pip packages.
      - `deb_packages`: List of deb packages.

      You can set version restrictions for the installed packages, e.g.:

      ```json
      "dependencies": {
        "pip_packages": [
          "pandas==2.1.1",
          "scikit-learn>=1.0.0",
          "clickhouse-driver~=0.2.0"
        ]
      }
      ```

      The package name format and version are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.
    - `metastore`: Parameters of the cluster's metadata storage.
      - `cluster_id`: Apache Hive™ Metastore cluster ID.
  - `network_spec`: Network settings:
    - `security_group_ids`: List of security group IDs.
  - `deletion_protection`: Enables cluster protection against accidental deletion. The possible values are `true` or `false`.

    Even with deletion protection on, one can still connect to the cluster manually and delete it.
  - `service_account_id`: ID of the service account for access to Yandex Cloud services. Make sure to assign the `managed-spark.integrationProvider` role to this service account.
  - `logging`: Logging parameters:
    - `enabled`: Enables logging, `true` or `false`. Logs generated by Apache Spark™ components will be sent to Yandex Cloud Logging.
    - `folder_id`: Folder ID. Logs will be written to the default log group for this folder.
    - `log_group_id`: Custom log group ID. Logs will be written to this group.

    Specify either `folder_id` or `log_group_id`.
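  As promised above, here is a minimal sketch of a request body that updates only the description and deletion protection; the cluster ID is a placeholder, and the `paths` values must match the fields you actually change:

  ```json
  {
    "cluster_id": "c9qexampleclusterid11",
    "update_mask": {
      "paths": [
        "description",
        "deletion_protection"
      ]
    },
    "description": "Spark cluster for nightly ETL jobs",
    "deletion_protection": true
  }
  ```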
- Call the ClusterService.Update method, e.g., via the following gRPCurl request:

  ```bash
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/spark/v1/cluster_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d @ \
    spark.api.cloud.yandex.net:443 \
    yandex.cloud.spark.v1.ClusterService.Update \
    < body.json
  ```

- Check the server response to make sure your request was successful.