Managing Yandex Data Processing subclusters
In addition to updating the settings of a particular subcluster, you can create new and delete existing subclusters.
Warning
Each cluster may only have one subcluster with a master host, which is why you cannot create or delete subclusters with this role. You cannot delete data storage subclusters either.
Getting a list of subclusters in a cluster
- Open the folder dashboard.
- Go to Yandex Data Processing.
- Click the name of your cluster and select the Subclusters tab.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To request a list of Yandex Data Processing subclusters in a cluster, run the following command:
yc dataproc subcluster list --cluster-name=<cluster_name>
You can get the cluster name with the list of clusters in the folder.
Creating a subcluster
The number of hosts in Yandex Data Processing clusters is limited by quotas.
- In the management console, select the folder you need.
- Go to Yandex Data Processing and select the cluster.
- Go to Subclusters.
- Click Create subcluster.
- Specify the subcluster settings:
  - Hosts: Select the number of hosts.
  - Roles: Select the subcluster roles depending on the services to deploy on the hosts:
    - COMPUTENODE: Role for processing data. In subclusters with this role, you can deploy YARN NodeManager and Spark libraries.
    - DATANODE: Role for storing data. In subclusters with this role, you can deploy YARN NodeManager, Spark libraries, HBase RegionServer, and HDFS Datanode.
  - Under Host class, select a platform and computing resources available to the host.
  - Under Storage size, specify the storage type and size.
  - Under Network settings:
    - Select Network ID format.
    - Specify Subnet (subnet of the network hosting the cluster).
    - Optionally, enable Public access for online access to subcluster hosts.
      You will not be able to edit this setting after creating the subcluster.
      Tip
      You can delete data processing subclusters and recreate them with the required configuration.
  - Optionally, enable Autoscaling.
- Click Add subcluster.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To create a subcluster:
- View the description of the CLI command for creating a subcluster:
  yc dataproc subcluster create --help
- Specify the subcluster settings in the create command (the example does not show all available parameters):
  yc dataproc subcluster create <subcluster_name> \
    --cluster-name=<cluster_name> \
    --role=<subcluster_role> \
    --resource-preset=<host_class> \
    --disk-type=<storage_type> \
    --disk-size=<storage_size_in_GB> \
    --subnet-name=<subnet_name> \
    --hosts-count=<number_of_hosts>
  Where:
  - --cluster-name: Cluster name. You can get the cluster name with the list of clusters in the folder.
  - --role: Subcluster role, which can be either datanode or computenode.
  - --resource-preset: Host class.
  - --disk-type: Storage type, which can be network-ssd, network-hdd, or network-ssd-nonreplicated.
  - --disk-size: Storage size in GB.
  - --subnet-name: Subnet name.
  - --hosts-count: Subcluster host count. The minimum value is 1, and the maximum value is 32.
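The value constraints listed above (two allowed roles, three storage types, a host count from 1 to 32) can be checked before the CLI is called. The following is a minimal shell sketch under those assumptions only: the function name subcluster_create_cmd and the order of its positional arguments are hypothetical, and the function prints the command instead of running it. It uses only the flags shown in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the yc CLI): validates the values described
# above, then prints (does not run) the matching create command.
# Values are printed unquoted, so this sketch assumes they contain no spaces.
# Usage: subcluster_create_cmd <name> <cluster> <role> <host_class> \
#            <storage_type> <storage_size_gb> <subnet> <hosts>
subcluster_create_cmd() {
    if [ "$#" -ne 8 ]; then
        echo "expected 8 arguments, got $#" >&2
        return 2
    fi

    # The role must be one of the two values the create command accepts.
    case "$3" in
        datanode|computenode) ;;
        *) echo "invalid role: $3" >&2; return 1 ;;
    esac

    # The storage type must be one of the three listed types.
    case "$5" in
        network-ssd|network-hdd|network-ssd-nonreplicated) ;;
        *) echo "invalid storage type: $5" >&2; return 1 ;;
    esac

    # The host count must be a whole number from 1 to 32.
    case "$8" in
        ''|*[!0-9]*) echo "invalid host count: $8" >&2; return 1 ;;
    esac
    if [ "$8" -lt 1 ] || [ "$8" -gt 32 ]; then
        echo "host count must be between 1 and 32: $8" >&2
        return 1
    fi

    printf 'yc dataproc subcluster create %s --cluster-name=%s --role=%s --resource-preset=%s --disk-type=%s --disk-size=%s --subnet-name=%s --hosts-count=%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
}
```

For example, subcluster_create_cmd sub1 my-cluster computenode s2.micro network-ssd 128 my-subnet 4 prints a ready-to-review command, while a host count of 33 is rejected with a non-zero exit code. All argument values here are placeholders.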
- Open the current Terraform configuration file describing your infrastructure.
  To learn how to create this file, see Creating a cluster.
- In the Yandex Data Processing cluster description, add a subcluster_spec section containing the settings for the new subcluster:
  resource "yandex_dataproc_cluster" "data_cluster" {
    ...
    cluster_config {
      ...
      subcluster_spec {
        name = "<subcluster_name>"
        role = "<subcluster_role>"
        resources {
          resource_preset_id = "<host_class>"
          disk_type_id       = "<storage_type>"
          disk_size          = <storage_size_in_GB>
        }
        subnet_id   = "<subnet_ID>"
        hosts_count = <number_of_hosts_in_subcluster>
        ...
      }
    }
  }
  Where role is the subcluster role: COMPUTENODE or DATANODE.
- Make sure the settings are correct.
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:
    terraform validate
    Terraform will show any errors found in your configuration files.
- Confirm updating the resources.
  - Run this command to view the planned changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
To learn more about resources you can create with Terraform, see this provider guide.
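The validate, plan, and apply steps above can be chained so that a failed step stops the sequence. The following is a minimal shell sketch, assuming terraform is available on the PATH; the function name tf_update and its directory argument are hypothetical. terraform apply still asks for interactive confirmation, so nothing is changed without your approval.

```shell
#!/bin/sh
# Hypothetical helper: runs the three Terraform steps from the procedure above
# in order and stops at the first step that fails.
# Usage: tf_update <directory_with_terraform_configuration_files>
tf_update() {
    if [ ! -d "$1" ]; then
        echo "no such directory: $1" >&2
        return 1
    fi
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$1" || exit 1
        terraform validate || exit 1   # report errors in the configuration files
        terraform plan || exit 1       # show planned changes without applying them
        terraform apply                # prompts for confirmation before updating resources
    )
}
```

Running the steps in a subshell means a failure in any step returns a non-zero status from tf_update without leaving the shell in a different directory.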
Deleting a subcluster
Warning
You cannot delete data storage subclusters.
To delete a subcluster:
- In the management console, select the folder you need.
- Go to Yandex Data Processing and select the cluster.
- Go to Subclusters.
- Click the menu icon for the subcluster you need and select Delete.
- Optionally, specify the decommissioning timeout.
- In the window that opens, click Delete.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To delete a subcluster in a Yandex Data Processing cluster, run this command:
yc dataproc subcluster delete <subcluster_name_or_ID> \
--cluster-name=<cluster_name>
You can get the subcluster name or ID with the list of subclusters in the cluster, and the cluster name with the list of clusters in the folder.
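Because master and data storage subclusters cannot be deleted, a script can reject those roles before it reaches the CLI. The following is a minimal shell sketch under that assumption: the function name subcluster_delete_cmd is hypothetical, the caller supplies the role (for example, taken from the subcluster list), and the function prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the yc CLI): prints (does not run) the
# delete command, but only for data processing subclusters, since master and
# data storage subclusters cannot be deleted.
# Usage: subcluster_delete_cmd <subcluster_name_or_ID> <cluster_name> <subcluster_role>
subcluster_delete_cmd() {
    if [ "$#" -ne 3 ]; then
        echo "expected 3 arguments, got $#" >&2
        return 2
    fi

    # Compare the role case-insensitively: this page writes it both as
    # COMPUTENODE and as computenode.
    role=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')
    if [ "$role" != "computenode" ]; then
        echo "subclusters with the $3 role cannot be deleted" >&2
        return 1
    fi

    printf 'yc dataproc subcluster delete %s --cluster-name=%s\n' "$1" "$2"
}
```

Any role other than computenode is rejected, which keeps the check valid without hard-coding the names of the non-deletable roles.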
- Open the current Terraform configuration file describing your infrastructure.
  To learn how to create this file, see Creating a cluster.
- Delete the subcluster_spec section of the relevant subcluster from the Yandex Data Processing cluster description.
- Make sure the settings are correct.
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:
    terraform validate
    Terraform will show any errors found in your configuration files.
- Confirm updating the resources.
  - Run this command to view the planned changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources: type yes and press Enter.
    - Wait for the operation to complete.
To learn more about resources you can create with Terraform, see this provider guide.