Managing Yandex Data Processing subclusters
In addition to updating the settings of a particular subcluster, you can create new and delete existing subclusters.
Warning
Each cluster may have only 1 subcluster with a master host, which is why you cannot create or delete subclusters with this role. You cannot delete data storage subclusters either.
Getting a list of subclusters in a cluster
- Go to the folder page and select Yandex Data Processing.
- Click the cluster name and open the Subclusters tab.
If you do not have the Yandex Cloud command line interface (CLI) yet, install and initialize it.
The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.
To request a list of Yandex Data Processing subclusters in a cluster, run the following command:
yc dataproc subcluster list --cluster-name=<cluster_name>
You can retrieve the cluster name with a list of clusters in the folder.
Creating a subcluster
The number of hosts in Yandex Data Processing clusters is limited by quotas.
- In the management console, select the appropriate folder.
- Select Yandex Data Processing and the required cluster.
- Go to Subclusters.
- Click Create subcluster.
- Specify the subcluster parameters:
  - Hosts: Select the number of hosts.
  - Roles: Select the subcluster roles depending on the services to be deployed on the hosts:
    - COMPUTENODE: Role for processing data. In subclusters with this role, you can deploy YARN NodeManager and Spark libraries.
    - DATANODE: Role for storing data. In subclusters with this role, you can deploy YARN NodeManager, Spark libraries, HBase RegionServer, and HDFS Datanode.
  - Under Host class, select a platform and computing resources available to the host.
  - Under Size of storage, specify the type and size of storage.
  - Under Network settings:
    - Select Network ID format.
    - Specify Subnet (subnet of the network where the cluster is located).
    - (Optional) Enable Public access for online access to subcluster hosts.
      This setting cannot be edited after the subcluster is created.
      Tip
      You can delete data processing subclusters and recreate them with the relevant setting value.
  - (Optional) Enable Autoscaling.
- Click Add subcluster.
To create a subcluster:
- View a description of the CLI create subcluster command:
  yc dataproc subcluster create --help
- Specify subcluster parameters in the create command (the list of supported parameters in the example is not exhaustive):
  yc dataproc subcluster create <subcluster_name> \
    --cluster-name=<cluster_name> \
    --role=<subcluster_role> \
    --resource-preset=<host_class> \
    --disk-type=<storage_type> \
    --disk-size=<storage_size_in_GB> \
    --subnet-name=<subnet_name> \
    --hosts-count=<number_of_hosts>
  Where:
  - --cluster-name: Cluster name. You can retrieve the cluster name with a list of clusters in the folder.
  - --role: Subcluster role (datanode or computenode).
  - --resource-preset: Host class.
  - --disk-type: Storage type (network-ssd, network-hdd, or network-ssd-nonreplicated).
  - --disk-size: Storage size in GB.
  - --subnet-name: Name of the subnet.
  - --hosts-count: Subcluster host count. The minimum value is 1 and the maximum value is 32.
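The constraints listed above can be checked before the CLI is called. The following is a minimal sketch, not part of the yc CLI: the function name build_subcluster_create_cmd and the order of its positional arguments are hypothetical, and the function only prints the command instead of running it. The allowed roles, storage types, and host count range are taken from the parameter list above.

```shell
# Hypothetical helper: validates its arguments against the constraints
# listed above and prints the resulting yc command without executing it.
# Arguments: name, cluster, role, host class, disk type, disk size (GB),
# subnet name, hosts count.
build_subcluster_create_cmd() {
  local name="$1" cluster="$2" role="$3" preset="$4"
  local disk_type="$5" disk_size="$6" subnet="$7" hosts="$8"

  # Only the two roles named in this section can be created.
  case "$role" in
    datanode|computenode) ;;
    *) echo "error: role must be datanode or computenode" >&2; return 1 ;;
  esac

  # Storage types listed for --disk-type.
  case "$disk_type" in
    network-ssd|network-hdd|network-ssd-nonreplicated) ;;
    *) echo "error: unsupported disk type: $disk_type" >&2; return 1 ;;
  esac

  # Hosts count must be a whole number from 1 to 32.
  case "$hosts" in
    ''|*[!0-9]*) echo "error: hosts count must be a number" >&2; return 1 ;;
  esac
  if [ "$hosts" -lt 1 ] || [ "$hosts" -gt 32 ]; then
    echo "error: hosts count must be between 1 and 32" >&2
    return 1
  fi

  printf 'yc dataproc subcluster create %s --cluster-name=%s --role=%s --resource-preset=%s --disk-type=%s --disk-size=%s --subnet-name=%s --hosts-count=%s\n' \
    "$name" "$cluster" "$role" "$preset" "$disk_type" "$disk_size" "$subnet" "$hosts"
}
```

Called with a hosts count of 33 or an unlisted role, the function writes an error to stderr and returns a non-zero status instead of printing a command. Printing rather than executing lets you review the command before running it.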
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about how to create this file, see Creating clusters.
- In the Yandex Data Processing cluster description, add a subcluster_spec section containing the settings for the new subcluster:
  resource "yandex_dataproc_cluster" "data_cluster" {
    ...
    cluster_config {
      ...
      subcluster_spec {
        name = "<subcluster_name>"
        role = "<subcluster_role>"
        resources {
          resource_preset_id = "<host_class>"
          disk_type_id       = "<storage_type>"
          disk_size          = <storage_size_in_GB>
        }
        subnet_id   = "<subnet_ID>"
        hosts_count = <number_of_subcluster_hosts>
        ...
      }
    }
  }
  Where role is the subcluster role, COMPUTENODE or DATANODE.
- Make sure the settings are correct.
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:
    terraform validate
    Terraform will show any errors found in your configuration files.
- Confirm updating the resources.
  - Run this command to view the planned changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
For more information about resources you can create using Terraform, see the provider documentation.
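The validate, plan, and apply sequence from the steps above can be wrapped in a small shell function. This is a sketch under stated assumptions: the function name terraform_apply_checked and the TF_BIN variable are hypothetical (TF_BIN is not a Terraform setting; it only lets you substitute another executable). Only the three terraform subcommands come from the steps above.

```shell
# Hypothetical wrapper around the three commands from the steps above.
# Usage: terraform_apply_checked [directory_with_configuration_files]
# The body runs in a subshell, so the caller's working directory and
# variables are left unchanged.
terraform_apply_checked() (
  tf="${TF_BIN:-terraform}"

  # Go to the directory with the Terraform configuration files.
  cd "${1:-.}" || exit 1

  "$tf" validate || exit 1   # stop if the configuration has errors
  "$tf" plan || exit 1       # verification only; no resources are changed
  "$tf" apply                # asks for confirmation before making changes
)
```

Each step runs only if the previous one succeeded, so a configuration error reported by terraform validate stops the sequence before any plan or apply is attempted.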
Deleting a subcluster
Warning
You cannot delete data storage subclusters.
To delete a subcluster:
- In the management console, select the appropriate folder.
- Select Yandex Data Processing and the required cluster.
- Go to Subclusters.
- Click the icon for the subcluster you need and select Delete.
- (Optional) Specify the decommissioning timeout.
- In the window that opens, click Delete.
To delete a subcluster in a Yandex Data Processing cluster, run the command:
yc dataproc subcluster delete <subcluster_name_or_ID> \
--cluster-name=<cluster_name>
You can get the subcluster name or ID from the list of subclusters in the cluster, and the cluster name from the list of clusters in the folder.
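Because only data processing subclusters can be deleted, a script may want to check the role before issuing the delete command. The following is a minimal sketch: build_subcluster_delete_cmd is a hypothetical helper, the role is supplied by the caller (the function does not query the cluster), and the command is printed rather than executed.

```shell
# Hypothetical helper: refuses to build a delete command for any role
# other than computenode, since master and data storage subclusters
# cannot be deleted. The role argument is provided by the caller.
# Arguments: subcluster name or ID, cluster name, subcluster role.
build_subcluster_delete_cmd() {
  local subcluster="$1" cluster="$2" role="$3"

  case "$role" in
    computenode|COMPUTENODE) ;;
    *)
      echo "error: only computenode subclusters can be deleted (got: $role)" >&2
      return 1
      ;;
  esac

  printf 'yc dataproc subcluster delete %s --cluster-name=%s\n' \
    "$subcluster" "$cluster"
}
```

The check is a local safeguard only; the service itself still decides whether a deletion is allowed.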
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about how to create this file, see Creating clusters.
- In the Yandex Data Processing cluster description, delete the subcluster_spec section for the required subcluster.
- Make sure the settings are correct.
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:
    terraform validate
    Terraform will show any errors found in your configuration files.
- Type yes and press Enter.
  - Run this command to view the planned changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
For more information about resources you can create using Terraform, see the provider documentation.