Managing Yandex Data Processing subclusters
In addition to updating the settings of a particular subcluster, you can create new and delete existing subclusters.
Warning
Each cluster may have only one subcluster with a master host, so you cannot create or delete subclusters with this role. You cannot delete data storage subclusters either.
Getting a list of subclusters in a cluster
- Go to the folder page and select Yandex Data Processing.
- Click the cluster name and open the Subclusters tab.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To request a list of Yandex Data Processing subclusters in a cluster, run the following command:
yc dataproc subcluster list --cluster-name=<cluster_name>
You can get the cluster name from the list of clusters in the folder.
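The list command and the optional folder override can be combined in a small wrapper. The sketch below is illustrative only: the function name list_subclusters is hypothetical, and it assumes the yc CLI is already installed and initialized.

```shell
# Hypothetical wrapper around the list command shown above.
# Usage: list_subclusters <cluster_name> [folder_id]
list_subclusters() {
  cluster_name="$1"
  folder_id="${2:-}"

  if [ -z "$cluster_name" ]; then
    echo "usage: list_subclusters <cluster_name> [folder_id]" >&2
    return 1
  fi

  if [ -n "$folder_id" ]; then
    # Override the folder taken from the CLI profile.
    yc dataproc subcluster list \
      --cluster-name="$cluster_name" \
      --folder-id="$folder_id"
  else
    # Use the default folder from the CLI profile.
    yc dataproc subcluster list --cluster-name="$cluster_name"
  fi
}
```

Calling list_subclusters <cluster_name> uses the folder from the CLI profile. Adding a folder ID as the second argument queries that folder instead.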
Creating a subcluster
The number of hosts in Yandex Data Processing clusters is limited by quotas.
- In the management console, select the appropriate folder.
- Select Yandex Data Processing and the required cluster.
- Go to Subclusters.
- Click Create subcluster.
- Specify the subcluster parameters:
  - Hosts: Select the number of hosts.
  - Roles: Select the subcluster roles depending on the services to be deployed on the hosts:
    - COMPUTENODE: Role for processing data. In subclusters with this role, you can deploy YARN NodeManager and Spark libraries.
    - DATANODE: Role for storing data. In subclusters with this role, you can deploy YARN NodeManager, Spark libraries, HBase RegionServer, and HDFS Datanode.
  - Under Host class, select a platform and computing resources available to the host.
  - Under Size of storage, specify the type and size of storage.
  - Under Network settings:
    - Select Network ID format.
    - Specify Subnet (subnet of the network where the cluster is located).
    - (Optional) Enable Public access for online access to subcluster hosts.
      This setting cannot be edited after the subcluster is created.
      Tip
      You can delete data processing subclusters and recreate them with the relevant setting value.
  - (Optional) Enable Autoscaling.
- Click Add subcluster.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To create a subcluster:
- View a description of the CLI create subcluster command:
  yc dataproc subcluster create --help
- Specify subcluster parameters in the create command (the list of supported parameters in the example is not exhaustive):
  yc dataproc subcluster create <subcluster_name> \
    --cluster-name=<cluster_name> \
    --role=<subcluster_role> \
    --resource-preset=<host_class> \
    --disk-type=<storage_type> \
    --disk-size=<storage_size_in_GB> \
    --subnet-name=<subnet_name> \
    --hosts-count=<number_of_hosts>
  Where:
  - --cluster-name: Cluster name. You can get the cluster name from the list of clusters in the folder.
  - --role: Subcluster role (datanode or computenode).
  - --resource-preset: Host class.
  - --disk-type: Storage type (network-ssd, network-hdd, or network-ssd-nonreplicated).
  - --disk-size: Storage size in GB.
  - --subnet-name: Name of the subnet.
  - --hosts-count: Subcluster host count. The minimum value is 1 and the maximum value is 32.
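Because the role, storage type, and host count accept only the values listed above, it may be convenient to check them before calling the CLI. The following is a minimal sketch, not part of the yc CLI: create_subcluster is a hypothetical helper that takes the eight values from the example command as positional arguments and rejects values outside the documented ranges.

```shell
# Hypothetical helper: validates the documented value ranges, then runs
# the create command from the example above.
# Usage: create_subcluster <subcluster_name> <cluster_name> <role> \
#          <host_class> <storage_type> <storage_size_in_GB> \
#          <subnet_name> <number_of_hosts>
create_subcluster() {
  if [ "$#" -ne 8 ]; then
    echo "create_subcluster: expected 8 arguments, got $#" >&2
    return 1
  fi
  name="$1"; cluster="$2"; role="$3"; preset="$4"
  disk_type="$5"; disk_size="$6"; subnet="$7"; hosts="$8"

  # Only data storage and data processing roles can be created.
  case "$role" in
    datanode|computenode) ;;
    *) echo "role must be datanode or computenode" >&2; return 1 ;;
  esac

  # Storage types listed in the parameter description.
  case "$disk_type" in
    network-ssd|network-hdd|network-ssd-nonreplicated) ;;
    *) echo "unsupported storage type: $disk_type" >&2; return 1 ;;
  esac

  # The host count must be an integer from 1 to 32.
  case "$hosts" in
    ''|*[!0-9]*) echo "host count must be a number" >&2; return 1 ;;
  esac
  if [ "$hosts" -lt 1 ] || [ "$hosts" -gt 32 ]; then
    echo "host count must be between 1 and 32" >&2
    return 1
  fi

  yc dataproc subcluster create "$name" \
    --cluster-name="$cluster" \
    --role="$role" \
    --resource-preset="$preset" \
    --disk-type="$disk_type" \
    --disk-size="$disk_size" \
    --subnet-name="$subnet" \
    --hosts-count="$hosts"
}
```

The checks only mirror the limits stated in this section. Quotas and other server-side restrictions are still enforced by the service itself.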
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about how to create this file, see Creating clusters.
- In the Yandex Data Processing cluster description, add a subcluster_spec section containing the settings for the new subcluster:
  resource "yandex_dataproc_cluster" "data_cluster" {
    ...
    cluster_config {
      ...
      subcluster_spec {
        name = "<subcluster_name>"
        role = "<subcluster_role>"
        resources {
          resource_preset_id = "<host_class>"
          disk_type_id       = "<storage_type>"
          disk_size          = <storage_size_in_GB>
        }
        subnet_id   = "<subnet_ID>"
        hosts_count = <number_of_subcluster_hosts>
        ...
      }
    }
  }
  Where role is the subcluster role, COMPUTENODE or DATANODE.
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information about resources you can create using Terraform, see the provider documentation.
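The validate, plan, and apply steps can be chained so that a failed step stops the sequence. This is a sketch under the assumption that Terraform is installed and the provider is already configured in the target folder. The function name apply_terraform_changes is hypothetical.

```shell
# Hypothetical helper: runs the Terraform steps described above in order
# and stops at the first step that fails.
# Usage: apply_terraform_changes [config_dir]
apply_terraform_changes() {
  config_dir="${1:-.}"
  (
    # Work inside a subshell so the caller's directory is not changed.
    cd "$config_dir" || exit 1

    # Check the configuration files for errors.
    terraform validate || exit 1

    # Show the planned changes. No resources are updated at this step.
    terraform plan || exit 1

    # Apply the changes. Terraform prompts for confirmation here.
    terraform apply
  )
}
```

Since terraform apply still asks for confirmation, the planned changes can be reviewed before any resources are updated.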
Deleting a subcluster
Warning
You cannot delete data storage subclusters.
To delete a subcluster:
- In the management console, select the appropriate folder.
- Select Yandex Data Processing and the required cluster.
- Go to Subclusters.
- Click the options icon for the subcluster you need and select Delete.
- (Optional) Specify the decommissioning timeout.
- In the window that opens, click Delete.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To delete a subcluster in a Yandex Data Processing cluster, run the command:
yc dataproc subcluster delete <subcluster_name_or_ID> \
--cluster-name=<cluster_name>
You can get the subcluster name or ID from the list of subclusters in the cluster, and the cluster name from the list of clusters in the folder.
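To remove several data processing subclusters from one cluster, the delete command can be run in a loop. The sketch below is an illustration, not a feature of the yc CLI: delete_subclusters is a hypothetical helper, and it stops at the first deletion that fails.

```shell
# Hypothetical helper: deletes one or more subclusters of a single cluster
# by running the delete command shown above for each of them.
# Usage: delete_subclusters <cluster_name> <subcluster_name_or_ID>...
delete_subclusters() {
  if [ "$#" -lt 2 ]; then
    echo "usage: delete_subclusters <cluster_name> <subcluster_name_or_ID>..." >&2
    return 1
  fi
  cluster_name="$1"
  shift

  for subcluster in "$@"; do
    # Stop on the first failure so later subclusters are left untouched.
    yc dataproc subcluster delete "$subcluster" \
      --cluster-name="$cluster_name" || return 1
  done
}
```

Data storage subclusters and the subcluster with the master host cannot be deleted, so the command is expected to fail for them.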
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about how to create this file, see Creating clusters.
- In the Yandex Data Processing cluster description, delete the subcluster_spec section for the required subcluster.
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources: type yes and press Enter.
    - Wait for the operation to complete.
For more information about resources you can create using Terraform, see the provider documentation.