Configuring a network for Yandex Data Processing
In this tutorial, you will learn how to create a Yandex Data Processing cluster and set up subnets and a NAT gateway.
Required paid resources
The infrastructure support cost includes:
- Fee for a Yandex Data Processing cluster: use of VM computing resources, Compute Cloud network disks, and Cloud Logging for log management (see Yandex Data Processing pricing).
- Fee for a NAT gateway (see Virtual Private Cloud pricing).
- Fee for an Object Storage bucket: data storage and operations (see Object Storage pricing).
Create resources
Manually:
- Create a network named data-proc-network with the Create subnets option disabled.
- In data-proc-network, create a subnet with the following settings:
  - Name: data-proc-subnet-a
  - Availability zone: ru-central1-a
  - CIDR: 192.168.1.0/24
- Create a NAT gateway and a route table named data-proc-route-table in data-proc-network. Associate the table with data-proc-subnet-a.
- In data-proc-network, create a security group named data-proc-security-group with the following rules:
  - One rule for incoming and another one for outgoing service traffic:
    - Port range: 0-65535
    - Protocol: Any
    - Source/Destination name: Security group
    - Security group: Current
  - Rule for outgoing HTTPS traffic:
    - Port range: 443
    - Protocol: TCP
    - Destination name: CIDR
    - CIDR blocks: 0.0.0.0/0
  - Rule that allows access to NTP servers for time syncing:
    - Port range: 123
    - Protocol: UDP
    - Destination name: CIDR
    - CIDR blocks: 0.0.0.0/0
  Note
  You can configure additional security group rules to connect to cluster hosts.
- Create a service account named data-proc-sa with the following roles:
- Create a Yandex Object Storage bucket with restricted access.
- Create a Yandex Data Processing cluster in any suitable configuration with the following settings:
  - Service account: data-proc-sa
  - Bucket ID format: List
  - Bucket name: Select the bucket you created earlier.
  - Network: data-proc-network
  - Security groups: data-proc-security-group
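As a rough sanity check of the network settings listed above, the following Python sketch parses the subnet CIDR and models the two CIDR-based outgoing rules (HTTPS and NTP). The EgressRule class and the egress_allowed function are hypothetical names introduced only for illustration. They are not part of any Yandex Cloud API, and the self-referencing service traffic rules are not modeled here.

```python
import ipaddress
from dataclasses import dataclass

# Values copied from the tutorial steps.
SUBNET_CIDR = "192.168.1.0/24"
ANYWHERE = ipaddress.ip_network("0.0.0.0/0")


@dataclass(frozen=True)
class EgressRule:
    """Hypothetical in-memory model of one outgoing security group rule."""
    protocol: str                       # "TCP", "UDP", or "ANY"
    port_from: int
    port_to: int
    destination: ipaddress.IPv4Network


# The CIDR-based outgoing rules from the tutorial: HTTPS and NTP to any address.
EGRESS_RULES = [
    EgressRule("TCP", 443, 443, ANYWHERE),
    EgressRule("UDP", 123, 123, ANYWHERE),
]


def egress_allowed(protocol: str, port: int, dest_ip: str) -> bool:
    """Return True if any rule matches the protocol, port, and destination."""
    address = ipaddress.ip_address(dest_ip)
    return any(
        rule.protocol in ("ANY", protocol)
        and rule.port_from <= port <= rule.port_to
        and address in rule.destination
        for rule in EGRESS_RULES
    )


subnet = ipaddress.ip_network(SUBNET_CIDR)
print(subnet.num_addresses)                         # 256
print(subnet.is_private)                            # True
print(egress_allowed("TCP", 443, "203.0.113.10"))   # True
print(egress_allowed("TCP", 80, "203.0.113.10"))    # False
```

Under this simplified model, plain HTTP on port 80 is rejected because only ports 443 (TCP) and 123 (UDP) are opened to 0.0.0.0/0. Any extra rules you add for connecting to cluster hosts would extend the EGRESS_RULES list in the same way.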
Using Terraform:
- If you do not have Terraform yet, install it and configure the Yandex Cloud provider.
- Get the authentication credentials and specify the Yandex Cloud provider installation source (see Configure your provider, Step 1).
- Download the cluster configuration file to the same working directory.
  This file describes:
  - Network.
  - Subnet.
  - NAT gateway and route table.
  - Security group.
  - Service account to work with cluster resources.
  - Service account for bucket management.
  - Static access key required to grant the service account permissions for the bucket.
  - Bucket to store job dependencies and results.
  - Yandex Data Processing cluster.
  Note
  You can configure additional security group rules to connect to cluster hosts.
- In the configuration file, specify all the relevant parameters.
- Run the terraform init command in the working directory with the configuration files. This command initializes the provider specified in the configuration files and enables you to use its resources and data sources.
- Make sure the Terraform configuration files are correct using this command:
  terraform validate
  Terraform will show any errors found in your configuration files.
- Create the required infrastructure:
  - Run this command to view the intended changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
All the resources you need will be created in the specified folder. You can check the new resources and their settings using the management console.
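The Terraform command sequence above (init, validate, plan, apply) can be sketched as a small Python helper that stops at the first failing step. This is only an illustration under stated assumptions: run_steps and TERRAFORM_STEPS are hypothetical names, the helper assumes the terraform binary is on your PATH, and terraform apply will still ask for confirmation in the terminal.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# The commands from the tutorial, in the order they are run there.
TERRAFORM_STEPS: List[List[str]] = [
    ["terraform", "init"],
    ["terraform", "validate"],
    ["terraform", "plan"],
    ["terraform", "apply"],
]


def run_steps(
    steps: Sequence[Sequence[str]],
    workdir: str = ".",
    runner: Callable = subprocess.run,
) -> Optional[List[str]]:
    """Run each command in workdir; return the first command that fails, else None.

    The runner parameter is a hypothetical hook that defaults to subprocess.run
    and lets you substitute a stub when trying the helper without Terraform.
    """
    for command in steps:
        completed = runner(list(command), cwd=workdir)
        if completed.returncode != 0:
            # Stop here so that later steps never run after a failure.
            return list(command)
    return None
```

Calling run_steps(TERRAFORM_STEPS) from the working directory with the configuration files runs the four commands in order. A non-empty return value names the step that failed, so apply is never reached if validate or plan reports a problem.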
Delete the resources you created
Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:
Manually:
- Delete the Yandex Data Processing cluster.
- If you reserved public static IP addresses, release and delete them.
- Delete the subnet.
- Delete the route table.
- Delete the NAT gateway.
- Delete the network.
Using Terraform:
- In the terminal window, go to the directory containing the infrastructure plan.
  Warning
  Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
- Delete resources:
  - Run this command:
    terraform destroy
  - Confirm deleting the resources and wait for the operation to complete.
  All the resources described in the Terraform manifests will be deleted.
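Before running terraform destroy, it can help to see which manifests the current directory actually contains, since every resource they describe will be deleted. The following Python sketch lists them. The list_manifests function is a hypothetical helper, and it only looks at files directly inside the directory with the .tf or .tf.json extension.

```python
from pathlib import Path
from typing import List


def list_manifests(directory: str = ".") -> List[str]:
    """Return the sorted names of Terraform manifests directly inside directory."""
    root = Path(directory)
    names = set()
    # Terraform reads both native (*.tf) and JSON (*.tf.json) configuration files.
    for pattern in ("*.tf", "*.tf.json"):
        for path in root.glob(pattern):
            if path.is_file():
                names.add(path.name)
    return sorted(names)


# Print the manifests in the current directory for a manual review.
for name in list_manifests("."):
    print(name)
```

If the output includes a file that describes resources you want to keep, move it out of the directory before you run the destroy command.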