Getting started with Yandex Data Processing
Getting started
- Navigate to the management console and either log in to Yandex Cloud or sign up if you do not have an account yet.
- If you do not have a folder yet, create one:
  - In the management console, select the appropriate cloud from the list on the left.
  - At the top right, click Create folder.
  - Give your folder a name. The naming requirements are as follows:
    - It must be from 2 to 63 characters long.
    - It can only contain lowercase Latin letters, numbers, and hyphens.
    - It must start with a letter and cannot end with a hyphen.
  - Optionally, specify the description for your folder.
  - Select Create a default network. This creates a network with subnets in each availability zone, along with a default security group that allows all network traffic.
  - Click Create.
- Assign the following roles to your Yandex Cloud account:
  - dataproc.editor: Required for cluster creation.
  - vpc.user: Required to access the cluster network.
  - iam.serviceAccounts.user: Required to attach a service account to the cluster and create resources using its permissions.

  Note
  If you are unable to manage roles, contact your cloud or organization administrator.
- Set up a NAT gateway in the subnet where your cluster will be deployed.
- If you use security groups, configure them.
- You can access a Yandex Data Processing cluster both from within the Yandex Cloud infrastructure and from external networks:
  - To connect from within Yandex Cloud, create a Linux VM in the cluster’s network.
  - To connect to the cluster from the internet, enable public access for subclusters during cluster creation.

  Note
  The next step requires connecting to the cluster from a Linux-based VM.
- Connect to your VM over SSH.
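The folder naming requirements listed in this section can be checked before you submit the form. The snippet below is a minimal Python sketch of those three rules only; the function name is_valid_folder_name is hypothetical and is not part of any Yandex Cloud tool or API.

```python
import re

# Encodes the naming rules from the folder creation step:
#   - 2 to 63 characters long
#   - lowercase Latin letters, numbers, and hyphens only
#   - starts with a letter and does not end with a hyphen
# First character: a letter. Middle: 0 to 61 allowed characters.
# Last character: a letter or a number (so never a hyphen).
_FOLDER_NAME = re.compile(r"[a-z][a-z0-9-]{0,61}[a-z0-9]")


def is_valid_folder_name(name: str) -> bool:
    """Return True if `name` satisfies the documented folder naming rules."""
    # Hypothetical helper for illustration; the console performs its own validation.
    return _FOLDER_NAME.fullmatch(name) is not None
```

For example, is_valid_folder_name("data-proc-test") passes, while names such as "Data-Proc" (uppercase letter) or "test-" (trailing hyphen) are rejected.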
Create a cluster
To create a cluster:
- In the management console, navigate to the folder where you want to create your cluster, then select Yandex Data Processing.
- Click Create cluster.
- Specify your cluster settings and click Create cluster. For more information, see Creating clusters.
- When the cluster is ready for operation, its status will change to Alive. This may take some time.
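If you script cluster creation instead of watching the console, waiting for the Alive status is a simple polling loop. The following is a sketch in Python under stated assumptions: get_status is a hypothetical callable that you supply (for example, a wrapper around whatever API or CLI call you use to read the cluster status), and the timeout and polling interval are illustrative values, not service recommendations.

```python
import time
from typing import Callable


def wait_until_alive(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `get_status` until it returns "Alive" or the timeout expires.

    `get_status` is a hypothetical, caller-supplied function; this sketch
    does not call any Yandex Cloud API itself.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # "Alive" is the ready status named in the step above.
        if get_status() == "Alive":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)
```

The function returns True once the status is Alive and False if the timeout is reached first, so the caller can decide whether to retry or report an error.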
Connect to the cluster
To connect to your cluster:
- If you are using security groups for a cloud network, configure them to enable all required traffic between the cluster and the connecting host.
- Copy the SSH key you specified during Yandex Data Processing cluster creation to the VM.
- Connect to the cluster over SSH and check that Hadoop commands run properly. Depending on your image version, specify the username:
  - For version 2.0, use ubuntu as the username.
  - For version 1.4, use root as the username.
For a detailed description of the Yandex Data Processing cluster connection process, refer to the Connecting to a cluster section.
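The username rule above can be captured in a small helper that assembles the ssh command line. This is a minimal Python sketch, not an official tool: the function name, the host name, and the key path are hypothetical placeholders, and the version strings "2.0" and "1.4" are assumed to be passed exactly as written in this guide.

```python
from typing import List

# Username per image version, as stated in the connection step above.
USERNAME_BY_IMAGE_VERSION = {"2.0": "ubuntu", "1.4": "root"}


def ssh_command(image_version: str, host: str, key_path: str) -> List[str]:
    """Build an `ssh` argument list for connecting to a cluster host.

    Hypothetical helper: `host` and `key_path` are placeholders you supply.
    """
    if image_version not in USERNAME_BY_IMAGE_VERSION:
        raise ValueError(f"No username documented for image version {image_version!r}")
    user = USERNAME_BY_IMAGE_VERSION[image_version]
    # -i selects the private key matching the public key given at cluster creation.
    return ["ssh", "-i", key_path, f"{user}@{host}"]
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with key paths that contain spaces.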
Connect to the component interfaces
To connect to the Yandex Data Processing component interfaces using the web UI:
- Enable the UI Proxy setting in the cluster.
- Get a list of interface URLs.
To connect to the Yandex Data Processing component interfaces via SSH with port forwarding:
- Create a jumpbox VM with a public IP address in the cluster’s network, using a security group that allows incoming and outgoing traffic on all component ports.
- Connect to the new VM over SSH with port forwarding to the required Yandex Data Processing host ports. Depending on your image version, specify the username:
  - For version 2.0, use ubuntu as the username.
  - For version 1.4, use root as the username.
The detailed process for connecting to the Yandex Data Processing cluster’s component interfaces is described in Connecting to component interfaces.
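As an illustration of the port forwarding step, the Python sketch below assembles an ssh command that uses local forwarding (the standard OpenSSH -L option) through the jumpbox VM. Everything specific here is an assumption: the function name, the host names, and the port number are hypothetical placeholders, and the username is chosen by image version as described in the step above.

```python
from typing import List, Optional

# Username per image version, as stated in the port forwarding step above.
USERNAME_BY_IMAGE_VERSION = {"2.0": "ubuntu", "1.4": "root"}


def ssh_tunnel_command(
    image_version: str,
    jumpbox_host: str,
    cluster_host: str,
    remote_port: int,
    local_port: Optional[int] = None,
) -> List[str]:
    """Build an `ssh` argument list that forwards a local port to a cluster host.

    Hypothetical helper: all hosts and ports are placeholders you supply.
    """
    if image_version not in USERNAME_BY_IMAGE_VERSION:
        raise ValueError(f"No username documented for image version {image_version!r}")
    user = USERNAME_BY_IMAGE_VERSION[image_version]
    # Default to the same port number locally if none is given.
    local = remote_port if local_port is None else local_port
    # -L local:host:port forwards the local port to host:port via the jumpbox.
    # -N opens the tunnel without running a remote command.
    forward = f"{local}:{cluster_host}:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{jumpbox_host}"]
```

With the tunnel open, the component interface would be reachable in a browser at localhost on the chosen local port, provided the security group permits traffic on the forwarded port.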
What's next
- Read about service concepts.
- Learn more about creating clusters and working with jobs.