Getting started with Yandex Data Processing
To get started with the service, create a cluster, connect to it, and then connect to its component interfaces.
Getting started
-
Go to the management console and log in to Yandex Cloud, or sign up if you do not have an account yet.
-
If you do not have a folder yet, create one:
-
In the management console, select the appropriate cloud from the list on the left.
-
At the top right, click Create folder.
-
Enter the folder name. The naming requirements are as follows:
- The name must be from 3 to 63 characters long.
- It may contain lowercase Latin letters, numbers, and hyphens.
- The first character must be a letter and the last character cannot be a hyphen.
-
(Optional) Enter a description of the folder.
-
Select Create a default network. This creates a network with subnets in each availability zone, along with a default security group that allows all network traffic within this network.
-
Click Create.
-
Make sure your account has the following roles for creating a cluster:
- dataproc.editor: To create a cluster.
- vpc.user: To use the cluster network.
- iam.serviceAccounts.user: To link a service account to the cluster and create resources under that service account.
-
Set up a NAT gateway in the subnet that will host the cluster.
-
If you use security groups, configure them.
-
You can connect to a Yandex Data Processing cluster from both inside and outside Yandex Cloud:
-
To connect from inside Yandex Cloud, create a Linux virtual machine in the same network as the cluster.
-
To be able to connect to the cluster from the internet, request public access to subclusters when creating the cluster.
Note
The next step assumes that you connect to the cluster from a Linux-based VM.
-
Connect to the VM over SSH.
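The folder naming requirements listed in the steps above can be checked before you type a name into the console. The following is a minimal sketch: the function name `is_valid_folder_name` is hypothetical, and the regular expression only encodes the three rules stated above, not any additional checks the service may perform.

```python
import re

# Encodes the rules from the list above:
# - 3 to 63 characters long
# - lowercase Latin letters, digits, and hyphens only
# - first character is a letter, last character is not a hyphen
_FOLDER_NAME_RE = re.compile(r"[a-z][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_folder_name(name: str) -> bool:
    """Return True if `name` satisfies the folder naming rules above."""
    return _FOLDER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("data-proc-demo", "ab", "1st-folder", "folder-"):
        print(candidate, is_valid_folder_name(candidate))
```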
Create a cluster
To create a cluster:
- In the management console, open the folder where you want to create the cluster and select Yandex Data Processing.
- Click Create cluster.
- Set the cluster parameters and click Create cluster. For more information, see Creating clusters.
- Wait until the cluster is ready for use: its status will change to Alive. This may take some time.
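If you script the waiting step, a simple polling loop is usually enough. The sketch below is an illustration only: `get_status` is a hypothetical callable that you would implement yourself (for example, on top of the CLI or API), and the string `"Alive"` is taken from the console status mentioned above; the value returned by other interfaces may be spelled differently.

```python
import time
from typing import Callable


def wait_until_alive(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `get_status` until it returns "Alive" or the timeout expires.

    `get_status` is a hypothetical, user-supplied function; this helper
    does not call any Yandex Cloud API itself.
    Returns True if the cluster became Alive, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Alive":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The timeout and polling interval are arbitrary defaults; adjust them to how long cluster creation takes in your environment.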
Connect to the cluster
To connect to a cluster:
-
If you are using security groups for a cloud network, configure them to allow all required traffic between the cluster and the connecting host.
-
Copy the SSH key that you specified when creating the Yandex Data Processing cluster to the VM.
-
Connect to the cluster via SSH and make sure that Hadoop commands run. The username depends on the image version:
- For version 2.0: ubuntu
- For version 1.4: root
For more information about connecting to a Yandex Data Processing cluster, see Connecting to a cluster.
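The username rule above can be captured in a small helper that assembles the `ssh` invocation. This is a sketch under assumptions: the function names, the host name, and the key path are hypothetical placeholders, and only the two image versions named above are mapped. `hadoop version` is used as one possible command for checking that Hadoop commands run.

```python
from typing import List, Optional

# Username per image version, as listed above.
_USER_BY_IMAGE = {"2.0": "ubuntu", "1.4": "root"}


def ssh_username(image_version: str) -> str:
    """Map an image version such as "2.0" or "1.4.12" to its SSH username."""
    major_minor = ".".join(image_version.split(".")[:2])
    if major_minor not in _USER_BY_IMAGE:
        raise ValueError(f"no known username for image version {image_version!r}")
    return _USER_BY_IMAGE[major_minor]


def ssh_command(
    host: str,
    image_version: str,
    key_path: str,
    remote_command: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `ssh`; run it with subprocess.run if desired."""
    argv = ["ssh", "-i", key_path, f"{ssh_username(image_version)}@{host}"]
    if remote_command is not None:
        argv.append(remote_command)
    return argv


if __name__ == "__main__":
    # Placeholder host and key path; replace with your own values.
    print(" ".join(ssh_command("cluster-host.example", "2.0", "~/.ssh/id_key", "hadoop version")))
```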
Connect to component interfaces
To connect to the Yandex Data Processing component interfaces using the web interface:
- Enable the UI Proxy setting in the cluster.
- Get a list of interface URLs.
To connect to the Yandex Data Processing component interfaces via SSH with port forwarding:
-
Create an intermediate VM with a public IP address in the same network as the cluster and with a security group that allows incoming and outgoing traffic through the component ports.
-
Connect to the created VM via SSH, forwarding the relevant ports of the Yandex Data Processing host. The username depends on the image version:
- For version 2.0: ubuntu
- For version 1.4: root
For more information about connecting to component interfaces of a Yandex Data Processing cluster, see Connecting to component interfaces.
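The port forwarding step can be sketched as a helper that builds an OpenSSH local-forwarding command through the intermediate VM. All names and addresses here are hypothetical placeholders. The sketch assumes a plain `ssh -L` tunnel; the linked instructions may recommend a different invocation.

```python
from typing import List, Optional


def tunnel_command(
    vm_user: str,
    vm_public_ip: str,
    cluster_host: str,
    remote_port: int,
    local_port: Optional[int] = None,
) -> List[str]:
    """Build an argv list for an SSH tunnel through the intermediate VM.

    -L local:host:port forwards a local port to `cluster_host:remote_port`
    as seen from the VM; -N opens the tunnel without running a remote command.
    """
    if local_port is None:
        local_port = remote_port
    for port in (local_port, remote_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return [
        "ssh",
        "-N",
        "-L",
        f"{local_port}:{cluster_host}:{remote_port}",
        f"{vm_user}@{vm_public_ip}",
    ]


if __name__ == "__main__":
    # Placeholder values; 8088 is the default YARN ResourceManager web UI port.
    print(" ".join(tunnel_command("ubuntu", "203.0.113.10", "cluster-host.example", 8088)))
```

While the tunnel is open, the forwarded interface is reachable at `localhost` on the chosen local port.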
What's next
- Read about service concepts.
- Learn more about creating clusters and working with jobs.