Working with Yandex Data Processing templates
Yandex Data Processing templates enable you to preset a cluster configuration for your project, making it easier to deploy temporary clusters. You can find the list of templates on your project page under Project resources → Yandex Data Processing, on the Shared tab.
To work with Yandex Data Processing clusters:
- In the project settings, specify these parameters:
  - Default folder for integrating with other Yandex Cloud services. It will house a Yandex Data Processing cluster based on the current cloud quotas. A fee for using the cluster will be debited from your cloud billing account.
  - Service account with the vpc.user role. DataSphere will use this account to work with the Yandex Data Processing cluster network.
  - Subnet for DataSphere to communicate with the Yandex Data Processing cluster. Since the Yandex Data Processing cluster needs to access the internet, make sure to configure a NAT gateway in this subnet. After you specify a subnet, the time for computing resource allocation may increase.
- Create a service agent:
  - To allow a service agent to operate in DataSphere, ask your cloud admin or owner to run the following command in the Yandex Cloud CLI:
    yc iam service-control enable datasphere --cloud-id <cloud_ID>
    Where --cloud-id is the ID of the cloud you are going to use in the DataSphere community.
  - Create a service account with the following roles:
    - dataproc.agent to use Yandex Data Processing clusters.
    - dataproc.admin to create clusters from Yandex Data Processing templates.
    - vpc.user to use the Yandex Data Processing cluster network.
    - iam.serviceAccounts.user to create resources in the folder on behalf of the service account.
  - Under Spark clusters in the community settings, click Add service account and select the service account you created.
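The role requirements for the service account can be kept as a small checklist. The following is a minimal sketch, not part of DataSphere or the Yandex Cloud CLI: the names REQUIRED_ROLES and missing_roles are hypothetical, and the list of roles already assigned to the account is assumed to come from wherever you track access bindings.

```python
# Roles the service account needs, as listed in the steps above.
# REQUIRED_ROLES and missing_roles are illustrative names, not a Yandex Cloud API.
REQUIRED_ROLES = {
    "dataproc.agent": "use Yandex Data Processing clusters",
    "dataproc.admin": "create clusters from Yandex Data Processing templates",
    "vpc.user": "use the Yandex Data Processing cluster network",
    "iam.serviceAccounts.user": "create resources in the folder on behalf of the service account",
}


def missing_roles(assigned_roles):
    """Return the required roles absent from assigned_roles, sorted by name."""
    return sorted(set(REQUIRED_ROLES) - set(assigned_roles))


if __name__ == "__main__":
    # Example input: an account that so far has only two of the four roles.
    for role in missing_roles(["vpc.user", "dataproc.agent"]):
        print(f"Missing {role}: needed to {REQUIRED_ROLES[role]}")
```

An empty result means the account holds every role named in this section; it does not confirm that the roles were granted on the right folder.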
Warning
The Yandex Data Processing persistent cluster must have the livy:livy.spark.deploy-mode : client setting.
Creating a Yandex Data Processing template
- Select the project in your community or on the DataSphere home page in the Recent projects tab.
- Under Project resources, click Yandex Data Processing.
- Click Create template.
- In the Template name field, enter a name for the template. Follow these naming requirements:
  - It must be from 2 to 63 characters long.
  - It can only contain lowercase Latin letters, numbers, and hyphens.
  - It must start with a letter and cannot end with a hyphen.
- Click Create. You will see a page with detailed info on the template you created.
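The naming requirements for the Template name field can be checked before you open the form. The sketch below encodes them as a regular expression. The names TEMPLATE_NAME_RE and is_valid_template_name are hypothetical, and the pattern reflects only the three rules stated above, so the console may apply checks that are not covered here.

```python
import re

# Derived from the naming requirements above (illustrative, not an official validator):
# - 2 to 63 characters long
# - only lowercase Latin letters, digits, and hyphens
# - starts with a letter and does not end with a hyphen
TEMPLATE_NAME_RE = re.compile(r"[a-z][a-z0-9-]{0,61}[a-z0-9]")


def is_valid_template_name(name: str) -> bool:
    """Return True if name satisfies the three naming requirements."""
    return TEMPLATE_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["spark-template-1", "a", "1template", "template-", "Template"]:
        print(candidate, is_valid_template_name(candidate))
```

The first and last characters are matched separately so that the length bounds and the start and end rules are enforced by a single pattern, and fullmatch ensures that no extra characters slip through at either end.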
Activating a Yandex Data Processing template
- Select the project in your community or on the DataSphere home page in the Recent projects tab.
- Under Project resources, click Yandex Data Processing.
- Click the menu icon next to the template you need and select Activate.
The system will create a cluster based on the activated Yandex Data Processing template when you run your project in the IDE.
Sharing a Yandex Data Processing template
- Select the project in your community or on the DataSphere home page in the Recent projects tab.
- Under Project resources, click Yandex Data Processing.
- Select the template from the list.
- Go to the Access tab.
- Enable the visibility option next to the name of the community you want to share the template in.
To make a template available for use in a different project, the project admin needs to add that template on the Shared tab.
Editing a template
You can only change the name of an existing template. To update the configuration, recreate the template.
- Select the project in your community or on the DataSphere home page in the Recent projects tab.
- Under Project resources, click Yandex Data Processing.
- Select the template from the list, click the menu icon, and select Edit.
- Edit the name and click Save.
Deleting a Yandex Data Processing template
- Select the project in your community or on the DataSphere home page in the Recent projects tab.
- Under Project resources, click Yandex Data Processing.
- In the list, select the template you want to delete.
- Click the menu icon and select Delete.
- Click Confirm.
Warning
The actual deletion of resources can take up to 72 hours.