Specifics of working with temporary Yandex Data Processing clusters
If you do not have an existing Yandex Data Processing cluster or only need one for a short time, use temporary Yandex Data Processing clusters. You can create them in one of the following ways:
- Spark connector (preferred)
- Yandex Data Processing template
To use Yandex Data Processing clusters, first configure your project.
Regardless of the deployment option, all Yandex Data Processing clusters are billed according to the Yandex Data Processing pricing policy. To view all the clusters available in your project, navigate to Project resources ⟶ Yandex Data Proc on the project page.
Spark connector
When creating a Spark connector, you can set up a temporary Yandex Data Processing cluster and configure its parameters. DataSphere creates this cluster the first time you run computations in your notebook and then manages it on its own. The cluster starts and stops together with the notebook VM.
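The sketch below is a generic PySpark example, not DataSphere-specific API: it assumes the Spark connector makes a standard Spark session available in the notebook kernel, and `SparkSession.builder.getOrCreate()` stands in for whatever entry point your connector configuration actually provides. Its only purpose is to show the kind of first computation that prompts DataSphere to deploy the temporary cluster.

```python
# Generic PySpark sketch. Assumption: the Spark connector exposes a
# standard Spark session in the notebook kernel; getOrCreate() here
# stands in for the entry point your connector configuration provides.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A tiny DataFrame action; running it is the kind of first computation
# that causes DataSphere to deploy the temporary cluster.
df = spark.createDataFrame(
    [("2024-01-01", 10), ("2024-01-02", 15)],
    ["date", "value"],
)
print(df.count())  # triggers the Spark job
```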
The notebook VM is stopped if it runs no computations for the period specified in the Stop inactive VM after parameter. You can also force the notebook VM to shut down.
To learn more about using Spark connectors, see this guide.
Yandex Data Processing templates
In a Yandex Data Processing template, you select one of the preset cluster configurations. Based on the Yandex Data Processing template activated in the project, DataSphere deploys a temporary cluster using the appropriate project parameters.
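As a hedged sketch only: once the temporary cluster is deployed, you can check standard Spark settings from the notebook to see the configuration it was started with. The snippet uses generic PySpark calls; it assumes a Spark session is available in the notebook and reads only standard Spark properties (`spark.executor.memory`, `spark.executor.cores`), nothing DataSphere-specific.

```python
# Generic PySpark sketch (assumption: a Spark session is available in the
# notebook once the temporary cluster is deployed from the template).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Standard Spark properties reflect the preset configuration the
# temporary cluster was created with.
print(spark.sparkContext.master)
print(spark.conf.get("spark.executor.memory", "not set"))
print(spark.conf.get("spark.executor.cores", "not set"))
```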
DataSphere monitors the project and its temporary clusters on its own. If a project's VMs are idle for two hours, the project VM is stopped and the temporary cluster is deleted. You can set up a VM shutdown timer in the Stop inactive VM after parameter or force the notebook VM to shut down.
You can also share Yandex Data Processing templates with other users.
To learn more about using Yandex Data Processing templates, see this guide.