Setting up a project to work with Yandex Cloud
Yandex DataSphere provides everything you need for data analysis and ML model training. However, to use all Yandex Cloud features, you need to set up your DataSphere project to work with a cloud in Yandex Cloud and enable integration with other platform services.
This guide describes how to organize a workspace in DataSphere so that you can use Yandex Cloud services effectively.
- Create a project
- Create a cloud and folder
- Configure a network
- Create a service account
- Service integration examples
For detailed information on how to create and set up resources, see the Step-by-step guides section in the documentation for respective services.
Getting started
Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.
- On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account in the identity federation (SSO).
- Select the Yandex Cloud Organization organization you are going to use in Yandex Cloud.
- Create a community.
- Link your billing account to the DataSphere community you are going to work in. Make sure that you have a billing account linked and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.
Create a project
DataSphere communities group users into a team and allow them to share resources and manage budgets. A project within a community is a user's individual workspace run on Yandex Cloud VMs. Depending on the operation mode, a project may include one or more VMs with each VM assigned to a separate notebook within the project.
Note
DataSphere is not designed for pair programming. In Dedicated mode, multiple users can collaborate within a single project if each user is working in a separate notebook.
Create a DataSphere project as described in this guide.
Next, you can specify parameters for integration with other Yandex Cloud services on the project editing page.
Create a cloud and folder
Most Yandex Cloud services run inside cloud folders. To access cloud resources, use the Yandex Cloud management console.
Log in to the management console and create your first cloud and folder to host the services you want to use from DataSphere.
To learn more about how users work with resources in Yandex Cloud, see the Yandex Cloud documentation.
Tip
You can use multiple folders to flexibly set up access rights and distinguish between runtime environments and tasks.
Configure a network
To enable Yandex Cloud service resources to exchange information, create a cloud network and subnet. By default, a network is isolated within Yandex Cloud and has no access to the internet. To make sure the cloud resources have access to the internet without using public IP addresses, create and set up a NAT gateway.
Note
By default, DataSphere projects use a service subnet with access to the internet. If, in the project settings, you specify your own subnet that has no NAT gateway configured, you will not be able to update installed packages or perform other network operations.
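The rule in the note above can be written down as a small check. This is only an illustrative sketch: the ProjectNetwork class and its fields are hypothetical names invented for this example, not part of any DataSphere or Yandex Cloud API, and the values would have to be filled in by hand from your project and subnet settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectNetwork:
    """Hypothetical description of a project's network settings."""
    # None means the project uses the default DataSphere service subnet.
    subnet_id: Optional[str] = None
    # Whether a NAT gateway is configured for the custom subnet.
    has_nat_gateway: bool = False


def has_internet_access(net: ProjectNetwork) -> bool:
    """Apply the rule from the note: the default service subnet has
    internet access; a custom subnet needs a NAT gateway."""
    if net.subnet_id is None:
        return True
    return net.has_nat_gateway


if __name__ == "__main__":
    # "my-subnet" is a placeholder, not a real subnet ID.
    custom = ProjectNetwork(subnet_id="my-subnet", has_nat_gateway=False)
    if not has_internet_access(custom):
        print("Package updates will fail: configure a NAT gateway first.")
```

A check like this could be run before switching a project to a custom subnet, so that a missing NAT gateway is caught before package installation starts failing.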
Create a service account
Yandex Cloud has a special type of account to automate operations: a service account. Using a service account, software can manage service resources. To enable a service account to perform operations on resources, it must be assigned the appropriate roles. To learn more about the current service roles, see the Access management section of the service documentation.
DataSphere supports two ways to enable a service account to perform operations:
- If a service account needs to perform operations on resources of other services on behalf of DataSphere, add it to project settings.
- If a service account needs to perform operations on a project or community in DataSphere (run cells, create resources, etc.), add it to the list of project members or community members with the respective role.
Service integration examples
In this section, you will find examples of how to set up a project to perform a variety of tasks in DataSphere and how to set up integration with Yandex Cloud services.
Computing on Apache Spark™ clusters
DataSphere allows you to run computations on Apache Spark™ clusters created in Yandex Data Processing.
To use Yandex Data Processing clusters, set the following project parameters:
- Default folder to enable integration with other Yandex Cloud services. A Yandex Data Processing cluster will be deployed in this folder based on the current cloud quotas. A fee for using the cluster will be debited from your cloud billing account.
- Service account to be used by DataSphere for creating and managing clusters. The service account needs the following roles:
  - dataproc.agent to use Yandex Data Processing clusters.
  - dataproc.admin to create clusters from Yandex Data Processing templates.
  - vpc.user to use the Yandex Data Processing cluster network.
  - iam.serviceAccounts.user to create resources in the folder on behalf of the service account.
- Subnet for DataSphere to communicate with the Yandex Data Processing cluster. Since the Yandex Data Processing cluster needs to access the internet, make sure to configure a NAT gateway in the subnet.
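Before attaching a service account to the project, it can help to compare its roles against the list above. The sketch below does this with plain Python sets. The role names come from this guide; the missing_roles helper and the sample role list are hypothetical, and the code does not call any Yandex Cloud API, so the assigned roles are assumed to be copied in by hand.

```python
# Roles this guide lists for the Yandex Data Processing integration,
# mapped to the reason each one is needed.
REQUIRED_SPARK_ROLES = {
    "dataproc.agent": "use Yandex Data Processing clusters",
    "dataproc.admin": "create clusters from Yandex Data Processing templates",
    "vpc.user": "use the Yandex Data Processing cluster network",
    "iam.serviceAccounts.user": (
        "create resources in the folder on behalf of the service account"
    ),
}


def missing_roles(assigned_roles):
    """Return the required roles absent from assigned_roles, sorted by name."""
    return sorted(set(REQUIRED_SPARK_ROLES) - set(assigned_roles))


if __name__ == "__main__":
    # Hypothetical role list, for example copied from the management console.
    assigned = ["dataproc.agent", "vpc.user"]
    for role in missing_roles(assigned):
        print(f"Missing {role}: needed to {REQUIRED_SPARK_ROLES[role]}")
```

An empty result means the service account has every role listed in this section; it does not verify that the roles are granted on the correct folder.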
Note
If you specify a subnet in the project settings, allocating computing resources may take longer.
Warning
The Yandex Data Processing persistent cluster must have the livy:livy.spark.deploy-mode : client setting.
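As a rough illustration of the warning above, the following sketch checks a dictionary of cluster properties for the required Livy setting. The property key and value are taken from this guide; the has_client_deploy_mode function and the sample dictionaries are assumptions made for the example, and how you obtain the properties of a real cluster is outside its scope.

```python
LIVY_DEPLOY_MODE_KEY = "livy:livy.spark.deploy-mode"
LIVY_DEPLOY_MODE_VALUE = "client"


def has_client_deploy_mode(properties):
    """Return True if the cluster properties set the Livy deploy mode to client."""
    value = properties.get(LIVY_DEPLOY_MODE_KEY, "")
    return value.strip() == LIVY_DEPLOY_MODE_VALUE


if __name__ == "__main__":
    # Hypothetical properties, entered by hand for illustration.
    cluster_properties = {"livy:livy.spark.deploy-mode": "cluster"}
    if not has_client_deploy_mode(cluster_properties):
        print("Set livy:livy.spark.deploy-mode to client before connecting.")
```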
Learn more about working with Yandex Data Processing clusters in DataSphere:
Deploying a pretrained model as a service
If you want to deploy a model as a separate service in DataSphere, use nodes based on a Docker image. In the project settings, specify the following parameters:
- Default folder to store node logs.
- Service account with the following permissions:
  - container-registry.images.puller to allow DataSphere to pull your Docker image for creating a node.
  - vpc.user to use the DataSphere network.
  - (Optional) datasphere.user to send requests to the node.
Learn more about deploying services in DataSphere: