Setting up a project to work with a cloud in Yandex Cloud
Yandex DataSphere provides everything you need for data analysis and ML model training. However, to use all Yandex Cloud features, you need to set up your DataSphere project to work with a cloud in Yandex Cloud and enable integration with other platform services.
This guide describes how to set up a workspace in DataSphere to use Yandex Cloud services effectively.
- Create a project.
- Create a cloud and a folder.
- Configure your network.
- Create a service account.
- Service integration examples.
For detailed information on how to create and set up resources, see the Step-by-step guides section in the documentation for the respective services.
Getting started
Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.
- On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your work account in an identity federation (SSO).
- Select the Yandex Identity Hub organization you are going to use in Yandex Cloud.
- Create a community.
- Link your billing account to the DataSphere community you are going to work in. Make sure the linked billing account has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account yet, create one in the DataSphere interface.
Create a project
DataSphere communities group users into a team and allow them to share resources and manage budgets. A project within a community is a user's individual workspace run on Yandex Cloud VMs. Depending on the operation mode, a project may include one or more VMs, each assigned to a separate notebook within the project.
Note
DataSphere is not designed for pair programming. In Dedicated mode, multiple users can collaborate within a single project if each user is working in a separate notebook.
Create a DataSphere project as described in this guide.
Next, you can specify parameters for integration with other Yandex Cloud services on the project edit page.
Create a cloud and a folder
Most Yandex Cloud services run inside cloud folders. To access cloud resources, use the Yandex Cloud management console.
Log in to the management console and create your first cloud and folder to host services you want to use from DataSphere.
Learn more about user interaction with resources in Yandex Cloud.
Tip
You can use multiple folders to set up granular access and distinguish between runtime environments and tasks.
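If you prefer the Yandex Cloud CLI to the management console, the folder part of this step can be scripted. The following is a minimal sketch rather than an official procedure: it assumes the cloud already exists, the yc CLI is installed and initialized with yc init, and the folder names are hypothetical. The run helper is also part of this sketch only; with DRY_RUN=1 it prints each command instead of executing it.

```bash
#!/usr/bin/env bash
# Sketch: create one folder per runtime environment in an existing cloud.
# Assumes an installed and initialized yc CLI; folder names are hypothetical.
set -euo pipefail

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: create_folders <cloud_ID> <folder_name>...
create_folders() {
  local cloud_id="$1"
  shift
  local name
  for name in "$@"; do
    run yc resource-manager folder create --name "$name" --cloud-id "$cloud_id"
  done
}
```

For example, create_folders <cloud_ID> ds-dev ds-prod would create one folder for experiments and one for production workloads, in line with the tip above.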
Configure your network
To enable Yandex Cloud service resources to exchange information, create a cloud network and subnet. By default, a network is isolated within Yandex Cloud and has no access to the internet. To enable your cloud resources to access the internet without using public IP addresses, create and set up a NAT gateway.
Note
By default, DataSphere projects use a service subnet with internet access. If you specify your own subnet in the project settings and it has no NAT gateway configured, you will not be able to update installed packages or perform other network operations.
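The network setup described above can be sketched with the yc CLI as four steps: a network with a subnet, a NAT gateway, a route table that sends outbound traffic to the gateway, and attaching that table to the subnet. This is a hedged example assuming an initialized CLI with a default folder; the names, the ru-central1-a zone, and the CIDR range are illustrative, and the DRY_RUN helper belongs to the sketch, not to the CLI.

```bash
#!/usr/bin/env bash
# Sketch: a network and subnet whose outbound traffic goes through a NAT gateway.
# Assumes an initialized yc CLI with a default folder. All names, the zone,
# and the CIDR range are hypothetical.
set -euo pipefail

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: a network with one subnet.
# Usage: create_network <network> <subnet> <zone> <cidr>
create_network() {
  local net="$1" subnet="$2" zone="$3" cidr="$4"
  run yc vpc network create --name "$net"
  run yc vpc subnet create --name "$subnet" --zone "$zone" \
    --range "$cidr" --network-name "$net"
}

# Step 2: a NAT gateway. Note the ID it prints; step 3 needs it.
# Usage: create_nat_gateway <gateway_name>
create_nat_gateway() {
  run yc vpc gateway create --name "$1"
}

# Step 3: a route table sending all outbound traffic (0.0.0.0/0) to the gateway.
# Usage: create_nat_route_table <route_table> <network> <gateway_ID>
create_nat_route_table() {
  local rt="$1" net="$2" gateway_id="$3"
  run yc vpc route-table create --name "$rt" --network-name "$net" \
    --route "destination=0.0.0.0/0,gateway-id=$gateway_id"
}

# Step 4: attach the route table to the subnet.
# Usage: attach_route_table <subnet> <route_table_ID>
attach_route_table() {
  run yc vpc subnet update "$1" --route-table-id "$2"
}
```

Run the functions in order, for example create_network ds-net ds-subnet ru-central1-a 10.1.0.0/24 followed by create_nat_gateway ds-nat, and pass the IDs printed by the gateway and route table commands to the next step. The IDs are taken as arguments rather than parsed from CLI output to keep the sketch free of extra dependencies.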
Create a service account
Yandex Cloud has a special type of account for automating operations: a service account. Software can use a service account to manage service resources. A service account can only perform operations on resources if it has the appropriate roles. Learn more about the current service roles in the Access management section of the documentation.
In DataSphere, there are two ways to let a service account perform operations:
- If a service account needs to perform operations on resources of other services on behalf of DataSphere, add it to project settings.
- If a service account needs to perform operations on a project or community in DataSphere (run cells, create resources, etc.), add it to the list of project members or community members with the respective role.
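For the first method, the service account itself and its roles can be created from the CLI. The sketch below assumes an initialized yc CLI and binds roles at the folder level; the account name and role list are placeholders to adapt to your task. Adding the account to the project settings or to the member list is done in DataSphere and is not shown.

```bash
#!/usr/bin/env bash
# Sketch: create a service account and grant it roles on a folder.
# Assumes an initialized yc CLI; the account name and roles are placeholders.
set -euo pipefail

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: create_service_account <folder_ID> <name>
create_service_account() {
  run yc iam service-account create --name "$2" --folder-id "$1"
}

# Grant each listed role on the folder to the service account.
# Usage: grant_folder_roles <folder_ID> <service_account_ID> <role>...
grant_folder_roles() {
  local folder_id="$1" sa_id="$2"
  shift 2
  local role
  for role in "$@"; do
    run yc resource-manager folder add-access-binding "$folder_id" \
      --role "$role" --subject "serviceAccount:$sa_id"
  done
}
```

For example, grant_folder_roles <folder_ID> <service_account_ID> vpc.user grants the single role that the Spark integration below requires for the project's service account.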
Service integration examples
Below are examples of setting up a DataSphere project for various tasks and integrating it with Yandex Cloud services.
Computing on Apache Spark™ clusters
DataSphere allows you to run computations on Apache Spark™ clusters created in Yandex Data Processing.
To work with Yandex Data Processing clusters:
- In the project settings, specify these parameters:
  - Default folder for integrating with other Yandex Cloud services. The Yandex Data Processing cluster will be deployed in this folder within the current cloud quotas. The fee for using the cluster will be debited from your cloud billing account.
  - Service account with the vpc.user role. DataSphere will use this account to work with the Yandex Data Processing cluster network.
  - Subnet for DataSphere to communicate with the Yandex Data Processing cluster. Since the cluster needs internet access, make sure to configure a NAT gateway in this subnet. Specifying a subnet may increase the time it takes to allocate computing resources.
- Create a service agent:
  - To allow a service agent to operate in DataSphere, ask your cloud admin or owner to run the following command in the Yandex Cloud CLI:
    yc iam service-control enable datasphere --cloud-id <cloud_ID>
    Where --cloud-id is the ID of the cloud you are going to use in the DataSphere community.
  - Create a service account with the following roles:
    - dataproc.agent to use Yandex Data Processing clusters.
    - dataproc.admin to create clusters from Yandex Data Processing templates.
    - vpc.user to use the Yandex Data Processing cluster network.
    - iam.serviceAccounts.user to create resources in the folder on behalf of the service account.
  - Under Spark clusters in the community settings, click Add service account and select the service account you created.
Warning
The Yandex Data Processing persistent cluster must have the livy:livy.spark.deploy-mode : client setting.
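The service agent steps above can be combined into one CLI sketch. It assumes it is run by a cloud admin or owner, that the service account already exists, and that granting its roles on the default folder is sufficient; the role scope is an assumption, not something this guide specifies. Adding the account under Spark clusters in the community settings still has to be done in the DataSphere interface.

```bash
#!/usr/bin/env bash
# Sketch of the service agent setup for Spark clusters. Assumptions: run by a
# cloud admin or owner, the service account exists, and folder-level role
# bindings are enough (the binding scope is an assumption).
set -euo pipefail

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Roles listed in this guide for the service agent's service account.
SPARK_AGENT_ROLES=(dataproc.agent dataproc.admin vpc.user iam.serviceAccounts.user)

# Usage: setup_spark_service_agent <cloud_ID> <folder_ID> <service_account_ID>
setup_spark_service_agent() {
  local cloud_id="$1" folder_id="$2" sa_id="$3"
  # Allow the DataSphere service agent to operate in the cloud.
  run yc iam service-control enable datasphere --cloud-id "$cloud_id"
  local role
  for role in "${SPARK_AGENT_ROLES[@]}"; do
    run yc resource-manager folder add-access-binding "$folder_id" \
      --role "$role" --subject "serviceAccount:$sa_id"
  done
}
```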
Learn more about working with Yandex Data Processing clusters in DataSphere:
Deploying a pretrained model as a service
If you want to deploy a model as a separate service in DataSphere, use nodes based on a Docker image. In the project settings, specify the following parameters:
- Default folder to store node logs.
- Service account with the following roles:
  - container-registry.images.puller to allow DataSphere to pull your Docker image to create a node.
  - vpc.user to use the DataSphere network.
  - (Optional) datasphere.user to send requests to the node.
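The roles for Docker-image-based nodes can be granted with a sketch along the same lines. It assumes an initialized yc CLI and binds roles on the default folder, which is an assumption rather than a documented requirement; --with-requests is a hypothetical argument of this helper function, not a CLI flag, and adds the optional datasphere.user role.

```bash
#!/usr/bin/env bash
# Sketch: grant node-related roles to the project's service account.
# Folder-level bindings are an assumption; --with-requests is a hypothetical
# argument of this function that adds the optional datasphere.user role.
set -euo pipefail

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: grant_node_roles <folder_ID> <service_account_ID> [--with-requests]
grant_node_roles() {
  local folder_id="$1" sa_id="$2" with_requests="${3:-}"
  local roles=(container-registry.images.puller vpc.user)
  if [[ "$with_requests" == "--with-requests" ]]; then
    roles+=(datasphere.user)
  fi
  local role
  for role in "${roles[@]}"; do
    run yc resource-manager folder add-access-binding "$folder_id" \
      --role "$role" --subject "serviceAccount:$sa_id"
  done
}
```

Keeping datasphere.user behind an explicit argument mirrors its optional status: grant it only if the service account itself needs to send requests to the node.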
Learn more about deploying services in DataSphere: