Getting started with DataSphere
Yandex DataSphere is an end-to-end ML development environment in which you can work in familiar IDEs, use serverless computing technology, and seamlessly combine a broad range of Yandex Cloud computing resource configurations. DataSphere is part of the data platform and offers powerful features for working with Yandex Cloud services. As an IDE, DataSphere provides Jupyter® Notebook.
In this section, you will learn how to:
- Create projects.
- Run projects.
- Configure the environment.
- Upload data to projects.
- Start training.
- Share your results.
Getting started
- Go to the management console and log in to Yandex Cloud, or sign up if you do not have an account yet.
- Go to Yandex Cloud Billing and make sure you have a linked billing account with the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account yet, create one.
- Open the DataSphere home page.
- Accept the user agreement.
- Select the organization to work with DataSphere in or create a new one.
Create a project
- Open the DataSphere home page.
- In the left-hand panel, select Communities.
- Select the community to create a project in.
- On the community page, click Create project.
- In the window that opens, enter a name and, optionally, a description for the project.
- Click Create.
Run the project
To run a project, click Open project in JupyterLab.
Configure the environment
Popular packages for data analysis and machine learning are pre-installed and ready to use (see the list).
You can install missing packages using the pip package manager.
To install a package:
- Write the following command in a notebook cell:
  %pip install <package_name>
  For example, to install the seaborn package for visualizing statistics, write:
  %pip install seaborn
  You can use any of the options that the pip install command supports. See usage examples for this command.
- Run the cell. To do this, click the run button on the notebook toolbar or press Shift + Enter. The package installation result is displayed under the cell.
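To confirm from Python that a package is available after installation, you can query its installed version. The sketch below uses only the standard library's importlib.metadata module; the helper name installed_version is hypothetical and is not part of DataSphere.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Example: check the package installed in the step above.
print("seaborn:", installed_version("seaborn") or "not installed")
```

Passing the distribution name (as used with pip) rather than the import name matters here: the two differ for some packages.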
You can also configure the environment to run your code using Docker images.
Upload data to the project
You can upload small data volumes (up to 100 MB) to your DataSphere project through the JupyterLab interface. We recommend uploading larger data volumes from network storage or databases. Datasets are another convenient option for large data volumes.
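The size guideline above can be checked before you upload anything. The following is a minimal sketch that assumes 100 MB means 100 × 1024 × 1024 bytes; the function names and the returned labels are hypothetical and only illustrate the recommendation.

```python
from pathlib import Path

# Upload limit for the JupyterLab interface. Assumption: 100 MB = 100 MiB.
UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024


def route_for_size(size_bytes: int) -> str:
    """Suggest an upload route for data of the given size (hypothetical labels)."""
    if size_bytes <= UPLOAD_LIMIT_BYTES:
        return "jupyterlab"  # small enough for a File Browser upload
    return "external"  # network storage, a database, or a dataset


def route_for_path(path: str) -> str:
    """Total the size of a file or of all files under a directory, then suggest a route."""
    p = Path(path)
    if p.is_file():
        total = p.stat().st_size
    else:
        total = sum(f.stat().st_size for f in p.rglob("*") if f.is_file())
    return route_for_size(total)
```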
To upload data to your project through the JupyterLab interface:
- In the File Browser section, select a folder for the data.
- Click the upload button at the top left.
- Select the files to upload.
Learn more about project storage.
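Once a file is in the project, notebook code can open it by its path; relative paths are resolved against the kernel's current working directory. A minimal sketch, assuming the uploaded file is a UTF-8 CSV with a header row; the helper name load_csv_rows and the path data/train.csv are hypothetical.

```python
import csv
from pathlib import Path


def load_csv_rows(path: str) -> list:
    """Read a CSV file with a header row into a list of dicts, one per row."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Usage (hypothetical path to a file uploaded through the File Browser):
# rows = load_csv_rows("data/train.csv")
# print(len(rows), rows[:2])
```

All values come back as strings; convert them to numbers as needed before training.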
DataSphere allows you to upload data from different sources:
- Connecting to S3 using the boto3 library
- Connecting to Google Drive
- Connecting to a ClickHouse® database
- Connecting to a PostgreSQL database
- Connecting to Yandex Disk
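For the first option, one common pattern is to point boto3 at the S3-compatible endpoint of Yandex Object Storage, https://storage.yandexcloud.net, using a static access key. The sketch below assumes boto3 is installed (for example, with %pip install boto3) and that the key ID and secret are stored in the environment variables S3_KEY_ID and S3_SECRET; these variable names, the helper names, and the bucket name are hypothetical.

```python
import os

# S3-compatible endpoint and region of Yandex Object Storage.
ENDPOINT_URL = "https://storage.yandexcloud.net"
REGION = "ru-central1"


def s3_client_kwargs(key_id: str, secret: str) -> dict:
    """Build keyword arguments for boto3's Session.client() targeting Object Storage."""
    return {
        "service_name": "s3",
        "endpoint_url": ENDPOINT_URL,
        "region_name": REGION,
        "aws_access_key_id": key_id,
        "aws_secret_access_key": secret,
    }


def list_bucket_keys(bucket: str) -> list:
    """List object keys in a bucket (first page only, up to 1,000 keys)."""
    import boto3  # imported here so the helper above works without boto3

    session = boto3.session.Session()
    s3 = session.client(
        **s3_client_kwargs(os.environ["S3_KEY_ID"], os.environ["S3_SECRET"])
    )
    response = s3.list_objects_v2(Bucket=bucket)
    return [obj["Key"] for obj in response.get("Contents", [])]


# Usage (hypothetical bucket name):
# print(list_bucket_keys("my-bucket"))
```

Reading credentials from the environment keeps secrets out of the notebook itself, which matters if you share the notebook later.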
Start training
To start computations:
- In the File Browser section, select the notebook with the Python or bash code.
- Select one or more code cells and run them by choosing Run → Run Selected Cells or by pressing Shift + Enter.
- Wait for the operation to complete.
The execution result is displayed under the cell.
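As a stand-in for real training code, the cell below fits a straight line to toy data with plain gradient descent on the mean squared error, using only standard Python. The data, learning rate, and epoch count are illustrative assumptions; in a real project, this cell would contain your own training code and data.

```python
# Toy data generated from y = 2x + 1 with no noise (hypothetical example data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]


def train(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

Running this cell should print values close to w = 2 and b = 1 under the cell.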