DataSphere projects
A project is a user's main workspace that serves as a single entry point for all DataSphere features. A project allows you to perform computations on Yandex Cloud VMs with standard configurations and stores DataSphere user resources.
A notebook is an *.ipynb file that you work with in JupyterLab.
Project storage
DataSphere provides 10 GB of free storage for each project. You can increase the storage size, but this will incur additional charges. For the cost of expanding the main storage, see the DataSphere pricing policy.
You can upload small amounts of data (up to 100 MB) to your DataSphere project through the UI. If you want to upload larger amounts of data, use your network storage or databases. For large data volumes, you can also use datasets.
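The size rule above can be checked before you attempt an upload. The following is a minimal sketch, not part of DataSphere: the function names are hypothetical, and it assumes that "100 MB" means 100 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
UI_UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024


def fits_ui_upload(size_bytes: int, limit: int = UI_UPLOAD_LIMIT_BYTES) -> bool:
    """Return True if a file of the given size is within the UI upload limit."""
    return 0 <= size_bytes <= limit


def suggest_upload_route(path: str) -> str:
    """Suggest how to bring a local file into a project, based on its size."""
    size = Path(path).stat().st_size
    if fits_ui_upload(size):
        return "UI upload"
    # Larger files: the text above points to network storage, databases, or datasets.
    return "network storage, database, or dataset"
```

Calling `suggest_upload_route("data.csv")` on a local file returns a short hint for choosing between the UI and the options for larger data.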
Configuring a project runtime environment
Projects you create have a pre-configured development environment with pre-installed packages. DataSphere provides several Docker images of the environment with a choice of Python versions and libraries. The DS Default (Python 3.10) image is used by default, but you can select another standard image. For a list of all pre-installed packages, see List of pre-installed software. If a package you need is missing, you can install it directly from a notebook cell or build your own Docker image.
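To find out which packages are missing from the current image, you can check them from a notebook cell. This is a minimal sketch: the `required` list is a hypothetical example, and the check works for top-level module names only.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical list of modules a notebook depends on.
required = ["json", "numpy", "pandas"]
missing = [name for name in required if not is_installed(name)]
print("Missing modules:", missing)

# In a notebook cell, a missing package can then be installed with the
# standard IPython magic, for example:
#   %pip install <package-name>
```

A module name may differ from its package name on PyPI, so a missing module does not always map to a package of the same name.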
DataSphere Notebook
DataSphere Notebook allows you to run computations on a VM as if you were running them in a local JupyterLab notebook. DataSphere Notebook provides the selected configuration for long-term use and keeps the VM assigned to the project notebook until you forcibly return it to the pool of available VMs or until the idle timeout expires. By default, the VM is released if no computations are run in the project for three hours. You can change this value in the project settings.
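The idle timeout rule can be illustrated with simple date arithmetic. This is a sketch of the rule as described above, not a DataSphere API: the function names are hypothetical, and only the three-hour default comes from the text.

```python
from datetime import datetime, timedelta

# Default from the text above; the project settings may hold another value.
DEFAULT_IDLE_TIMEOUT = timedelta(hours=3)


def release_time(last_computation_end: datetime,
                 idle_timeout: timedelta = DEFAULT_IDLE_TIMEOUT) -> datetime:
    """Moment at which an idle VM would be returned to the pool."""
    return last_computation_end + idle_timeout


def is_released(now: datetime,
                last_computation_end: datetime,
                idle_timeout: timedelta = DEFAULT_IDLE_TIMEOUT) -> bool:
    """Return True if the idle timeout has expired by the moment `now`."""
    return now >= release_time(last_computation_end, idle_timeout)
```

For example, if the last computation ended at 10:00 and the default timeout applies, `release_time` returns 13:00 of the same day.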
Changes to cell code are saved automatically. You can disable notebook autosave in the JupyterLab settings by selecting Settings ⟶ Autosave Documents in the top menu. If you want to save the interpreter state or cell output, you will need to do that yourself.
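One common way to save part of the interpreter state yourself is to serialize the variables you need to a file. The sketch below uses the standard `pickle` module. The helper names and the `state` dictionary are hypothetical, and not every object can be pickled (open files and some library objects cannot).

```python
import pickle
import tempfile
from pathlib import Path


def save_state(variables: dict, path: str) -> None:
    """Write selected variables to a file so they survive a kernel restart."""
    with Path(path).open("wb") as f:
        pickle.dump(variables, f)


def load_state(path: str) -> dict:
    """Read variables previously written by save_state."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Round trip with hypothetical values; a temporary directory keeps the demo clean.
state = {"learning_rate": 0.01, "history": [0.9, 0.7, 0.5]}
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = str(Path(tmp) / "state.pkl")
    save_state(state, checkpoint)
    restored = load_state(checkpoint)
```

In a real notebook, you would write the file to a location that persists with the project rather than to a temporary directory. Only load pickle files that you created yourself, because unpickling can execute arbitrary code.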
You can link multiple VM configurations to a single project. When running computations in your notebook for the first time, select a configuration to use for them.
Billing for DataSphere Notebook starts when the first computations are run in a notebook and continues for as long as the VM is assigned to the project. For details on usage costs, see the DataSphere pricing policy.
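Because billing continues while the VM is assigned, the billed interval includes idle time up to the moment the VM is released. The following sketch only illustrates that interval. The function name is hypothetical, and no prices are assumed.

```python
from datetime import datetime, timedelta


def billed_duration(first_computation_start: datetime,
                    vm_released_at: datetime) -> timedelta:
    """Interval from the first computation until the VM leaves the project."""
    if vm_released_at < first_computation_start:
        raise ValueError("release time precedes the first computation")
    return vm_released_at - first_computation_start


# Hypothetical timeline: a 10-minute computation followed by the default
# three-hour idle timeout, after which the VM is released.
start = datetime(2024, 1, 1, 10, 0)
computation_end = start + timedelta(minutes=10)
released = computation_end + timedelta(hours=3)
duration = billed_duration(start, released)
```

Here `duration` is 3 hours 10 minutes, even though computations ran for only 10 minutes. Releasing the VM manually right after the computation would shorten the interval accordingly.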
JupyterLab console
In DataSphere Notebook, you can use the JupyterLab console with an interactive Python interpreter. The console runs on a separate VM instance with the c1.4 configuration. To open the console, go to the JupyterLab home page and select DataSphere Kernel under Console. Enter commands in the console input line and run them with the Shift + Enter keyboard shortcut.
Closing the console does not stop its VM instance. To shut down the console VM and stop paying for it, use the widget in the top-right corner of the screen or on the project home page.
JupyterLab extensions
The following JupyterLab extensions are available:
- JupyterLab-latex
- JupyterLab-widgets (ipywidgets)
- JupyterLab-code-formatter (black, isort)
- JupyterLab-execute-time
- JupyterLab-limit-output
- JupyterLab-spellchecker
- JupyterLab-templates