General questions about DataSphere
Can I get logs of my operations with services?
Yes, you can request log records about your resources from Yandex Cloud services. For more information, see Data requests.
How do I find out what personal data is stored in Yandex Cloud?
To find out what personal data is stored in Yandex Cloud, contact technical support. You can also request to have your personal data completely deleted.
What do I do if I cannot install a package in my project or do not have internet access?
You might have connection issues if the subnet you added to your project does not have internet access.
If your project needs a subnet, set up a NAT gateway to provide internet access.
You can change or disable the subnet in the project settings.
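To quickly check from a notebook whether the project's subnet actually has internet access, you can run a minimal connectivity probe. This is a sketch using only the Python standard library; the host pypi.org is just an example endpoint, not anything DataSphere-specific:

```python
import socket

def has_internet(host: str = "pypi.org", port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the host succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# False here usually means the subnet has no route to the internet
# (e.g. no NAT gateway configured).
print(has_internet())
```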
Can I close a tab with a notebook?
Yes, you can. If you close the notebook tab, current executions will continue, all variables and computation results will be saved, but the output will not be saved for the executions that finished while the notebook was closed.
After all running computations are complete, the VM remains assigned to the notebook for three hours. You can change this value in the project settings.
How do I specify the configuration type for my project?
You can select a computing resource configuration when you first run computations in the DataSphere notebook. The minimum available configuration is c1.4 (4 vCPUs).
If I delete a running cell, will computations stop?
No, they will not. Computations continue even if you delete a cell from the notebook, so make sure to stop a cell before deleting it. If you have already deleted a running cell, stop the calculations manually: select File ⟶ Stop IDE executions in JupyterLab, or click Stop JupyterLab and VM in the Running executions widget on the project page.
How do I clear a cell's outputs?
Select Edit ⟶ Clear All Outputs in JupyterLab or right-click on any cell and select Clear All Outputs. If you choose the second option, the outputs will only be reset for the current session.
Does DataSphere support scheduled cell runs?
You can run scheduled computations by rerunning DataSphere Jobs jobs integrated with Yandex Managed Service for Apache Airflow™.
You can also use Yandex Cloud Functions to automatically initiate notebook execution via the DataSphere API. For a detailed description of scheduled runs, see this guide.
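As a rough sketch of the Cloud Functions approach: a timer-triggered function builds an authenticated HTTP request to the DataSphere API asking it to execute a notebook. The exact endpoint path and payload fields below are assumptions for illustration; check the DataSphere API reference before using, and substitute real IDs and an IAM token for the placeholders:

```python
import json
import urllib.request

# Assumed endpoint shape; verify against the DataSphere API reference.
API_URL = "https://datasphere.api.cloud.yandex.net/datasphere/v1/projects/{project_id}:execute"

def build_execute_request(project_id: str, notebook_id: str, iam_token: str) -> urllib.request.Request:
    """Build an HTTP request that asks DataSphere to execute a notebook."""
    url = API_URL.format(project_id=project_id)
    body = json.dumps({"notebookId": notebook_id}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {iam_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

def handler(event, context):
    # Yandex Cloud Functions entry point, invoked by a timer trigger.
    req = build_execute_request("<project_id>", "<notebook_id>", "<iam_token>")
    with urllib.request.urlopen(req) as resp:
        return {"statusCode": resp.status}
```

The timer trigger's schedule (cron expression) then determines how often the notebook runs.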
My browser cannot open a DataSphere project in the IDE. How can I fix this?
When opening a project in the IDE, DataSphere redirects your request to its own host with JupyterLab. Modern browsers may block such website behavior if you use more advanced privacy tools, including incognito mode. To open a project in the IDE, turn off the blocking settings:
- Chrome: Allow using third-party cookies.
- Safari: Disable Website tracking: Prevent cross-site tracking under Preferences → Privacy.
- Yandex Browser: Allow using third-party cookies for DataSphere in the browser settings under Sites → Advanced site settings.
- Firefox: Click the shield icon in the address bar and disable Enhanced Tracking Protection.
My browser asks me to grant access to a JupyterLab host. How do I do that?
The message is triggered by an experimental option in Google Chrome that implements the Storage Access API. To disable it, enter chrome://flags in the browser address bar, find Storage Access API using the search bar, and set the option to Disabled.
How do I deploy a Hugging Face model in DataSphere?
Some libraries download models to predefined folders by default. Models may not be available for import if the folder they were downloaded to is not located in the project repository. To avoid this, choose a correct download directory and specify it when importing a model:
from transformers import AutoConfig, AutoModel

cache_dir = "/home/jupyter/datasphere/project/huggingface_cache_dir/"
config = AutoConfig.from_pretrained("<model_name>", cache_dir=cache_dir)
model = AutoModel.from_pretrained("<model_name>", config=config, cache_dir=cache_dir)
To avoid specifying the directory path every time, you can provide it in an environment variable. Make sure you do this at the very start of the notebook, before importing any libraries:
import os
os.environ['TRANSFORMERS_CACHE'] = '/home/jupyter/datasphere/project/huggingface_cache_dir/'
In addition, you can configure a model to operate in offline mode; see the official Hugging Face documentation for details.
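As a sketch, offline mode in the Hugging Face libraries is typically enabled through environment variables (HF_HUB_OFFLINE for huggingface_hub and TRANSFORMERS_OFFLINE for transformers). As with the cache directory above, set them before importing the libraries:

```python
import os

# Must be set before importing transformers/huggingface_hub,
# otherwise the libraries pick up the old values.
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: skip network calls
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: use the local cache only
```

With these set, `from_pretrained(...)` calls resolve models from the local cache directory instead of trying to download them.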