© 2025 Direct Cursus Technology L.L.C.

Datasets in DataSphere

Written by
Yandex Cloud
Updated at October 27, 2023

A dataset in DataSphere is a storage option that provides fast access to large amounts of data. A dataset can hold up to 4 TB, and reading data from it is faster than reading from the main project storage.

Tip

The larger the disk allocated for the dataset, the higher the data read speed.

A dataset is created and populated during initialization. Once initialized, a dataset becomes read-only and cannot be changed. To add files to it, you need to create the dataset again.
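Because an initialized dataset is read-only, adding files means building a new dataset. One possible preparation step, sketched below under assumptions, is to copy the old dataset's files plus the new ones into a writable staging directory in project storage, which the initialization code of the new dataset could then copy from. The function `stage_dataset_rebuild` and the staging-directory workflow are hypothetical illustrations, not part of DataSphere.

```python
import shutil
from pathlib import Path


def stage_dataset_rebuild(old_dataset_dir, new_files, staging_dir):
    """Hypothetical helper: merge a read-only dataset and new files into a staging dir.

    The source dataset is only read, never modified. Returns the staged file
    paths relative to staging_dir, sorted.
    """
    old_dataset_dir = Path(old_dataset_dir)
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)

    # Copy every file of the existing dataset, preserving its subdirectory layout.
    for src in old_dataset_dir.rglob("*"):
        if src.is_file():
            dst = staging_dir / src.relative_to(old_dataset_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)
            # copyfile copies data only, so read-only permission bits are not carried over.
            shutil.copyfile(src, dst)

    # Add the new files at the top level of the staging directory.
    for f in new_files:
        shutil.copyfile(f, staging_dir / Path(f).name)

    return sorted(
        p.relative_to(staging_dir).as_posix()
        for p in staging_dir.rglob("*")
        if p.is_file()
    )
```

Copying file by file with `shutil.copyfile` keeps the staging directory writable even if the source files were read-only.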

Datasets are not included in the main project storage and are priced separately.

To use your data in multiple projects, you can share your datasets within your community, just like any other resource.

When you activate a dataset in a project, the disk holding the dataset is mounted to the project storage. You can then read the dataset's files as local files at the following path: /home/jupyter/mnt/datasets/<dataset_name>.
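Since an activated dataset appears as an ordinary directory under the mount path, it can be read with standard file APIs. The sketch below assumes only that mount path; the helper names `dataset_path` and `list_dataset_files` are hypothetical, and the `root` parameter exists so the code can be tried outside DataSphere.

```python
from pathlib import Path

# Mount root for activated datasets in a DataSphere project.
DATASETS_ROOT = Path("/home/jupyter/mnt/datasets")


def dataset_path(name, root=DATASETS_ROOT):
    """Return the local path of an activated dataset, failing clearly if it is not mounted."""
    path = Path(root) / name
    if not path.is_dir():
        raise FileNotFoundError(
            f"Dataset {name!r} is not mounted at {path}; is it activated in this project?"
        )
    return path


def list_dataset_files(name, root=DATASETS_ROOT):
    """List all files in an activated dataset, relative to its mount point."""
    base = dataset_path(name, root)
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())
```

For example, `open(dataset_path("my_dataset") / "train.csv")` would open a file from a dataset named `my_dataset` (both names are placeholders).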

You can have up to three datasets activated in a project at the same time. You can activate and deactivate datasets on the fly, without restarting the project. For a complete list of DataSphere restrictions, see Quotas and limits in DataSphere.
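Given the limit of three activated datasets, a notebook might check how many slots remain before you activate another one. This is a minimal sketch that assumes each activated dataset shows up as exactly one subdirectory of the mount root; the helper names are hypothetical, and the limit value should be checked against the current Quotas and limits page.

```python
from pathlib import Path

DATASETS_ROOT = Path("/home/jupyter/mnt/datasets")
MAX_ACTIVE_DATASETS = 3  # limit stated in this section; verify against current quotas


def active_datasets(root=DATASETS_ROOT):
    """Assumption: every activated dataset is mounted as one subdirectory of root."""
    root = Path(root)
    if not root.is_dir():
        return []  # nothing mounted yet
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def free_activation_slots(root=DATASETS_ROOT):
    """Number of datasets that could still be activated before reaching the limit."""
    return max(0, MAX_ACTIVE_DATASETS - len(active_datasets(root)))
```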

Information about a dataset as a resource

For each dataset, the following information is stored:

  • Name.
  • Status of its connection to the project.
  • Name of the user who created the dataset.
  • Dataset creation date and time in UTC, for example, July 18, 2022, 14:23.
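The stored fields can be pictured as a small record. The `DatasetInfo` class below is a hypothetical model for illustration, not a DataSphere API type; in particular, representing the connection status as a boolean is an assumption. It shows how a timezone-aware timestamp could be rendered in UTC in the format used above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical record mirroring the fields DataSphere stores for a dataset."""
    name: str
    activated: bool       # connection status to the project (boolean is an assumption)
    created_by: str       # user who created the dataset
    created_at: datetime  # timezone-aware creation timestamp

    def created_at_display(self) -> str:
        """Render the creation time in UTC, e.g. 'July 18, 2022, 14:23'."""
        utc = self.created_at.astimezone(timezone.utc)
        # %B gives the full month name (English under the default C locale).
        return f"{utc:%B} {utc.day}, {utc.year}, {utc:%H:%M}"
```

Converting with `astimezone(timezone.utc)` means a timestamp recorded in any time zone is displayed consistently in UTC.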

To view dataset details, click its name in the project's dataset list. On the dataset's Overview tab, you can see:

  • Availability zone the dataset is stored in.
  • Size.
  • Initialization code.

See also

  • Working with datasets
