Using data from Object Storage to train a model in Yandex DataSphere

Written by
Yandex Cloud
Updated on May 7, 2025
  • Getting started
    • Required paid resources
  • Set up your infrastructure
    • Set up the runtime environment
    • Create a folder
    • Create a service account for Object Storage
  • Create a static access key
  • Create a bucket
  • Set up a connection to Object Storage
  • Mount your bucket to the Windows file system
  • Prepare data for model training
  • Attach your bucket to a project
  • Train your model
  • How to delete the resources you created

You will create a bucket in Object Storage, mount it to the Windows file system using rclone, and upload data to this bucket to train a model in Yandex DataSphere.

To use data from Object Storage to train a model in DataSphere:

  1. Set up your infrastructure.
  2. Create a static access key.
  3. Create a bucket.
  4. Set up a connection to Object Storage.
  5. Mount your bucket.
  6. Prepare data for model training.
  7. Create an S3 connector.
  8. Attach your bucket to a project.
  9. Train your model.

If you no longer need the resources you created, delete them.

Getting started

Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.

  1. On the DataSphere home page, click Try for free and select an account to log in with: a Yandex ID or a work account in an identity federation (SSO).
  2. In Yandex Cloud Organization, select the organization you are going to use in Yandex Cloud.
  3. Create a community.
  4. Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.

Required paid resources

The cost of training a model based on data from Object Storage includes:

  • Fee for using DataSphere computing resources.
  • Fee for storing data in a bucket (see Object Storage pricing).
  • Fee for data operations (see Object Storage pricing).

Set up your infrastructure

Log in to the Yandex Cloud management console and select the organization you use to access DataSphere. On the Yandex Cloud Billing page, make sure you have a billing account linked.

If you have an active billing account, you can go to the cloud page to create or select a folder to deploy your infrastructure in.

Note

If you use an identity federation to access Yandex Cloud, billing details might be unavailable to you. In this case, contact your Yandex Cloud organization administrator.

Set up the runtime environment

Windows

Download and install the WinFsp distribution from the WinFsp website.

Download the following archives from their developers' websites and unpack them to the working directory on your local computer:

  • Archive with the Sysinternals Suite utilities
  • Archive with the Windows Service Wrapper (WinSW) utility
  • Archive with the rclone utility
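
To confirm the utilities are ready to use, you can run a quick check from the working directory. This is an optional sketch; the directory path below is hypothetical, so substitute the folder where you unpacked the archives:

    :: Hypothetical working directory; replace with your own path
    cd C:\rclone-tools
    rclone.exe version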

Create a folder

Management console
  1. In the management console, select a cloud and click Create folder.
  2. Give your folder a name, e.g., data-folder.
  3. Click Create.
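
If you work from the command line, you can also create the folder with the Yandex Cloud CLI. This is a sketch assuming the yc CLI is installed and configured; the folder name matches the console example:

    yc resource-manager folder create --name data-folder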

Create a service account for Object Storage

To access a bucket in Object Storage, you need a service account with the storage.editor role.

Management console
  1. In the management console, navigate to data-folder.
  2. From the list of services, select Identity and Access Management.
  3. Click Create service account.
  4. Enter a name for the service account, e.g., datasphere-sa.
  5. Click Add role and assign the storage.editor role to the service account.
  6. Click Create.
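
As an alternative to the console, here is a minimal CLI sketch that creates the service account and assigns it the storage.editor role on data-folder. It assumes the yc CLI is installed and configured; <service_account_ID> is the ID returned by the first command (you can also get it with yc iam service-account get datasphere-sa):

    # Create the service account in data-folder
    yc iam service-account create --name datasphere-sa --folder-name data-folder

    # Assign the storage.editor role to it on the folder
    yc resource-manager folder add-access-binding data-folder \
      --role storage.editor \
      --subject serviceAccount:<service_account_ID>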

Create a static access key

To access Object Storage from DataSphere, you need a static key.

Management console
Yandex Cloud CLI
API
  1. In the management console, navigate to the folder the service account belongs to.
  2. From the list of services, select Identity and Access Management.
  3. In the left-hand panel, select Service accounts.
  4. From the list that opens, select the datasphere-sa service account.
  5. In the top panel, click Create new key.
  6. Select Create static access key.
  7. Specify the static key description and click Create.
  8. Save the ID and secret key. After you close this dialog, the key value will no longer be available.
  1. Create a static access key for the datasphere-sa service account:

    yc iam access-key create --service-account-name datasphere-sa
    

    Result:

    access_key:
      id: aje6t3vsbj8l********
      service_account_id: ajepg0mjt06s********
      created_at: "2022-07-18T14:37:51Z"
      key_id: 0n8X6WY6S24N7Oj*****
    secret: JyTRFdqw8t1kh2-OJNz4JX5ZTz9Dj1rI9hx*****
    

    For more information about the yc iam access-key create command, see the CLI reference.

  2. Save the ID (key_id) and secret key (secret). You will not be able to get the secret key again.

To create a static access key, use the create method for the AccessKey resource.

Create a bucket

Management console
AWS CLI
Terraform
API
  1. In the management console, select the folder where you want to create a bucket.
  2. From the list of services, select Object Storage.
  3. At the top right, click Create bucket.
  4. In the Name field, enter a name for the bucket consistent with the naming conventions.
  5. In the Object read access, Object listing access, and Read access to settings fields, select Restricted.
  6. Click Create bucket.
  1. If you do not have the AWS CLI yet, install and configure it.

  2. Enter the bucket name following the naming conventions:

    aws --endpoint-url https://storage.yandexcloud.net \
      s3 mb s3://<bucket_name>
    

    Result:

    make_bucket: <bucket_name>
    

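To verify that the bucket was created, you can list your buckets against the same endpoint. This is an optional check, not part of the original steps:

    aws --endpoint-url https://storage.yandexcloud.net \
      s3 ls
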
Note

Terraform uses a service account to interact with Object Storage. Assign the service account the required role, e.g., storage.admin, for the folder where you are going to create resources.

If you do not have Terraform yet, install it and configure its Yandex Cloud provider.

  1. Describe the properties for creating a service account and access key in the configuration file:

    ...
    // Creating a service account
    resource "yandex_iam_service_account" "sa" {
      name = "<service_account_name>"
    }
    
    // Assigning a role to a service account
    resource "yandex_resourcemanager_folder_iam_member" "sa-admin" {
      folder_id = "<folder_ID>"
      role      = "storage.admin"
      member    = "serviceAccount:${yandex_iam_service_account.sa.id}"
    }
    
    // Creating a static access key
    resource "yandex_iam_service_account_static_access_key" "sa-static-key" {
      service_account_id = yandex_iam_service_account.sa.id
      description        = "static access key for object storage"
    }
    
  2. Add a section with bucket properties to the configuration file and enter the bucket name following the naming conventions:

    resource "yandex_storage_bucket" "<bucket_name>" {
      access_key = yandex_iam_service_account_static_access_key.sa-static-key.access_key
      secret_key = yandex_iam_service_account_static_access_key.sa-static-key.secret_key
      bucket     = "<bucket_name>"
    }
    

    For more information about the yandex_storage_bucket resource, see the Terraform provider documentation.

  3. Create resources:

    1. In the terminal, change to the folder where you edited the configuration file.

    2. Make sure the configuration file is correct using the command:

      terraform validate
      

      If the configuration is correct, the following message is returned:

      Success! The configuration is valid.
      
    3. Run the command:

      terraform plan
      

      The terminal will display a list of resources with parameters. No changes are made at this step. If the configuration contains errors, Terraform will point them out.

    4. Apply the configuration changes:

      terraform apply
      
    5. Confirm the changes: type yes in the terminal and press Enter.

    Terraform will create all required resources. You can check the new resources and their configuration using the management console.

Use the create REST API method for the Bucket resource, the BucketService/Create gRPC API call, or the create S3 API method.

Set up a connection to Object Storage

To move data from your local disk to Object Storage, configure the rclone utility.

  1. Open the command line in your working directory as an administrator and run the following command:

    .\psexec -i -s cmd.exe
    
  2. In the console that opens, run whoami and make sure the session has been started by the system user (the command should return nt authority\system).

  3. Navigate to the working directory and run a configuration session for rclone:

    rclone.exe config
    
  4. Follow the prompts to create a new connection profile:

    1. Start creating a new profile by entering n in the terminal.
    2. Enter the connection name: s3-connect.
    3. Select the storage type by entering 4 in the terminal.
    4. Select a provider by entering 1 in the terminal.
    5. Select manual entry of credentials by entering 1 in the terminal.
    6. Enter the static access key ID (key_id) you saved earlier in the terminal.
    7. Enter the secret key (secret) in the terminal.
    8. Specify the region by entering ru-central1 in the terminal.
    9. Specify the endpoint by entering storage.yandexcloud.net in the terminal.
    10. You can leave the other settings at their defaults; press Enter to skip them.

Note

You can perform advanced connection setup if required. To do this, type y at the Edit advanced config? step. For more information about advanced settings, see the rclone documentation.
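
To double-check the profile you just created, you can print it with the command below; the output should include the storage.yandexcloud.net endpoint and the ru-central1 region you entered above (the exact provider value depends on the option you selected):

    rclone.exe config show s3-connect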

Mount your bucket to the Windows file system

  1. Check your connection to the bucket. In the same command line you used to configure the connection, run the following command with the bucket name specified:

    rclone.exe ls s3-connect:<bucket_name>
    

    If your configuration is correct, you will see a list of objects in the bucket.

  2. Mount your bucket to the file system, specifying the bucket name and an available drive letter:

    rclone.exe mount s3-connect:<bucket_name> <disk_letter>: --vfs-cache-mode full
    

    You will see a new disk with the objects from the bucket in Windows Explorer. Keep this console session open: the bucket stays mounted only while the rclone.exe mount process is running.

Prepare data for model training

  1. Download a CSV file with diabetes prediction data. This dataset is based on Kaggle data for 2015.
  2. Copy the file to the bucket you mounted.
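
Instead of copying the file through Windows Explorer, you can also upload it with the rclone profile configured earlier. This is a sketch assuming the file was saved as diabetes_data.csv in the current directory, which is the name the notebook expects later:

    rclone.exe copy .\diabetes_data.csv s3-connect:<bucket_name>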

Attach your bucket to a project

To connect to your bucket from DataSphere, you need an S3 connector:

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. In the top-right corner, click Create resource. In the pop-up window, select S3 Connector.
  3. Fill in the fields as follows:
    • Name: Name of the new connector, e.g., s3-datasphere-connect.
    • Endpoint: Object Storage host, which is https://storage.yandexcloud.net/.
    • Bucket: Name of your bucket.
    • Mount name: Name of the volume for mounting the bucket into the project file system.
    • Static access key ID: ID of the static access key used to connect to the storage.
    • In the Static access key field, click Create. In the window that opens, enter the secret name and secret key used to connect to the storage.
  4. Click Create.
  5. Navigate to the S3 connector page and click Activate. Once activated, the bucket will be listed on the S3 Mounts tab in the JupyterLab interface, and you will be able to view it as a file system.

Train your model

Clone the Git repository containing the diabetes_catboost.ipynb notebook with a sample for CatBoost model training:

  1. Open the DataSphere project:

    1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

    2. Click Open project in JupyterLab and wait for the loading to complete.
    3. Open the notebook tab.
  2. In the top menu, click Git and select Clone.
  3. In the window that opens, enter https://github.com/yandex-cloud-examples/yc-datasphere-s3-mount.git and click Clone.
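
If you prefer the terminal, the same repository can be cloned from the JupyterLab terminal using the URL above:

    git clone https://github.com/yandex-cloud-examples/yc-datasphere-s3-mount.git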

In the diabetes_catboost.ipynb notebook, you will connect to the diabetes_data.csv dataset in the bucket you mounted and train the CatBoost model:

  1. Install the required modules:

    %pip install catboost
    %pip install scikit-learn
    %pip install ipywidgets
    
  2. Import the required libraries:

    import pandas as pd
    from catboost import CatBoostClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    
  3. Load the data for model training:

    data = pd.read_csv('<path_to_dataset>')
    data.head()
    

    To get the diabetes_data.csv file path, open the S3 Mounts tab, right-click the file, and select Copy path.

  4. Separate training factors from the target variable:

    X = data.loc[:, data.columns != 'Diabetes_binary']
    y = data['Diabetes_binary']
    
  5. Split the sample into training and test sets:

    X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.75, random_state=42)
    
  6. Create a CatBoost classifier and set its hyperparameters:

    model = CatBoostClassifier(
      iterations=100,
      random_seed=42,
      logging_level='Verbose'
    )
    
  7. Start model training:

    model.fit(
      X_train, y_train,
      eval_set=(X_validation, y_validation),
      plot=True
    )
    
  8. Output quality metrics for the trained model:

    print(classification_report(y_validation, model.predict(X_validation)))
    

How to delete the resources you created

To stop paying for the resources you created:

  • Delete the objects from the bucket.
  • Delete the bucket.
  • Delete the project.
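
If you prefer the command line, here is a minimal cleanup sketch using the AWS CLI; it removes all objects first and then the bucket itself (substitute your bucket name):

    aws --endpoint-url https://storage.yandexcloud.net \
      s3 rm s3://<bucket_name> --recursive
    aws --endpoint-url https://storage.yandexcloud.net \
      s3 rb s3://<bucket_name>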
