© 2025 Direct Cursus Technology L.L.C.
Model fine-tuning in DataSphere Notebooks

Written by
Yandex Cloud
Updated on September 3, 2025
  • Getting started
    • Required paid resources
  • Set up your infrastructure
    • Create a folder
    • Create a service account for the DataSphere project
    • Add the service account to the project
  • Create an API key for the service account
  • Create secrets
  • Fine-tune the model
  • How to delete the resources you created

You can fine-tune the language models available in Yandex Foundation Models through the API or ML SDK so that they better reflect the specifics of your tasks. Yandex DataSphere notebooks are a convenient place to run fine-tuning.

In this tutorial, you will fine-tune a model in DataSphere using the SDK. You can also clone the repository and run the notebook locally by changing the authentication settings.

To fine-tune a model:

  1. Set up your infrastructure.
  2. Create an API key for the service account.
  3. Create secrets.
  4. Fine-tune your model.

If you no longer need the resources you created, delete them.

Getting started

Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.

  1. On the DataSphere home page, click Try for free and select an account to log in with: your Yandex ID or your work account in an identity federation (SSO).
  2. Select the Yandex Identity Hub organization you are going to use in Yandex Cloud.
  3. Create a community.
  4. Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.

Required paid resources

The infrastructure support cost for fine-tuning a model includes:

  • Fee for using DataSphere computing resources.
  • Fee for text generation by the model.

Set up your infrastructure

Log in to the Yandex Cloud management console and select the organization you use to access DataSphere. On the Yandex Cloud Billing page, make sure you have a billing account linked.

If you have an active billing account, you can go to the cloud page to create or select a folder to run your infrastructure.

Note

If you are using an identity federation to work with Yandex Cloud, you might not have access to billing details. In this case, contact your Yandex Cloud organization administrator.

Create a folder

Management console
  1. In the management console, select a cloud and click Create folder.
  2. Name your folder, e.g., data-folder.
  3. Click Create.

Create a service account for the DataSphere project

Management console
  1. Navigate to data-folder.
  2. In the list of services, select Identity and Access Management.
  3. Click Create service account.
  4. Name the service account, e.g., gpt-user.
  5. Click Add role and assign the ai.languageModels.user role to the service account.
  6. Click Create.

Add the service account to the project

To enable the service account to access the model from the notebook, add it to the list of project members.

Management console
  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. In the Members tab, click Add member.
  3. Select the gpt-user account and click Add.

Create an API key for the service account

To enable the service account to access the model, create an API key.

Management console
  1. In the management console, navigate to data-folder.
  2. In the list of services, select Identity and Access Management.
  3. In the left-hand panel, select Service accounts.
  4. In the list that opens, select the gpt-user service account.
  5. In the top panel, click Create new key and select Create API key.
  6. Click Create.
  7. Save the ID and secret key.

Create secrets

To read the API key and folder ID from the notebook, create secrets that store these two values.

  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. Under Project resources, click Secret.
  3. Click Create.
  4. In the Name field, enter the name for the secret: API_KEY.
  5. In the Value field, paste the secret key you saved when creating the API key.
  6. Click Create.
  7. Create another secret named FOLDER_ID and containing the folder ID.
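
The tutorial code reads both secrets through os.environ, so it can be useful to confirm they are visible before starting a long-running tuning job. The following is a minimal sketch of such a check; the helper name missing_secrets is hypothetical and is not part of DataSphere or the SDK.

```python
import os

# Names of the secrets created in this section.
REQUIRED_SECRETS = ("API_KEY", "FOLDER_ID")


def missing_secrets(env=None, required=REQUIRED_SECRETS):
    """Return the names of required secrets that are absent or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_secrets()
if missing:
    print(f"Missing secrets: {', '.join(missing)}")
else:
    print("All required secrets are available.")
```

If the check reports a missing name, make sure the secret exists in the project and that its name matches exactly.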

Fine-tune the model

You will run the fine-tuning code from the DataSphere notebook. Fine-tuning data is stored in JSON Lines format.
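
In JSON Lines, every non-empty line is a separate, self-contained JSON value. Before uploading a dataset, it may help to confirm that each line parses. The sketch below uses only the standard library; the function name check_jsonlines and the sample records are illustrative assumptions and do not describe the schema the tuning service expects.

```python
import json
import pathlib
import tempfile


def check_jsonlines(path):
    """Parse every non-empty line as JSON; return (record_count, errors)."""
    count = 0
    errors = []
    with pathlib.Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # Skip blank lines.
            try:
                json.loads(line)
                count += 1
            except json.JSONDecodeError as exc:
                errors.append((lineno, str(exc)))
    return count, errors


# Demo on a throwaway file with hypothetical records.
# The second line is broken on purpose.
with tempfile.TemporaryDirectory() as tmp:
    sample = pathlib.Path(tmp) / "sample.jsonlines"
    sample.write_text(
        '{"question": "a", "answer": "b"}\n{"question": \n',
        encoding="utf-8",
    )
    count, errors = check_jsonlines(sample)
    print(count, [lineno for lineno, _ in errors])
```

The same function can be pointed at generations.jsonlines once the file is in the notebook directory.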

  1. Open the notebook with the code by following the link below:

    Open in DataSphere

  2. Download the generations.jsonlines file from the GitHub repository and place it in the same directory as your notebook.

  3. Install the Yandex Cloud ML SDK by running this code in a notebook cell:

    %pip install yandex-cloud-ml-sdk --upgrade
    
  4. Import the required libraries:

    # A __future__ import must be the first statement in the cell.
    from __future__ import annotations
    
    import os
    import pathlib
    import urllib.request
    import uuid
    import zipfile
    
    from yandex_cloud_ml_sdk import YCloudML
    
  5. Define a helper function that downloads and unpacks the TensorBoard files. You will need it to view fine-tuning metrics:

    def download_tensorboard(url):
        urllib.request.urlretrieve(url, "tensorboard.zip")
        with zipfile.ZipFile('tensorboard.zip', 'r') as zip_ref:
            zip_ref.extractall("tensorboard")
    
  6. Specify the path to the fine-tuning data and print its contents:

    def local_path(path: str) -> pathlib.Path:
        return pathlib.Path(path)
    
    dataset_path = local_path("generations.jsonlines")
    print(dataset_path)
    
    print("Tuning dataset")
    print(dataset_path.read_text())
    
  7. Create an SDK object containing the authorization parameters:

    sdk = YCloudML(
        folder_id=os.environ['FOLDER_ID'], # Folder ID stored in the DataSphere secret.
        auth=os.environ['API_KEY'] # Service account API key stored in the DataSphere secret.
    )
    
  8. Create a dataset for fine-tuning and start its upload and validation:

    dataset_draft = sdk.datasets.draft_from_path(
        task_type='TextToTextGeneration',
        path=dataset_path,
        upload_format='jsonlines',
        name='test-generations'
    )
    
    dataset = dataset_draft.upload()
    print('Dataset is being loaded and validated')
    print(f'New dataset {dataset=} \n')
    
  9. Select the base model you want to fine-tune and start fine-tuning. This example uses YandexGPT Lite:

    base_model = sdk.models.completions('yandexgpt-lite')
    
    tuning_task = base_model.tune_deferred(
        dataset,
        name=str(uuid.uuid4()),
        n_samples=10000 
    )
    
    print(f'Tuning started {tuning_task} \n')
    
    # Fine-tuning can last up to several hours.
    # Wait until the fine-tuning is complete to get a new model.
    new_model = tuning_task.wait()
    print(f'Tuning completed, new model = {new_model} \n')
    
  10. Get your fine-tuned model:

    # If you closed your notebook, you can use the task ID to check the operation status.
    # task_id must hold the ID of the tuning task started in the previous step.
    tuning_task = sdk.tuning.get(task_id)
    print(tuning_task.get_task_info())
    
    new_model = tuning_task.get_result()
    print(new_model)
    
  11. Test your new model:

    print('Sending queries to the new model\n')
    # Examples of queries to the model
    completion_result = new_model.run("What is your name?")
    print(f'{completion_result=} \n')
    
    # You can use the fine-tuned model by specifying its URI.
    tuned_uri = new_model.uri 
    model = sdk.models.completions(tuned_uri)
    
    completion_result = model.run("Where are you from?")
    print(f'{completion_result=} \n')
    

    The model will return answers that reflect the fine-tuning.

  12. Download fine-tuning metrics:

    # Get the link with the data for TensorBoard and download the files.
    metrics_url = new_model.get_metrics_url()
    download_tensorboard(metrics_url)
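
The download_tensorboard helper from step 5 leaves the unpacked files in the tensorboard directory. The sketch below repeats the same download-and-extract pattern and also lists what was unpacked. The function name fetch_and_list is hypothetical, and the demo uses a local file:// URL with a stand-in archive in place of a real metrics link.

```python
import pathlib
import tempfile
import urllib.request
import zipfile


def fetch_and_list(url, workdir):
    """Download a ZIP archive, extract it under workdir, and list its files."""
    workdir = pathlib.Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    archive = workdir / "tensorboard.zip"
    target = workdir / "tensorboard"
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive, "r") as zip_ref:
        zip_ref.extractall(target)
    # Return paths relative to the extraction directory, in sorted order.
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    # Stand-in archive; in the tutorial the URL comes from new_model.get_metrics_url().
    source_zip = tmp_path / "source.zip"
    with zipfile.ZipFile(source_zip, "w") as zf:
        zf.writestr("run/events.out", "placeholder")
    extracted = fetch_and_list(source_zip.as_uri(), tmp_path / "work")
    print(extracted)
```

Listing the extracted files is a quick way to confirm that the archive was downloaded in full before opening it in TensorBoard.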
    

How to delete the resources you created

To stop paying for the resources you created, delete the project.
