© 2025 Direct Cursus Technology L.L.C.

Model tuning in DataSphere

Written by
Yandex Cloud
Updated at April 21, 2025

You can fine-tune the language models provided by Yandex Foundation Models through the API or the ML SDK so that they better handle the specifics of your tasks. Yandex DataSphere notebooks are a convenient place to run fine-tuning.

In this tutorial, you will fine-tune a model in DataSphere using the SDK. You can also clone the repository and run the notebook locally by changing the authentication settings.

To fine-tune a model:

  1. Set up your infrastructure.
  2. Create an API key for the service account.
  3. Create secrets.
  4. Tune the model.

If you no longer need the resources you created, delete them.

Getting started

Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.

  1. On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account with the identity federation (SSO).
  2. Select the Yandex Cloud Organization you are going to use in Yandex Cloud.
  3. Create a community.
  4. Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.

Required paid resources

The cost of supporting the infrastructure for fine-tuning the model includes:

  • Fee for DataSphere computing resource usage.
  • Fee for text generation by the model.

Set up your infrastructure

Log in to the Yandex Cloud management console and select the organization you use to access DataSphere. On the Yandex Cloud Billing page, make sure you have a billing account linked.

If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.

Note

If you use an identity federation to access Yandex Cloud, billing details might be unavailable to you. In this case, contact your Yandex Cloud organization administrator.

Create a folder

Management console
  1. In the management console, select a cloud and click Create folder.
  2. Give your folder a name, e.g., data-folder.
  3. Click Create.

Create a service account for the DataSphere project

Management console
  1. Navigate to data-folder.
  2. From the list of services, select Identity and Access Management.
  3. Click Create service account.
  4. Enter a name for the service account, e.g., gpt-user.
  5. Click Add role and assign the ai.languageModels.user role to the service account.
  6. Click Create.

Add the service account to a project

To allow the service account to access the model from the notebook, add it to the list of project members.

Management console
  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. In the Members tab, click Add member.
  3. Select the gpt-user account and click Add.

Create an API key for the service account

To allow the service account to access the model, create an API key.

Management console
  1. In the management console, navigate to data-folder.
  2. From the list of services, select Identity and Access Management.
  3. In the left-hand panel, select Service accounts.
  4. In the list that opens, select the gpt-user service account.
  5. In the top panel, click Create new key and select Create API key.
  6. Click Create.
  7. Save the ID and secret key.

Create secrets

To access the API key and folder ID from the notebook, create secrets that store them.

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. Under Project resources, click Secret.
  3. Click Create.
  4. In the Name field, enter the name for the secret: API_KEY.
  5. In the Value field, paste the secret key you saved when creating the API key.
  6. Click Create.
  7. Create another secret named FOLDER_ID with your folder ID as its value.
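
Before running the tuning cells, it can help to confirm that both secrets are visible to the notebook. The sketch below assumes, as the SDK setup later in this tutorial does, that DataSphere exposes secrets as environment variables. The helper name `read_secrets` is hypothetical and not part of any SDK.

```python
import os
from typing import Dict, Iterable, Mapping


def read_secrets(names: Iterable[str], env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the requested secrets, or raise if any is missing or empty.

    Hypothetical helper: it only reads from the given mapping
    (the process environment by default).
    """
    names = list(names)
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing secrets: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Example with an explicit mapping instead of the real environment.
demo = read_secrets(["API_KEY", "FOLDER_ID"], {"API_KEY": "key", "FOLDER_ID": "folder"})
```

In the notebook, calling `read_secrets(["API_KEY", "FOLDER_ID"])` without the second argument would check the real environment and fail early with a readable message if a secret was not created.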

Tune the model

You run the fine-tuning code from a DataSphere notebook. The fine-tuning data is stored in JSON Lines format.
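
JSON Lines stores one JSON value per line. As a rough illustration, the sketch below checks that every non-empty line parses as a JSON object before you upload the file. The helper name `find_invalid_lines` is hypothetical, and treating each record as a JSON object is an assumption. The check says nothing about which fields the tuning dataset must contain; the sample generations.jsonlines file shows the expected record structure.

```python
import json
from typing import List


def find_invalid_lines(text: str) -> List[int]:
    """Return 1-based numbers of non-empty lines that are not JSON objects."""
    invalid = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # Skip blank lines, including the one after a trailing newline.
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            invalid.append(number)
            continue
        # Assumption: every record is a JSON object (a dict once parsed).
        if not isinstance(record, dict):
            invalid.append(number)
    return invalid
```

For example, `find_invalid_lines(pathlib.Path("generations.jsonlines").read_text(encoding="utf-8"))` would return an empty list for a file in which every line is a well-formed JSON object.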

  1. Open the notebook with the code by following the link below:

    Open in DataSphere

  2. Download the generations.jsonlines file from the GitHub repository and place it next to your notebook.

  3. Install the ML SDK by running the code in the notebook cell:

    %pip install yandex-cloud-ml-sdk --upgrade
    
  4. Import the required libraries:

    from __future__ import annotations  # Must be the first statement in the cell.
    
    import os
    import pathlib
    import urllib.request
    import uuid
    import zipfile
    
    from yandex_cloud_ml_sdk import YCloudML
    
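  5. Define a helper function that downloads and unpacks the TensorBoard files. You will need them to view fine-tuning metrics:

    def download_tensorboard(url):
        # Download the archive with the metrics and extract it to the tensorboard directory.
        urllib.request.urlretrieve(url, "tensorboard.zip")
        with zipfile.ZipFile("tensorboard.zip", "r") as zip_ref:
            zip_ref.extractall("tensorboard")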
    
  6. Specify the path to the training data and print its contents:

    def local_path(path: str) -> pathlib.Path:
        return pathlib.Path(path)
    
    dataset_path = local_path("generations.jsonlines")
    print(dataset_path)
    
    print("Tuning dataset")
    print(dataset_path.read_text())
    
  7. Create an SDK object containing the authorization parameters:

    sdk = YCloudML(
        folder_id=os.environ['FOLDER_ID'], # Folder ID stored in the DataSphere secret.
        auth=os.environ['API_KEY'] # Service account API key stored in the DataSphere secret.
    )
    
  8. Create a tuning dataset, then upload and validate it:

    dataset_draft = sdk.datasets.draft_from_path(
        task_type='TextToTextGeneration',
        path=dataset_path,
        upload_format='jsonlines',
        name='test-generations'
    )
    
    dataset = dataset_draft.upload()
    print('Dataset is being loaded and validated')
    print(f'New dataset {dataset=} \n')
    
  9. Select the base model you want to tune and start the tuning. This example uses YandexGPT Lite:

    base_model = sdk.models.completions('yandexgpt-lite')
    
    tuning_task = base_model.tune_deferred(
        dataset,
        name=str(uuid.uuid4()),
        n_samples=10000 
    )
    
    print(f'Tuning started {tuning_task} \n')
    
    # Tuning can last up to several hours
    # We wait until the tuning is complete and get a new model
    new_model = tuning_task.wait()
    print(f'Tuning completed, new model = {new_model} \n')
    
  10. Get your fine-tuned model:

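    # If you closed your notebook, you can use the tuning task ID to check the operation status.
    # task_id is not defined earlier in this notebook: set it to the ID of your tuning task.
    task_id = '<tuning_task_ID>'
    tuning_task = sdk.tuning.get(task_id)
    print(tuning_task.get_task_info())
    
    new_model = tuning_task.get_result()
    print(new_model)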
    
  11. Test your new model:

    print('Sending queries to the new model\n')
    # Examples of queries to the model
    completion_result = new_model.run("What is your name?")
    print(f'{completion_result=} \n')
    
    # You can use the fine-tuned model by specifying its URI
    tuned_uri = new_model.uri 
    model = sdk.models.completions(tuned_uri)
    
    completion_result = model.run("Where are you from?")
    print(f'{completion_result=} \n')
    

    The model will generate answers that reflect the fine-tuning.

  12. Download tuning metrics:

    # Get the link with the data for TensorBoard and download the files
    metrics_url = new_model.get_metrics_url()
    download_tensorboard(metrics_url)
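
After download_tensorboard finishes, the extracted files sit in the tensorboard directory. As a small sketch, the helper below extracts an archive and lists what it contains, so you can confirm that the download worked. The name `extract_and_list` is hypothetical, and the layout of the metrics archive is not specified in this tutorial.

```python
import pathlib
import zipfile
from typing import List


def extract_and_list(archive: str, target: str) -> List[str]:
    """Extract a ZIP archive into target and return the extracted file paths.

    Paths are returned relative to target, with forward slashes, sorted.
    """
    with zipfile.ZipFile(archive, "r") as zip_ref:
        zip_ref.extractall(target)
    root = pathlib.Path(target)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```

For example, `extract_and_list("tensorboard.zip", "tensorboard")` would list the files unpacked from the downloaded archive. If TensorBoard is installed, you can then point it at that directory with `tensorboard --logdir tensorboard`.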
    

How to delete the resources you created

To stop paying for the resources you created, delete the project.
