Model tuning in DataSphere
You can fine-tune language models available in Yandex Foundation Models via the API or ML SDK so that they better handle the specifics of your tasks. It is convenient to run fine-tuning in Yandex DataSphere notebooks.
In this tutorial, you will fine-tune a model in DataSphere using the ML SDK. You can also clone the GitHub repository referenced below and work with its files directly.
To fine-tune a model:
- Set up your infrastructure.
- Create an API key for the service account.
- Create secrets.
- Tune the model.
If you no longer need the resources you created, delete them.
Getting started
Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.
- On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account with the identity federation (SSO).
- Select the Yandex Cloud Organization organization you are going to use in Yandex Cloud.
- Create a community.
- Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.
Required paid resources
The cost of supporting the infrastructure for fine-tuning the model includes:
- Fee for DataSphere computing resource usage.
- Fee for text generation by the model.
Set up your infrastructure
Log in to the Yandex Cloud management console. If you have an active billing account, you can create or select a folder to deploy your infrastructure in on the cloud page.
Note
If you use an identity federation to access Yandex Cloud, billing details might be unavailable to you. In this case, contact your Yandex Cloud organization administrator.
Create a folder
- In the management console, select a cloud and click Create folder.
- Give your folder a name, e.g., data-folder.
- Click Create.
Create a service account for the DataSphere project
- Navigate to data-folder.
- From the list of services, select Identity and Access Management.
- Click Create service account.
- Enter a name for the service account, e.g., gpt-user.
- Click Add role and assign the ai.languageModels.user role to the service account.
- Click Create.
Add the service account to a project
To allow the service account to access the model from the notebook, add it to the list of project members.
- Select the relevant project in your community or on the DataSphere home page in the Recent projects tab.
- In the Members tab, click Add member.
- Select the gpt-user account and click Add.
Create an API key for the service account
To allow the service account to access the model, create an API key.
- In the management console, navigate to data-folder.
- From the list of services, select Identity and Access Management.
- In the left-hand panel, select Service accounts.
- In the list that opens, select the gpt-user service account.
- In the top panel, click Create new key and select Create API key.
- Click Create.
- Save the ID and secret key.
Create secrets
To use the API key and folder ID in the notebook code, create secrets that store them.
- Select the relevant project in your community or on the DataSphere home page in the Recent projects tab.
- Under Project resources, click Secret.
- Click Create.
- In the Name field, enter the name for the secret: API_KEY.
- In the Value field, paste the secret key (the API key value) you saved earlier.
- Click Create.
- Create another secret named FOLDER_ID and set its value to the ID of data-folder.
You can verify from the notebook that both secrets are available, as shown in the sketch below.
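DataSphere exposes project secrets to notebooks as environment variables with the same names, which is how the tutorial code reads them later. A minimal check you can run in a notebook cell, assuming the secrets were named API_KEY and FOLDER_ID as above:

import os

# Project secrets are injected into the notebook environment by name.
for name in ("API_KEY", "FOLDER_ID"):
    print(name, "is set" if os.environ.get(name) else "is MISSING")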
Tune the model
The code for fine-tuning is run from the DataSphere notebook. Fine-tuning data is stored in JSON Lines format; an illustrative record is shown below.
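Each line of the training file is a separate JSON object. The authoritative schema is whatever the generations.jsonlines file from the repository uses; the sketch below only assumes a typical text-to-text generation record (a list of request messages plus a reference response), and my_generations.jsonlines is a hypothetical file name:

import json

# Illustrative only: check generations.jsonlines from the repository for the exact schema.
# Assumed structure: a "request" list of role/text messages and a "response" string.
example_record = {
    "request": [
        {"role": "system", "text": "You are a helpful assistant."},
        {"role": "user", "text": "What is your name?"},
    ],
    "response": "My name is an example reference answer.",
}

# Each record occupies exactly one line in the JSON Lines file.
with open("my_generations.jsonlines", "a", encoding="utf-8") as f:
    f.write(json.dumps(example_record, ensure_ascii=False) + "\n")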
- Open the notebook with the code by following the link below:
- Download the generations.jsonlines file from the GitHub repository and place it next to your notebook.
- Install the ML SDK by running the code in the notebook cell:
%pip install yandex-cloud-ml-sdk --upgrade
- Import the required libraries:
# __future__ imports must come before any other import in the cell.
from __future__ import annotations

import os
import pathlib
import uuid
import urllib.request
import zipfile

from yandex_cloud_ml_sdk import YCloudML
- Define a function for downloading TensorBoard logs. You will need them to view the fine-tuning metrics:

def download_tensorboard(url):
    urllib.request.urlretrieve(url, "tensorboard.zip")
    with zipfile.ZipFile("tensorboard.zip", "r") as zip_ref:
        zip_ref.extractall("tensorboard")
- Load the data for model tuning:

def local_path(path: str) -> pathlib.Path:
    return pathlib.Path(path)

dataset_path = local_path("generations.jsonlines")
print(dataset_path)
print("Tuning dataset")
print(dataset_path.read_text())
- Create an SDK object containing the authorization parameters:

sdk = YCloudML(
    folder_id=os.environ['FOLDER_ID'],  # Folder ID stored in the DataSphere secret.
    auth=os.environ['API_KEY'],  # Service account API key stored in the DataSphere secret.
)
- Create the tuning dataset and run its upload and validation:

dataset_draft = sdk.datasets.draft_from_path(
    task_type='TextToTextGeneration',
    path=dataset_path,
    upload_format='jsonlines',
    name='test-generations'
)
dataset = dataset_draft.upload()
print('Dataset is being loaded and validated')
print(f'New dataset {dataset=} \n')
- Select the base model you want to tune and run the tuning. This example uses YandexGPT Lite:

base_model = sdk.models.completions('yandexgpt-lite')

tuning_task = base_model.tune_deferred(
    dataset,
    name=str(uuid.uuid4()),
    n_samples=10000
)
print(f'Tuning started {tuning_task} \n')

# Tuning can take up to several hours.
# Wait until the tuning is complete and get the new model.
new_model = tuning_task.wait()
print(f'Tuning completed, new model = {new_model} \n')
- Get your fine-tuned model:

# If you closed your notebook, you can use the ID of the tuning operation
# (task_id) to check its status and get the result.
tuning_task = sdk.tuning.get(task_id)
print(tuning_task.get_task_info())

new_model = tuning_task.get_result()
print(new_model)
- Test your new model:

print('Sending queries to the new model\n')

# Examples of queries to the model
completion_result = new_model.run("What is your name?")
print(f'{completion_result=} \n')

# You can use the fine-tuned model by specifying its URI
tuned_uri = new_model.uri
model = sdk.models.completions(tuned_uri)

completion_result = model.run("Where are you from?")
print(f'{completion_result=} \n')

The model will generate and return answers factoring in the fine-tuning.
- Download the tuning metrics (you can then view them in TensorBoard, as sketched below):

# Get the link to the TensorBoard data and download the files
metrics_url = new_model.get_metrics_url()
download_tensorboard(metrics_url)
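To view the downloaded metrics, you can start TensorBoard directly in the notebook. This is a minimal sketch that assumes the archive was unpacked into the tensorboard directory by the download_tensorboard() helper above and that the TensorBoard notebook extension is installed in the environment:

# Load the TensorBoard notebook extension and point it at the extracted logs.
%load_ext tensorboard
%tensorboard --logdir tensorboard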
How to delete the resources you created
To stop paying for the resources you created, delete the project.