

Running computations on a schedule in DataSphere

Written by
Yandex Cloud
Updated at August 21, 2025
  • Getting started
    • Required paid resources
  • Set up your infrastructure
    • Create a folder
    • Create a service account for the DataSphere project
    • Add the service account to the project
    • Set up the project
  • Create a notebook
  • Upload and process data
  • Create a function in Cloud Functions
    • Create a Cloud Functions version
  • Create a timer
  • How to delete the resources you created

You can run Yandex DataSphere notebooks on a schedule by having a Yandex Cloud Functions function trigger notebook cell execution through the DataSphere API.

In this tutorial, you will get information about the most discussed stocks on Reddit, analyze the sentiment of the discussion, and set up regular data updates.

DataSphere collects and analyzes information, while a timer created in Cloud Functions triggers regular cell execution.

To set up regular runs of a Jupyter notebook:

  1. Set up your infrastructure.
  2. Create a notebook.
  3. Upload and process data.
  4. Create a function in Cloud Functions.
  5. Create a timer.

If you no longer need the resources you created, delete them.

Getting started

Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.

  1. On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account with the identity federation (SSO).
  2. Select the Yandex Identity Hub organization you are going to use in Yandex Cloud.
  3. Create a community.
  4. Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.

Required paid resources

The cost of implementing regular runs includes:

  • Fee for using DataSphere computing resources.
  • Fee for the number of Cloud Functions function calls.

Set up your infrastructure

Log in to the Yandex Cloud management console and select the organization you use to access DataSphere. On the Yandex Cloud Billing page, make sure you have a billing account linked.

If you have an active billing account, you can go to the cloud page to create or select a folder to run your infrastructure.

Note

If you are using an identity federation to work with Yandex Cloud, you might not have access to billing details. In this case, contact your Yandex Cloud organization administrator.

Create a folder

Management console
  1. In the management console, select a cloud and click Create folder.
  2. Name your folder, e.g., data-folder.
  3. Click Create.

Create a service account for the DataSphere project

To access a DataSphere project from a function in Cloud Functions, you need a service account with the datasphere.community-projects.editor and functions.functionInvoker roles.

Management console
  1. Navigate to data-folder.
  2. In the list of services, select Identity and Access Management.
  3. Click Create service account.
  4. Name the service account, e.g., reddit-user.
  5. Click Add role and assign the datasphere.community-projects.editor and functions.functionInvoker roles to the service account.
  6. Click Create.

Add the service account to the project

To enable the service account to run a DataSphere project, add it to the list of project members.

Management console
  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. In the Members tab, click Add member.
  3. Select the reddit-user account and click Add.

Set up the project

To reduce DataSphere usage costs, configure the time to release the VM attached to the project.

  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. Navigate to the Settings tab.
  3. Under General settings, click Edit.
  4. To configure Stop inactive VM after, select Custom and specify 5 minutes.
  5. Click Save.

Create a notebook

  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. Click Open project in JupyterLab and wait for the loading to complete.
  3. In the top panel, click File and select New → Notebook.
  4. Select a kernel and click Select.
  5. Right-click the notebook and select Rename. Enter the name: test_classifier.ipynb.

Upload and process data

To upload information on the most discussed stocks on Reddit and the sentiment of the discussion, paste the code below into the test_classifier.ipynb notebook cells. It selects the top three most discussed stocks and saves them to a CSV file in the project storage.

  1. Open the DataSphere project:

    1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

    2. Click Open project in JupyterLab and wait for the loading to complete.
    3. Open the notebook tab.
  2. Import the libraries:

    import pandas as pd
    import requests
    import os.path
    
  3. Initialize the variables:

    REQUEST_URL = "https://tradestie.com/api/v1/apps/reddit"
    FILE_NAME = "/home/jupyter/datasphere/project/stock_sentiments_data.csv"
    TICKERS = ['NVDA', 'TSLA', 'AAPL']
    
  4. Create a function that sends a request to the Tradestie API and returns a response as pandas.DataFrame:

    def load_data():
        response = requests.get(REQUEST_URL)
        stocks = pd.DataFrame(response.json())
        stocks = stocks[stocks['ticker'].isin(TICKERS)]
        stocks.drop('sentiment', inplace=True, axis=1)
        return stocks
    
  5. Set the condition that defines a file to write stock information to:

    if os.path.isfile(FILE_NAME):
        stocks = pd.read_csv(FILE_NAME)
    else:
        stocks = load_data()
        stocks['count'] = 1
        stocks.to_csv(FILE_NAME, index=False)
    
  6. Upload the updated stock data:

    stocks_update = load_data()
    
  7. Compare the updated and existing data:

    stocks = stocks.merge(stocks_update, how='left', on='ticker')
    stocks['no_of_comments_y'] = stocks['no_of_comments_y'].fillna(stocks['no_of_comments_x'])
    stocks['sentiment_score_y'] = stocks['sentiment_score_y'].fillna(stocks['sentiment_score_x'])
    
  8. Update the arithmetic average count of comments and sentiment scores:

    stocks['count'] += 1
    stocks['no_of_comments_x'] += (stocks['no_of_comments_y'] - stocks['no_of_comments_x'])/stocks['count']
    stocks['sentiment_score_x'] += (stocks['sentiment_score_y'] - stocks['sentiment_score_x'])/stocks['count']
    stocks = stocks[['no_of_comments_x', 'sentiment_score_x', 'ticker', 'count']]
    stocks.columns = ['no_of_comments', 'sentiment_score', 'ticker', 'count']
    print(stocks)
    
  9. Save the results to a file:

    stocks.to_csv(FILE_NAME, index=False)
    

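The running-average update in steps 7 and 8 can be sketched in isolation. This is purely an illustration with made-up numbers, not part of the notebook: after the n-th observation, the mean becomes mean + (x - mean) / n, which matches the plain arithmetic average without storing all past values.

```python
# Incremental (running) average: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n.
def update_mean(current_mean: float, new_value: float, count: int) -> float:
    """Return the running average after folding in new_value as observation #count."""
    return current_mean + (new_value - current_mean) / count

# Fold in sample comment counts one by one and compare with the plain average.
values = [10, 20, 30, 40]
mean = float(values[0])
for n, x in enumerate(values[1:], start=2):
    mean = update_mean(mean, x, n)

print(mean)  # 25.0, same as sum(values) / len(values)
```

This is why the notebook only needs to keep the current averages and a `count` column in the CSV rather than the full history of API responses.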
Create a function in Cloud Functions

To run computations without opening JupyterLab, you will need a function in Cloud Functions that will trigger computations in a notebook via the API.

Management console
  1. In the management console, select the folder where you want to create a function.
  2. Select Cloud Functions.
  3. Click Create function.
  4. Name the function, e.g., my-function.
  5. Click Create function.

Create a Cloud Functions version

Versions contain the function code, run parameters, and all required dependencies.

Management console
  1. In the management console, select the folder containing the function.

  2. Select Cloud Functions.

  3. Select the function whose version you want to create.

  4. Under Last version, click Create in editor.

  5. Select the Python runtime environment. Do not select Add files with code examples.

  6. Choose the Code editor method.

  7. Click Create file and specify a file name, e.g., index.

  8. Enter the function code by inserting your project ID and the absolute path to the project notebook:

    import requests
    
    def handler(event, context):
    
        url = 'https://datasphere.api.cloud.yandex.net/datasphere/v2/projects/<project_ID>:execute'
        body = {"notebookId": "/home/jupyter/datasphere/project/test_classifier.ipynb"}
        headers = {"Content-Type" : "application/json",
                   "Authorization": "Bearer {}".format(context.token['access_token'])}
    resp = requests.post(url, json=body, headers=headers)
    
        return {
        'body': resp.json(),
        }
    

    Where:

    • <project_ID>: ID of the DataSphere project displayed on the project page under its name.
    • notebookId: Absolute path to the project notebook.
  9. Under Parameters, set the version parameters:

    • Entry point: index.handler.
    • Service account: reddit-user.
  10. Click Save changes.
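The request the handler sends can be sketched locally without any network call. This is a minimal illustration: `<project_ID>` and `<IAM_token>` are placeholders (in Cloud Functions, the token comes from `context.token['access_token']`).

```python
# Builds the same URL, body, and headers as the handler above; nothing is sent.
def build_execute_request(project_id: str, notebook_path: str, iam_token: str):
    url = (
        "https://datasphere.api.cloud.yandex.net"
        f"/datasphere/v2/projects/{project_id}:execute"
    )
    body = {"notebookId": notebook_path}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {iam_token}",
    }
    return url, body, headers

url, body, headers = build_execute_request(
    "<project_ID>",  # placeholder: your DataSphere project ID
    "/home/jupyter/datasphere/project/test_classifier.ipynb",
    "<IAM_token>",   # placeholder: IAM token of the service account
)
print(url)
```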

Create a timer

To run a function every 15 minutes, you will need a timer.

Management console
  1. In the management console, select the folder where you want to create a timer.

  2. Select Cloud Functions.

  3. In the left-hand panel, select Triggers.

  4. Click Create trigger.

  5. Under Basic settings:

    • Enter a name and description for the trigger.
    • In the Type field, select Timer.
    • In the Launched resource field, select Function.
  6. Under Timer settings, set the function call schedule to Every 15 minutes.

  7. Under Function settings, select a function and specify:

    • Function version tag.
    • Service account to call the function: reddit-user.
  8. Click Create trigger.

From now on, the stock_sentiments_data.csv file will be updated every 15 minutes. You can find it next to the notebook.
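After a few timer runs, the accumulated CSV can be inspected. A minimal sketch with hypothetical sample rows (in the project, the real file is at /home/jupyter/datasphere/project/stock_sentiments_data.csv):

```python
# Parse a CSV in the same shape as stock_sentiments_data.csv (sample data only)
# and pick the most discussed ticker.
import csv
import io

sample = """no_of_comments,sentiment_score,ticker,count
120.0,0.12,NVDA,3
95.5,0.08,TSLA,3
80.0,-0.02,AAPL,3
"""

rows = list(csv.DictReader(io.StringIO(sample)))
most_discussed = max(rows, key=lambda r: float(r["no_of_comments"]))
print(most_discussed["ticker"])  # NVDA
```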

How to delete the resources you created

To stop paying for the resources you created:

  • Delete the function.
  • Delete the trigger.
  • Delete the project.

© 2025 Direct Cursus Technology L.L.C.