© 2025 Direct Cursus Technology L.L.C.

Classification of images in video frames

Written by
Yandex Cloud
Updated at August 21, 2025

Yandex DataSphere allows you to build ML models using the Jupyter Notebook interface.

This tutorial works through a binary image classification problem of the kind you may face when recognizing vehicle types in CCTV camera images. The CCTV system is assumed to capture an image whenever a camera detects motion and save it to a bucket in Yandex Object Storage.

To work through the solution:

  1. Set up your infrastructure.
  2. Configure DataSphere.
  3. Create secrets.
  4. Prepare notebooks.
  5. Install the dependencies.
  6. Upload and label the data.
  7. Prepare the ML model and calculate the features.
  8. Train the classifier using the features you calculated.
  9. Predict the feature for the test image.
  10. View possible applications of the model.

If you no longer need the resources you created, delete them.

Getting started

Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.

  1. On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your work account in an identity federation (SSO).
  2. Select the Yandex Identity Hub organization you are going to use in Yandex Cloud.
  3. Create a community.
  4. Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.

Required paid resources

The model operation cost includes:

  • Fee for bucket usage (see Yandex Object Storage pricing).
  • Fee for computing resource usage (see Yandex DataSphere pricing).

Set up your infrastructure

Log in to the Yandex Cloud management console and select the organization you use to access DataSphere. On the Yandex Cloud Billing page, make sure you have a billing account linked.

If you have an active billing account, you can go to the cloud page to create or select a folder to run your infrastructure.

Note

If you are using an identity federation to work with Yandex Cloud, you might not have access to billing details. In this case, contact your Yandex Cloud organization administrator.

Create a folder

Create a folder and network with subnets in each availability zone.

Management console
  1. In the management console, select a cloud and click Create folder.
  2. Name your folder, e.g., data-folder.
  3. Click Create.

Create a service account

Management console
  1. Navigate to data-folder.

  2. In the list of services, select Identity and Access Management.

  3. Click Create service account.

  4. Name the service account, e.g., sa-for-project.

  5. Click Add role and assign the following roles to the service account:

    • storage.viewer to read data from the Object Storage bucket.
    • vpc.gateways.user to use the subnet.
  6. Click Create.

Create a static access key for the service account

To allow your service account to get authenticated in Object Storage, create a static access key.

Management console
  1. Navigate to data-folder.
  2. In the list of services, select Identity and Access Management.
  3. In the left-hand panel, select Service accounts.
  4. In the list that opens, select the sa-for-project service account.
  5. Click Create new key in the top panel.
  6. Select Create static access key.
  7. Enter a description of the key so that you can easily find it in the management console.
  8. Save the ID and secret key. The secret key is not saved in Yandex Cloud, so you will not be able to view it in the management console.

Create an egress NAT gateway

Management console
  1. In data-folder, select Virtual Private Cloud.
  2. In the left-hand panel, select Gateways.
  3. Click Create and set the gateway parameters:
    • Name the gateway, e.g., nat-for-cluster.
    • Gateway Type: Egress NAT.
    • Click Save.

Create a route table:

  1. In the left-hand panel, select Routing tables.
  2. Click Create and specify the route table parameters:
    1. Enter a name, e.g., route-table.
    2. Select the data-folder network.
    3. Click Add.
      • In the window that opens, select Gateway in the Next hop field.
      • In the Gateway field, select the NAT gateway you created. The destination prefix will apply automatically.
      • Click Add.
  3. Click Create routing table.

Associate the route table with a subnet to route traffic from it through the NAT gateway:

  1. In the left-hand panel, select Subnets.
  2. In the row with the subnet, click the options menu icon.
  3. In the menu that opens, select Link routing table.
  4. In the window that opens, select your route table from the list.
  5. Click Link.

Create an S3 bucket

Management console
  1. Navigate to data-folder.
  2. In the list of services, select Object Storage.
  3. Click Create bucket.
  4. On the bucket creation page:
    1. Enter a name for the bucket as per the naming requirements.

      Warning

      Do not use buckets with a dot in their name for connection.

    2. In the Object read access, Object listing access, and Read access to settings fields, select Restricted.

    3. Limit the maximum bucket size, if required.

  5. Click Create bucket to complete the operation.

Configure DataSphere

Create a project

  1. Open the DataSphere home page.
  2. In the left-hand panel, select Communities.
  3. Select the community where you want to create a project.
  4. On the community page, click Create project.
  5. In the window that opens, enter a name for the project. You can also add a description as needed.
  6. Click Create.

Edit the project settings

  1. Select the project in your community or on the DataSphere home page in the Recent projects tab.

  2. Navigate to the Settings tab.

  3. Under Advanced settings, click Edit.

  4. Specify the parameters:

    • Default folder: data-folder.
    • Service account: sa-for-project.
    • Subnet: Subnet of the ru-central1-a availability zone in data-folder.

    Note

    If you specify a subnet in the project settings, allocating computing resources may take longer.

    • Security group: Specify a security group, if used in your organization.
  5. Click Save.

Create secrets

Create secrets to store the ID and secret part of the static access key:

  1. Under Project resources on the project page, click Secret.
  2. Click Create.
  3. In the Name field, enter a name for the secret. You need two secrets: name the one holding the static key ID token, and the one holding the secret key key_value.
  4. In the Value field, enter a value to store in encrypted form.
  5. Click Create. You will see a page with detailed info on the secret you created.
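
In DataSphere, project secrets are available to notebook code as environment variables. The following is a minimal sketch of reading the two secrets, assuming they are named token and key_value as suggested above. The read_static_key helper is hypothetical and is not part of the repository notebooks.

```python
import os


def read_static_key(env=os.environ):
    """Return (key_id, secret_key) taken from the two project secrets.

    Assumes the secrets are named `token` and `key_value`;
    this helper is illustrative only.
    """
    missing = [name for name in ("token", "key_value") if not env.get(name)]
    if missing:
        # Fail early with a readable message instead of a late S3 auth error.
        raise KeyError("Missing DataSphere secrets: " + ", ".join(missing))
    return env["token"], env["key_value"]
```

Checking for the secrets up front makes a misnamed secret show up as a clear error rather than as a failed request to the bucket later on.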

Prepare notebooks

Clone the Git repository containing the notebooks with the examples of the ML model training and testing:

  1. In the top menu, click Git and select Clone.
  2. In the window that opens, enter https://github.com/yandex-cloud-examples/yc-datasphere-video-recognition.git and click Clone.

Wait until cloning is complete. It may take some time. You will see the cloned repository folder in the File Browser section.

There are two notebooks in the repository:

  • model-building.ipynb: To set up the environment and train your model using the ResNet50 convolutional neural network (CNN).
  • model-testing.ipynb: To test your model.

Install dependencies

Note

In this example, the model is trained and tested using the g1.1 computing resource configuration. You can use a different configuration with a GPU. To do this, change the configuration in the code in all notebook cells as needed.

  1. Open the ML folder and then the model-building.ipynb notebook.

  2. Click the first cell to select it:

    #!g1.1
    %matplotlib inline
    import matplotlib
    import matplotlib.pyplot as plt
    import os
    import io
    from os import path
    ...
    
  3. To run the cell, click the run button or press Shift + Enter.

  4. Wait for the operation to complete.

The solution uses the Keras interface of the TensorFlow library with a CNTK backend. To connect to the bucket with images, you need the boto3 package. The cell also sets the environment variables to access the CNTK backend and connect to the bucket.

The packages listed in the cell are already installed in DataSphere and you can import them using the import command. For the full list of packages pre-installed in DataSphere, see List of pre-installed software.

Upload and label the data

Go to the Connect S3 section to:

  1. Set up a connection to the S3 bucket.
  2. Load the list of objects, i.e., images of cars and buses to train the model.
  3. Define the function to extract an image using a key (name).

The next section, Labeling, is used for labeling data:

  • Images are labeled according to the key value (folder name).
  • Bus images are labeled with 0, and car images are labeled with 1.
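
As an illustration of this labeling rule, here is a minimal sketch that derives the label from the folder part of the object key. It assumes the folders are named car and bus, as in the test image keys used later in this tutorial; the label_keys helper is hypothetical.

```python
# Folder name -> label, as described above: bus is 0, car is 1.
LABELS = {"bus": 0, "car": 1}


def label_keys(keys):
    """Return (key, label) pairs for keys located in a known folder."""
    labeled = []
    for key in keys:
        if key.endswith("/"):
            continue  # skip folder placeholder objects
        folder = key.split("/", 1)[0]
        if folder in LABELS:
            labeled.append((key, LABELS[folder]))
    return labeled
```

Keys outside the two folders are skipped rather than raising an error, so stray objects in the bucket do not end up in the training set.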

To upload and label the data:

  1. In the first cell, replace the bucket_name variable value with the name of your bucket. The default value is bucketwithvideo.

  2. Select all the cells containing code in the Connect S3 and Labeling sections by holding Shift and clicking to the left of the cells in question:

    #!g1.1
    
    session = boto3.session.Session()
    ...
    
  3. Run the selected cells.

  4. Wait for the operation to complete. When the operation is complete, one of the images is displayed to check if the data was uploaded and labeled correctly.

Prepare the ML model and calculate the features

Go to the Calculating the characteristics section to:

  1. Load the ResNet50 model from the Keras package, with weights pretrained on the ImageNet dataset. This dataset contains 1.2 million images in 1,000 categories.
  2. Define the featurize_images function. It splits the list of images into chunks of 32 images each, scales the images down to 224×224 pixels, and converts them to a four-dimensional tensor to feed to the Keras model. Next, the function calculates their features and returns them as a NumPy array.
  3. Use the function to calculate the features for the two classes of images (1 means a car, 0 means other) and save them to a file. This step may take 10-15 minutes. Learn more about the ResNet50 model.
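
The chunking part of featurize_images can be sketched as follows. This is not the notebook's implementation: featurize_chunk is a hypothetical callable standing in for the resize to 224×224, the conversion to a tensor, and the model prediction, and the result is a plain list rather than a NumPy array.

```python
def chunks(items, size=32):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def featurize_in_chunks(images, featurize_chunk, chunk_size=32):
    """Apply `featurize_chunk` to each chunk and concatenate the results.

    `featurize_chunk` takes a list of images and returns one feature
    vector per image, in the same order.
    """
    features = []
    for chunk in chunks(images, chunk_size):
        features.extend(featurize_chunk(chunk))
    return features
```

Processing the images in chunks keeps only one batch of decoded images in memory at a time, which matters when the bucket holds many frames.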

To prepare the model and calculate the features:

  1. Select all the cells containing code in the Calculating the characteristics section:

    #!g1.1
    model = ResNet50(weights='imagenet',  input_shape=(224, 224, 3))
    ...
    
  2. Run the selected cells.

  3. Wait for the operation to complete.

Train the classifier using the features you calculated

In this section, you will train the LightGBM classifier on the features calculated in the previous section. To evaluate the model, you will use cross-validation.

The K-fold method of the scikit-learn package is used. The folds (parts of the training sample) are built so that each one keeps the specified percentage of data from each class, i.e., the split is stratified.

This matters when your data contains far fewer images of one category than of the other. This example uses five-fold cross-validation. You can set a different number of folds in the n_splits parameter.
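
To show what keeping the class percentages in every fold means, here is a simplified pure-Python sketch. It is a stand-in for the scikit-learn splitter, not its implementation: it does no shuffling and simply deals the indices of each class into the folds in turn. The function names are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits=5):
    """Split sample indices into folds that keep the class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal the indices of one class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


def cross_validation_splits(labels, n_splits=5):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = stratified_folds(labels, n_splits)
    for held_out, test_idx in enumerate(folds):
        train_idx = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield train_idx, test_idx
```

With 20 car images and 5 bus images, each of the five test folds produced by this sketch contains 4 cars and 1 bus, so the rare class is present in every evaluation round.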

Go to the Training and cross-validation and Saving the model sections to:

  1. Define an object for cross-validation of training results using the K-fold method.
  2. Prepare a table to store the classification quality metrics.
  3. Define the classification_metrics function for calculating the selected metrics.
  4. Start the LightGBM classifier training. This example uses five-fold cross-validation:
    1. The training sample is divided into five equal non-overlapping folds.
    2. Five iterations take place. At each one, the following steps are performed:
      1. The model is trained on four folds from the sample.
      2. The model is tested on the sample fold not used for training.
      3. The selected quality metrics are output.
  5. Train the classifier on the full dataset and output the resulting confusion matrix.
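
The metrics calculated per fold and the matrix output at the last step can be illustrated with the sketch below. The set of metrics the notebook actually selects is not listed here, so accuracy, precision, and recall are an assumption, as are the 0.5 threshold and the function names.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def classification_metrics_sketch(y_true, y_prob, threshold=0.5):
    """Compute accuracy, precision, and recall from predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Rows of the matrix correspond to the actual class and columns to the predicted class, so the off-diagonal cells count the misclassified images.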

To train the classifier, run all the cells in the Training and cross-validation and Saving the model sections one by one.

The model thus trained will be saved to a separate file.

Get the results of feature prediction based on test data

To use the resulting model:

  1. Open the ML folder and then the model-testing.ipynb notebook.

    This notebook loads the previously trained LightGBM classifier and sets up everything needed to demonstrate how you can use your model.

    Note

    Using a model takes far fewer resources than training it, so the minimum c1.4 configuration is used by default.

  2. In the second cell, replace the bucket_name variable value with the name of your bucket. The default value is bucketwithvideo.

  3. Run the first three cells. Use these cells to:

    1. Import the packages you need for the test.
    2. Identify the name of the bucket to store the images.
    3. Set up a connection to the bucket with CCTV images.
  4. Specify a test image featuring a car:

    test_image = 'car/electric-cars-17.jpeg'
    

    Note

    If you run the cell again, specify a new image.

  5. In the cell from the Prediction section, load the ResNet50 model and the prepared LightGBM classifier and calculate the probability of the predicted binary feature value (1 means a car).

    The first run of the prediction cell takes longer because the models are loaded into memory. Subsequent runs are faster:

    %%time
    clf = lgb.Booster(model_file='ImageClassificationML/lightgbm_classifier.model')
    model = ResNet50(weights='imagenet',  input_shape=(224, 224, 3))
    ...
    
  6. Make sure that the probability value is close to 1 (you should get ≈0.98).

  7. Edit the cell code before loading the model:

    test_image = 'bus/electric_bus-183.jpeg'
    

    This is a test image with no car in it.

  8. Run the cell.

  9. Repeat the probability calculation and make sure that the value is much less than 0.5.

This means the classifier can successfully predict the feature for both images.

Note

You can share your finished notebook of computations or export the whole project.

Possible model application

There are several possible applications for your model:

  • The solution code lets you run a web service based on Yandex Cloud Functions and analyze CCTV content as events occur.
  • For parallel processing of images collected from multiple CCTV cameras into an S3 bucket, you can upload the code to an Apache Spark™ cluster in Yandex Data Processing using the PySpark package.
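
For the first option, a function handler might be structured as in the sketch below. It assumes that the function is invoked by an Object Storage trigger whose event carries a messages list with details.bucket_id and details.object_id fields; check this against the trigger documentation before relying on it. classify_image is a hypothetical placeholder for downloading the image and running the ResNet50 and LightGBM models.

```python
def extract_uploaded_keys(event):
    """Return (bucket, key) pairs from the trigger event (assumed format)."""
    pairs = []
    for message in event.get("messages", []):
        details = message.get("details", {})
        bucket, key = details.get("bucket_id"), details.get("object_id")
        if bucket and key:
            pairs.append((bucket, key))
    return pairs


def classify_event(event, classify):
    """Run `classify(bucket, key)` for every uploaded image in the event."""
    return [
        {"bucket": bucket, "key": key, "car_probability": classify(bucket, key)}
        for bucket, key in extract_uploaded_keys(event)
    ]


def classify_image(bucket, key):
    """Hypothetical placeholder: load the image and return P(car)."""
    raise NotImplementedError("plug in the model from model-testing.ipynb")


def handler(event, context):
    """Entry point in the (event, context) form used by Python functions."""
    return {"results": classify_event(event, classify_image)}
```

Keeping the event parsing separate from the model call lets you test the parsing with a hand-written event before deploying anything.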

How to delete the resources you created

Some resources are not free of charge. To shut down the model and stop paying for the resources you created, delete those you no longer need:

  1. Delete all objects from the bucket.
  2. Delete the bucket.
  3. Delete the project.
  4. Delete the route table.
  5. Delete the NAT gateway.
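
A bucket must be empty before you can delete it. If it holds many images, the first step can be scripted; below is a minimal sketch using a boto3-style S3 client. It assumes a bucket without versioning and credentials that allow deleting objects: the storage.viewer role assigned to sa-for-project earlier only permits reading. The delete_all_objects helper is hypothetical.

```python
def delete_all_objects(client, bucket_name):
    """Delete every object in the bucket and return how many were removed.

    `client` is an S3 client created with boto3. Each listing page holds
    at most 1,000 keys, which is also the per-request limit of
    delete_objects, so one page maps to one delete request.
    """
    paginator = client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket_name):
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            client.delete_objects(Bucket=bucket_name, Delete={"Objects": objects})
            deleted += len(objects)
    return deleted
```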
