Running jobs in DataSphere Jobs

Written by Yandex Cloud
Updated on August 15, 2025

In this article:

  • Creating a job
  • Tracking job progress
  • Example

In DataSphere Jobs, you can remotely run jobs (Python scripts, Bash scripts, and executable binaries) on a Yandex DataSphere VM.

You create and run jobs within projects; however, jobs do not depend on the project's notebooks or running VMs.

Before running a job, install and configure the Yandex Cloud CLI to use it for authentication in Yandex Cloud. You also need to install the datasphere library in your Python environment using the pip install datasphere command.
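
For reference, the setup might look like this (a minimal sketch; yc init is the Yandex Cloud CLI initialization command, and the profile details depend on your account):

    # Authenticate the Yandex Cloud CLI by creating a profile interactively
    yc init

    # Install the DataSphere Jobs client library into the active Python environment
    pip install datasphere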

You can also work with jobs in Visual Studio Code using DataSphere Jobs Toolkit.

When you run a job, the datasphere library analyzes the environment, collects code dependencies, and can provide them to DataSphere for deploying the environment on a cloud VM. To avoid unnecessary system dependencies that can affect job performance, we recommend using a virtual environment, such as venv or conda.

Note

To run DataSphere jobs, use Python venv. The supported Python versions are 3.8 to 3.12.
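
For example, you could prepare an isolated environment like this (a sketch assuming Python 3.10 is installed locally; any supported version from 3.8 to 3.12 works the same way):

    # Create and activate a clean virtual environment for the job
    python3.10 -m venv .venv
    source .venv/bin/activate

    # Install the datasphere client and only the packages your job actually needs
    pip install datasphere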

Creating a job

  1. Install the datasphere library:

    pip install datasphere
    
  2. Prepare a script or an executable binary file.

  3. Prepare a file with inputs.

  4. Configure the job settings. In the config.yaml file, specify the resources for running your job and its runtime configuration:

    name: <job_name>
    desc: <job_description>
    cmd: >
        python3 <executable_file> --data ${DATA} --result ${OUTPUT}
    env:
      python: auto
    inputs:
      - <inputs>: DATA
    outputs:
      - <results>: OUTPUT
    cloud-instance-types:
      - <computing_resource_configuration>
      - <computing_resource_configuration>
      - <computing_resource_configuration>
    

    Where:

    • name: Job name.
    • desc: Job description.
    • cmd: Script file and variables for inputs and outputs.
    • env: Environment parameters. python: auto means that the code and pip dependencies will be collected and provided to DataSphere automatically.
    • inputs: File with inputs. You can change the name of the DATA variable.
    • outputs: File with outputs. You can change the name of the OUTPUT variable.
    • cloud-instance-types: List of valid computing resource configurations to run the job, sorted by priority.

    For a single configuration, you can still use the legacy cloud-instance-type field, e.g., cloud-instance-type: g1.1; however, the newer cloud-instance-types field is preferred. A filled-in configuration sketch is shown after this list of steps.

  5. Open the command-line shell in the directory with the files you prepared and run your job:

    datasphere project job execute -p <project_ID> -c config.yaml
    

    To copy the project ID, select the project on the DataSphere home page and click ID.
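
To illustrate the prioritized cloud-instance-types list, here is a filled-in configuration sketch. The file, script, and variable names are placeholders; g1.1 is the configuration used elsewhere in this article, while c1.4 is only an assumed name for a CPU configuration:

    name: train-model
    desc: Training run with a fallback configuration
    cmd: >
        python3 train.py --data ${DATA} --result ${OUTPUT}
    env:
      python: auto
    inputs:
      - data.json: DATA
    outputs:
      - result.txt: OUTPUT
    cloud-instance-types:
      - g1.1 # preferred configuration
      - c1.4 # assumed fallback, used if the preferred configuration is unavailable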

Tracking job progress

Yandex DataSphere interface

  1. Select the project in your community or on the DataSphere home page, in the Recent projects tab.

  2. Navigate to DataSphere Jobs ⟶ Launch history and select the job you need.

  3. The job's progress bar is displayed at the top of the page.

Locally

The progress is saved to a local file named job_progress.jsonl in the directory with the job logs. The file is periodically updated with JSON-formatted lines stating the current progress. Here is an example:

{"progress": 21, "message": "progress msg 21", "create_time": "2025-06-01T11:00:12+00:00"}

You can get the file path from the JOB_PROGRESS_FILENAME environment variable.
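
If your job reports progress on its own, a minimal sketch of appending such records might look like this (assuming the job process writes to the file referenced by JOB_PROGRESS_FILENAME; the field names follow the example line above):

    import json
    import os
    from datetime import datetime, timezone

    def report_progress(progress: int, message: str) -> None:
        # Append one JSON line to the job progress file, if it is defined
        path = os.environ.get("JOB_PROGRESS_FILENAME")
        if not path:
            return  # running outside DataSphere Jobs, nothing to report to
        record = {
            "progress": progress,  # percentage from 0 to 100
            "message": message,
            "create_time": datetime.now(timezone.utc).isoformat(),
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")

    # For example, call once per training epoch:
    # report_progress(21, "progress msg 21")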

Example

In this example, you will train a classification model on the MNIST dataset of handwritten digit samples. DataSphere will run the training remotely and return the trained model file as the result. For more job run examples, see the GitHub repository listed under See also.

Warning

To run this job, you need Python 3.10.0 and TensorFlow 2.12.0.

  1. Install the tensorflow library:

    pip install tensorflow==2.12.0
    
  2. Prepare a script in a file named example.py:

    import argparse
    import json
    import os
    import shutil
    import tensorflow as tf
    
    parser = argparse.ArgumentParser(prog='example')
    parser.add_argument('-i', '--input', required=True, help='Input file')
    parser.add_argument('-m', '--model', required=True, help='Output file')
    
    def make_archive(source, destination):
        # Pack the saved model directory into the archive expected as the job output
        base = os.path.basename(destination)
        name = base.split(".")[0]
        fmt = base.split(".")[1]
        shutil.make_archive(name, fmt, source)
    
    def main(epoch_count, model_file):
        print("TensorFlow version: ", tf.__version__)
        print("")
        print(os.system("nvidia-smi"))
        print("")
    
        print("Load MNIST dataset...")
        mnist = tf.keras.datasets.mnist
        (x_train, y_train), (x_test, y_test) = mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0
    
        print("Build Sequential model...")
        model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(128, activation="relu"),
          tf.keras.layers.Dropout(0.2),
          tf.keras.layers.Dense(10)
        ])
    
        #predictions = model(x_train[:1]).numpy()
        #tf.nn.softmax(predictions).numpy()
    
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        #loss_fn(y_train[:1], predictions).numpy()
    
        print("Compile model...")
        model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
    
        print("Fit...")
        model.fit(x_train, y_train, epochs=epoch_count)
    
        print("Evaluate...")
        model.evaluate(x_test,  y_test, verbose=2)
    
        print(f"Save model to '{model_file}'")
        tf.keras.models.save_model(model, "model", save_format="tf")
        make_archive("model", model_file)
    
        print("Done")
    
    
    if __name__ == "__main__":
        args = parser.parse_args()
    
        epoch_count = 5
    
        with open(args.input) as f:
            data = json.load(f)
            epoch_count = int(data["epoch_count"])
    
        main(epoch_count, args.model)
    
  3. Create a file with inputs named input.json:

    {
        "epoch_count" : 3
    }
    
  4. Create a file named config.yaml with job settings:

    name: simple-tf-script
    desc: Simple TF script
    cmd: python3 example.py --input ${INPUT} --model ${MODEL}
    env:
      python: auto
    inputs:
      - input.json: INPUT
    outputs:
      - model.zip: MODEL
    cloud-instance-types:
      - g1.1
    
  5. Run the job:

    datasphere project job execute -p <project_ID> -c config.yaml
    

    To copy the project ID, select the project on the DataSphere home page and click ID.

The system will save the model to the model.zip archive in the job folder.
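
To inspect the result locally, you could, for example, unpack the archive and load the model (a sketch assuming TensorFlow 2.12 is installed in your local environment and model.zip is in the current directory):

    import shutil
    import tensorflow as tf

    # Unpack the archive returned by the job into a local directory
    shutil.unpack_archive("model.zip", "model")

    # Load the SavedModel and print its layer summary
    model = tf.keras.models.load_model("model")
    model.summary()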

See also

  • DataSphere Jobs
  • GitHub repository with job run examples
