Job runtime environment

Written by
Yandex Cloud
Updated at January 28, 2025
  • Python environment
    • Automated build of a Python environment
    • Configuring a Python environment manually
    • Specifying entry points explicitly
  • Plain environment
    • Example

DataSphere Jobs runs jobs on dedicated VMs that are not connected to JupyterLab or a running project. The runtime environment of these VMs is deployed from a Docker image and the additional parameters you define in the job configuration file.

Jobs can run Python scripts, Bash scripts, and any binaries compiled for Linux x86_64 on the VM.
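For orientation, here is a minimal configuration file sketch combining the fields used throughout this article; main.py and data.csv are placeholder names:

cmd: python3 main.py data.csv  # Command to run on the VM
env:
  python: auto                 # Build the Python environment automatically
inputs:
  - data.csv                   # Input file to transfer to the VM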

Python environment

Tip

We recommend running jobs in a virtual Python environment to have precise control over job run parameters.

Automated build of a Python environment

Note

To run DataSphere jobs, use venv. The supported Python versions are 3.8–3.12.
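With the automated build described below, DataSphere reproduces the environment the job is launched from, so running the DataSphere CLI from inside an activated venv keeps the set of dependencies to analyze and transfer minimal.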

If you are running Python scripts, DataSphere can automatically set up the environment for your job. To do this, specify the python: auto parameter in the env section:

cmd: python3 main.py <call_arguments>
env:
  python: auto

In auto mode, DataSphere analyzes the job script's dependencies, identifies pip packages and local modules, transfers everything to the VM, installs the packages, and prepares the runtime environment for the job.

You can fine-tune the automated environment build:

cmd: python3 main.py <call_arguments>
env:
  python:
    type: auto # Specify automated environment build
    root-paths:
      - other.py
    pip:
      index-url: https://pypi.org/simple
      extra-index-urls:
        - https://pypi.ngc.nvidia.com
      trusted-hosts:
        - nvidia.com
      no-deps: true  # The default value is `false`

Where:

  • root-paths: Explicitly specifies additional entry points.
  • index-url: Address of the main repository pip will use to install the required environment packages.
  • extra-index-urls: Addresses of additional repositories pip can use to install packages that are missing from the main repository.
  • trusted-hosts: List of trusted hosts; allows pip to access hosts specified as <host>:<port> even if they do not support HTTPS.
  • no-deps: pip install argument that prevents installing package dependencies.
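These settings roughly correspond to the pip install options --index-url, --extra-index-url, --trusted-host, and --no-deps.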

Configuring a Python environment manually

When setting up a Python environment manually, you can explicitly specify the Python version, define dependencies and pip options in a requirements.txt file, and provide local modules. You must specify at least one parameter; any parameter you omit from the configuration file will be set automatically.

env:
  python:
    type: manual     # Specify manual environment configuration
    version: 3.10.13 # Optional
    pip:
      index-url: https://pypi.org/simple
      extra-index-urls:
        - https://pypi.ngc.nvidia.com
      trusted-hosts:
        - nvidia.com
      no-deps: true  # The default value is `false`
    requirements-file: requirements.txt  # Optional
    root-paths:
      - other.py
    local-paths:     # Optional; cannot be used together with `root-paths`
      - foo.py
      - lib/

Where:

  • version: Python version. If omitted, the version of the job's runtime environment will be used.
  • index-url: Address of the main repository pip will use to install the required environment packages.
  • extra-index-urls: Addresses of additional repositories pip can use to install packages that are missing from the main repository.
  • trusted-hosts: List of trusted hosts; allows pip to access hosts specified as <host>:<port> even if they do not support HTTPS.
  • no-deps: pip install argument that prevents installing package dependencies.
  • requirements-file: Path to the requirements.txt file listing all the packages and pip flags required for the job. If omitted, the list of dependencies will be formed automatically.
  • root-paths: Explicitly specifies additional entry points.
  • local-paths: List of local Python files to transfer. You can specify both individual files and whole directories. If omitted, the list of files will be formed automatically.
    If the job consists of a single main Python script, specify local-paths: [] in the env section.

If the configuration file contains the version, requirements-file, and local-paths parameters, DataSphere will not analyze the environment to identify missing dependencies. This is useful if you cannot, or do not want to, reproduce the job's environment locally, as the automated environment build requires.
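For example, a fully pinned configuration that skips the local environment analysis might look like this (the file and directory names are placeholders):

env:
  python:
    type: manual
    version: 3.10.13                     # Pin the Python version
    requirements-file: requirements.txt  # Pin the package list
    local-paths:                         # Pin the files to transfer
      - main.py
      - lib/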

Specifying entry points explicitly

Warning

Use the standard if __name__ == "__main__": statement in your programs for all entry points.

To run Python scripts in jobs, you can use one of the following methods:

  • Run a script explicitly via python3 main.py <arguments>.
  • Use pre-configured third-party launchers, such as deepspeed: deepspeed main.py --num_gpus=1 --deepspeed_stage 2 --apply_lora True.
  • Provide programs as arguments when running other programs: python3 main.py other.py.

To build the environment and launch the job, DataSphere needs to identify all the program's entry points. If DataSphere cannot do this in auto mode, specify them in the config.yaml configuration file:

env:
  python:
    type: auto | manual   # Both options are possible
    root-paths:           # Optional, cannot be used together with `local-paths`
      - main.py
      - other.py
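For example, when running python3 main.py other.py, other.py is passed as a regular argument, so dependency analysis cannot tell that it is an entry point; listing it under root-paths makes that explicit.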

Plain environment

By default, jobs use the plain environment to execute binary files and Bash scripts. In this case, all the files specified in the job configuration file under inputs will be transferred to the VM.

Example

Running the following job will output the Linux kernel version, the list of all installed packages, and the list of files and folders in the VM home directory.

The config.yaml configuration file specifies the entry point and lists all the modules you need to provide to the VM:

cmd: ./run.sh
inputs:
  - run.sh  # Explicitly list all the required modules

The run.sh file with the job code lists the commands to run on the VM:

#!/bin/bash
uname -a  # Print the Linux kernel version
dpkg -l   # List all installed packages
ls -la    # List files and folders in the home directory
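With both files in the current directory, you can then launch the job with the DataSphere CLI; the invocation is expected to look like datasphere project job execute -p <project_id> -c config.yaml (see DataSphere CLI below for details).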

See also

  • DataSphere Jobs
  • DataSphere CLI
  • Docker images in jobs
  • Running jobs in DataSphere Jobs
  • GitHub repository with job run examples
