Job runtime environment
DataSphere jobs run on dedicated VMs that are not connected to JupyterLab or to a launched project. The runtime environment of these VMs is deployed from a Docker image and the additional parameters you define in the job configuration file.
Jobs can run Python scripts, Bash scripts, and any binary files compiled for Linux x86_64 on the VM.
Python environment
Tip
We recommend running jobs in a virtual Python environment.
Automated build of a Python environment
Note
By default, DataSphere uses the conda package manager with pre-installed Python 3.10 to run jobs. To reduce environment migration time, use the same Python version for your jobs.
If you are running Python scripts, DataSphere can automatically set up the environment for your job. To do this, specify the python: auto parameter in the env section:
cmd: python3 main.py <call_arguments>
env:
  python: auto
In auto mode, DataSphere will analyze the job script dependencies, identify pip packages and local modules, transfer everything to the VM, install the packages, and prepare the runtime environment for the job.
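For example, given a hypothetical main.py like the sketch below, the automated build would pick up the pandas pip package and the local helpers module from the import statements (both names are assumptions used only for illustration):

# main.py (hypothetical example)
import sys

import pandas as pd  # detected as a pip package to install on the VM

import helpers  # detected as a local module to transfer to the VM


def main() -> None:
    # Read the dataset path from the job call arguments
    df = pd.read_csv(sys.argv[1])
    print(helpers.summarize(df))  # summarize() is a hypothetical helper


if __name__ == "__main__":
    main()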
You can modify an automated environment build:
cmd: python3 main.py <call_arguments>
env:
  python:
    type: auto # Specify automated environment build
    root-paths:
      - other.py
    pip:
      extra-index-urls:
        - https://pypi.ngc.nvidia.com
      trusted-hosts:
        - nvidia.com
      no-deps: true # The default value is `false`
Where:
- root-paths: Explicitly specifies additional entry points.
- extra-index-urls: Specifies additional repository addresses the pip package manager can use to install the required environment packages.
- trusted-hosts: List of trusted hosts that allows accessing the hosts specified as <host>:<port>, even if they do not support HTTPS.
- no-deps: pip install command argument that prevents installing package dependencies.
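For reference, these settings map to standard pip install options; installing a package manually with the same configuration would look roughly like this (the package name is a hypothetical placeholder):

# Rough pip equivalent of the settings above
pip install some-package \
    --extra-index-url https://pypi.ngc.nvidia.com \
    --trusted-host nvidia.com \
    --no-deps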
Configuring a Python environment manually
When setting up a Python environment manually, you can explicitly specify the Python version, define dependencies and pip options in the requirements.txt file, and provide local modules. You must specify at least one parameter; if you do not specify a parameter in the configuration file, its value will be set automatically.
env:
  python:
    type: manual # Specify manual environment configuration
    version: 3.10.13 # Optional
    requirements-file: requirements.txt # Optional
    local-paths: # Optional, cannot be used together with `root-paths`
      - foo.py
      - lib/
Where:
- version: Python version. If omitted, the job's runtime environment version will be used.
- requirements-file: Path to the requirements.txt file listing all the packages and pip flags required for the job. If omitted, the list of dependencies will be formed automatically.
- local-paths: List of local Python files to transfer. You can specify both individual files and whole directories. If omitted, the list of files will be formed automatically.

If the job consists of a single main Python script, specify local-paths: [] in the env section.
If the configuration file contains all three parameters (version, requirements-file, and local-paths), DataSphere will not check the environment to identify missing dependencies. This can be useful if you cannot or do not want to reproduce the job's environment locally, as the automated environment build requires.
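A requirements.txt file for such a configuration might look like the sketch below; the package names, versions, and the extra index URL are assumptions for illustration:

# requirements.txt (hypothetical contents)
--extra-index-url https://pypi.ngc.nvidia.com
numpy==1.26.4
pandas>=2.0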
Specifying entry points explicitly
Warning
You can use the standard if __name__ == "__main__": statement.
To run Python scripts in jobs, you can use one of the following methods:
- Run a script explicitly via python3 main.py <arguments>.
- Use pre-configured third-party launchers, such as deepspeed: deepspeed main.py --num_gpus=1 --deepspeed_stage 2 --apply_lora True.
- Provide programs as arguments when running other programs: python3 main.py other.py.
To build the environment and launch the job, DataSphere will need to identify all the program's entry points. If DataSphere is unable to do this in auto mode, specify them in the config.yaml configuration file:
env:
  python:
    type: auto | manual # Both options are possible
    root-paths: # Optional, cannot be used together with `local-paths`
      - main.py
      - other.py
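For example, auto mode cannot infer that other.py is a separate entry point when main.py only receives its path as a call argument and launches it as a child process. A hypothetical sketch of such a main.py:

# main.py (hypothetical example): launches another script passed as an argument,
# so the dependency on other.py is invisible to static import analysis
import subprocess
import sys

if __name__ == "__main__":
    # sys.argv[1] would be "other.py" when the job is started
    # with: python3 main.py other.py
    subprocess.run([sys.executable, sys.argv[1]], check=True)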
Plain environment
By default, jobs use the plain environment to execute binary files and Bash scripts. In this case, all the files specified in the job configuration file under inputs will be transferred to the VM.
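As a sketch, a plain-environment configuration for a compiled binary might look like this (the binary name, its argument, and the data file are assumptions for illustration):

# Hypothetical config.yaml for running a compiled Linux binary
cmd: ./my_binary --input data.csv
inputs:
  - my_binary # Explicitly list the binary and every file it needs
  - data.csv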
Example
Running the following job will output the Linux kernel version, the list of all installed packages, and the list of files and folders in the VM home directory.
The config.yaml configuration file specifies the entry point and lists all the modules you need to provide to the VM:
cmd: ./run.sh
inputs:
  - run.sh # Explicitly list all the required modules
The run.sh file with the job code lists the commands to run on the VM:
#!/bin/bash
uname -a
dpkg -l
ls -la
See also
- DataSphere Jobs
- DataSphere CLI
- Docker images in jobs
- Running jobs in DataSphere Jobs
- GitHub repository with job run examples