Running jobs in DataSphere Jobs
In DataSphere Jobs, you can remotely run jobs, i.e., Python and bash scripts or executable binary files, on a Yandex DataSphere VM.
You create and run jobs within projects; however, jobs do not depend on project notebooks or running VMs.
Before running a job, install and configure the Yandex Cloud CLI to use it for authentication in Yandex Cloud. You also need to install the datasphere library in your Python environment using the pip install datasphere command.
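For reference, authentication via the CLI is typically done with its interactive `yc init` command; this sketch assumes the CLI itself is already installed as described in its documentation:

```bash
# Interactive authentication and profile setup for the Yandex Cloud CLI
yc init

# Install the DataSphere Jobs client library into your Python environment
pip install datasphere
```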
You can also work with jobs in Visual Studio Code.
When you run a job, the datasphere library analyzes the environment, collects code dependencies, and can provide them to DataSphere for deploying the environment on a cloud VM. To avoid unnecessary system dependencies that can affect job performance, we recommend using a virtual environment, such as venv.
Note
To run DataSphere jobs, use Python venv.
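A minimal sketch of preparing such an isolated environment (the directory name is illustrative):

```bash
# Create and activate a clean virtual environment for the job's code
python3 -m venv .venv
source .venv/bin/activate

# Install only what the job needs, plus the datasphere library itself
pip install datasphere
```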
Creating a job
- Install the `datasphere` library:

  ```bash
  pip install datasphere
  ```

- Prepare a script or an executable binary file.

- Prepare a file with inputs.

- Configure the job settings. In the `config.yaml` file, specify the resources for running your job and its runtime configuration:

  ```yaml
  name: <job_name>
  desc: <job_description>
  cmd: >
    python3 <executable_file> --data ${DATA} --result ${OUTPUT}
  env:
    python: auto
  inputs:
    - <inputs>: DATA
  outputs:
    - <results>: OUTPUT
  cloud-instance-types:
    - <computing_resource_configuration>
    - <computing_resource_configuration>
    - <computing_resource_configuration>
  ```

  Where:

  - `name`: Job name.
  - `desc`: Job description.
  - `cmd`: Script file and variables for inputs and outputs.
  - `env`: Environment parameters. `python: auto` indicates that you need to provide the code and `pip` dependencies to DataSphere.
  - `inputs`: File with inputs. You can change the name of the `DATA` variable.
  - `outputs`: File with outputs. You can change the name of the `OUTPUT` variable.
  - `cloud-instance-types`: List of valid computing resource configurations for running the job, sorted by priority.

  For a single configuration, you can also use the legacy `cloud-instance-type` field, e.g., `cloud-instance-type: g1.1`; however, the new field is preferred (the snippet after this list shows both forms).

- Open the command-line shell in the directory with the files you prepared and run your job:

  ```bash
  datasphere project job execute -p <project_ID> -c config.yaml
  ```

  To copy the project ID, select the project on the DataSphere homepage and click ID.
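To illustrate the note about the legacy field in the configuration step above, here is how the two forms compare (the `g1.1` value is taken from the example below; the fallback entry is a placeholder):

```yaml
# Legacy form: a single configuration
cloud-instance-type: g1.1

# Preferred form: one or more configurations listed by priority
cloud-instance-types:
  - g1.1
  - <fallback_configuration>
```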
Tracking job progress
- Select the project in your community or on the DataSphere homepage in the Recent projects tab.
- Navigate to DataSphere Jobs ⟶ Launch history and select the job you need.
- You will see its progress bar at the top of the page.
The progress is saved to a local file, job_progress.jsonl, available in the directory with job logs. The file receives periodic updates as JSON-formatted lines stating the current progress. Here is an example:
{"progress": 21, "message": "progress msg 21", "create_time": "2025-06-01T11:00:12+00:00"}
You can get the file path from the JOB_PROGRESS_FILENAME environment variable.
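If you want to read the progress programmatically rather than from the UI, a minimal sketch (assuming the JSONL format shown above) could look like this:

```python
import json
import os

# JOB_PROGRESS_FILENAME is documented above; fall back to the default file name
# in the current directory if the variable is not set.
progress_path = os.environ.get("JOB_PROGRESS_FILENAME", "job_progress.jsonl")

last_entry = None
with open(progress_path) as f:
    for line in f:
        line = line.strip()
        if line:
            last_entry = json.loads(line)  # each line is a standalone JSON object

if last_entry is not None:
    print(f"{last_entry['progress']}%: {last_entry['message']}")
```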
Example
Let's say you train a classification model on the MNIST dataset with handwritten digit samples. DataSphere remotely runs training and returns the trained model file as the result. For more job run examples, see this GitHub repository.
Warning
To run this job, you need Python 3.10.0 and TensorFlow 2.12.0.
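Before you start, it may be worth checking that your local environment matches these versions, for example:

```bash
python3 --version   # expected: Python 3.10.x
python3 -c "import tensorflow as tf; print(tf.__version__)"   # expected: 2.12.0 (after the install step below)
```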
- Install the `tensorflow` library:

  ```bash
  pip install tensorflow==2.12.0
  ```

- Prepare a script in a file named `example.py`:

  ```python
  import argparse
  import json
  import os
  import shutil

  import tensorflow as tf

  parser = argparse.ArgumentParser(prog='example')
  parser.add_argument('-i', '--input', required=True, help='Input file')
  parser.add_argument('-m', '--model', required=True, help='Output file')


  def make_archive(source, destination):
      base = os.path.basename(destination)
      name = base.split(".")[0]
      fmt = base.split(".")[1]
      shutil.make_archive(name, fmt, source)


  def main(epoch_count, model_file):
      print("TensorFlow version: ", tf.__version__)
      print("")
      print(os.system("nvidia-smi"))
      print("")

      print("Load MNIST dataset...")
      mnist = tf.keras.datasets.mnist
      (x_train, y_train), (x_test, y_test) = mnist.load_data()
      x_train, x_test = x_train / 255.0, x_test / 255.0

      print("Build Sequential model...")
      model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(128, activation="relu"),
          tf.keras.layers.Dropout(0.2),
          tf.keras.layers.Dense(10)
      ])

      loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

      print("Compile model...")
      model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

      print("Fit...")
      model.fit(x_train, y_train, epochs=epoch_count)

      print("Evaluate...")
      model.evaluate(x_test, y_test, verbose=2)

      print(f"Save model to '{model_file}'")
      tf.keras.models.save_model(model, "model", save_format="tf")
      make_archive("model", model_file)

      print("Done")


  if __name__ == "__main__":
      args = parser.parse_args()

      epoch_count = 5
      with open(args.input) as f:
          data = json.load(f)
          epoch_count = int(data["epoch_count"])

      main(epoch_count, args.model)
  ```

- Create a file with inputs named `input.json`:

  ```json
  {
      "epoch_count": 3
  }
  ```

- Create a file named `config.yaml` with job settings:

  ```yaml
  name: simple-tf-script
  desc: Simple TF script
  cmd: python3 example.py --input ${INPUT} --model ${MODEL}
  env:
    python: auto
  inputs:
    - input.json: INPUT
  outputs:
    - model.zip: MODEL
  cloud-instance-types:
    - g1.1
  ```

- Run the job:

  ```bash
  datasphere project job execute -p <project_ID> -c config.yaml
  ```

  To copy the project ID, select the project on the DataSphere homepage and click ID.
The system will save the model to the model.zip archive in the job folder.
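Once the job finishes, you can check the result locally. A minimal sketch (not part of the job itself) that unpacks the archive and loads the trained model, assuming the same TensorFlow 2.12.0 environment:

```python
import shutil

import tensorflow as tf

# model.zip is the output declared in config.yaml; unpack it into a local directory
shutil.unpack_archive("model.zip", "model")

# The script saved the model in TensorFlow SavedModel format, so load_model can read it
model = tf.keras.models.load_model("model")
model.summary()
```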
See also
- DataSphere Jobs
- GitHub repository with job run examples