Rerunning jobs in DataSphere Jobs

Written by Yandex Cloud. Updated on January 23, 2025.

In Yandex DataSphere, you can rerun a job with some of its parameters redefined. A rerun creates a job fork, and the original job becomes its parent. A fork can itself be rerun, in which case it becomes both the fork of one job and the parent of another.

To run the same job on a regular schedule with some of its parameters redefined, you can use DataSphere Jobs together with Yandex Managed Service for Apache Airflow™.
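
Below is a minimal sketch of such a setup, assuming an Airflow environment where the datasphere package is installed and access to DataSphere is already configured; the DAG ID, schedule, and argument values are placeholders:

from typing import Dict

import pendulum
from airflow.decorators import dag, task

from datasphere import SDK

@dag(dag_id='fork_datasphere_job', start_date=pendulum.now(), schedule='@daily', catchup=False)
def run():
    @task(task_id='fork_job')
    def fork_job(source_job_id: str, args: Dict[str, str]):
        # Fork the parent job with the redefined arguments.
        sdk = SDK()
        job = sdk.fork_job(source_job_id, args=args)
        job.wait()  # assumption: wait() blocks until the fork finishes

    fork_job('<job_ID>', {'RANGE': '1'})

run()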

To rerun a job with new parameters, use the fork command, which is available in the DataSphere CLI and the DataSphere Jobs SDK. It lets you redefine the following parameters:

  • name: Job name.
  • desc: Job description.
  • args: Job arguments.
  • vars: Input and output data files.
  • env_vars: Environment variables.
  • working_storage: Extended working directory configuration.
  • cloud_instance_types: Computing resource configuration.

Example

Let's take a look at the config.yaml configuration file for a job that runs a substring search (grep) on an input file:

name: simple-bash-script
desc: Find text pattern in input file with grep
cmd: grep -C ${RANGE} ${OPTIONS} -f ${PATTERN} ${INPUT} > ${OUTPUT}
args:
  RANGE: 0
  OPTIONS: "-h -r"
inputs:
  - pattern.txt: PATTERN
  - input.txt: INPUT
outputs:
  - output.txt: OUTPUT

Where:

  • RANGE: Search output interval.
  • OPTIONS: Additional flags of the grep command.
  • PATTERN: Substring pattern file.
  • INPUT: Input data file.
  • OUTPUT: Output data file.
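
With the default args and the files listed above, the job effectively runs the equivalent of the following command (at run time, DataSphere substitutes the variables with the actual paths of the uploaded files):

grep -C 0 -h -r -f pattern.txt input.txt > output.txt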

After you run a job with the execute command, you can find its ID in the CLI logs or on the DataSphere Jobs tab of the project page in your browser. To rerun the job using the SDK fork command, specify its ID and redefine the parameters you need. For example, set a new search output interval and a new input data file:

from datasphere import SDK

sdk = SDK()

# Fork the parent job, overriding the RANGE argument and the INPUT file.
sdk.fork_job(
  '<job_ID>',
  args={'RANGE': '1'},
  vars={'INPUT': 'new_input.txt'},
)
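
The other parameters from the list above can be redefined in the same call. As a sketch only, assuming the keyword argument names mirror the parameter names (the example above only confirms args and vars), renaming the fork and adding an environment variable might look like this:

from datasphere import SDK

sdk = SDK()

# Assumption: the name, desc, and env_vars keyword arguments match the
# parameter names listed earlier; verify against the SDK before relying on them.
sdk.fork_job(
  '<job_ID>',
  name='simple-bash-script-fork',
  desc='Rerun with a wider search output interval',
  args={'RANGE': '2'},
  env_vars={'LC_ALL': 'C'},
)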

Job data lifetime

By default, job data is retained for 14 days. Once the data is deleted, you will no longer be able to rerun the job. You can change the job data lifetime with the following command:

datasphere project job set-data-ttl --id <job_ID> --days <number_of_days>

Where:

  • --id: Job ID.
  • --days: Number of days after which the job data will be deleted.
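
For example, to keep job data for 30 days instead of the default 14 days:

datasphere project job set-data-ttl --id <job_ID> --days 30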

See also

  • DataSphere Jobs
  • Integration with Yandex Managed Service for Apache Airflow™
  • DataSphere CLI
