Yandex project
© 2025 Yandex.Cloud LLC


Initialization scripts

Written by
Yandex Cloud
Updated at February 18, 2025
  • Environment variables
  • Script initialization errors
    • Syntax errors

When creating a cluster, you can specify host initialization scripts, for example, to automatically install or update the software your jobs need. Each script runs once as the root superuser, when the host starts for the first time.

In the first line of the script file, specify the full path to the interpreter, e.g., #!/bin/sh or #!/usr/bin/python.

You can specify the script URI using the https://, http://, hdfs://, or s3a:// scheme. For s3a://, at least one of the following conditions must be met:

  • The bucket ACL allows the cluster's service account to perform read operations.
  • The cluster's service account has the storage.viewer role.
  • The bucket is publicly accessible.

Environment variables

You can use these environment variables in your initialization scripts:

  • CLUSTER_ID: Cluster ID.
  • S3_BUCKET: Name of the linked Yandex Object Storage bucket.
  • ROLE: Host role (masternode, computenode, or datanode).
  • CLUSTER_SERVICES: List of components.
  • MAX_WORKER_COUNT: Maximum number of hosts in data storage and processing subclusters.
  • MIN_WORKER_COUNT: Minimum number of hosts in data storage and processing subclusters.

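For example, to run part of a script only on the master host (masternode), check the value of the ROLE environment variable. The [[ ... ]] test used here is Bash syntax, so the script should name Bash (e.g., #!/bin/bash) as its interpreter: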

if [[ "${ROLE}" == "masternode" ]]; then
   ...
fi
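
The same check can be written with a case statement, which also works when the interpreter is a plain POSIX shell such as /bin/sh. This is only a sketch: the function name describe_host_tasks and the messages it prints are hypothetical placeholders for real setup commands; only the ROLE variable and its three values come from the list above.

```shell
#!/bin/sh
# Hypothetical helper: reports which setup branch a host would take for its role.
# Replace the echo lines with real commands (package installs, config edits).
describe_host_tasks() {
    role="$1"
    case "$role" in
        masternode)
            echo "master: install client tools"
            ;;
        datanode|computenode)
            echo "worker: install job dependencies"
            ;;
        *)
            # Fail loudly on an unexpected value instead of silently skipping setup
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}

# In an initialization script, ROLE is already set in the environment:
# describe_host_tasks "${ROLE}"
```

Returning a non-zero status for an unexpected role makes a misconfiguration visible in the initialization log rather than leaving a host half-configured.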

Script initialization errors

If the script fails and the cluster switches to the DEAD status:

  1. View logs in Yandex Cloud Logging or on cluster hosts in the /var/log/yandex/dataproc-init-actions.log file.
  2. Correct the error.
  3. Delete this cluster and create a new one.
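
When checking the log on a host, it can help to narrow the output to lines that look like failures. A minimal sketch: the function name show_init_log_errors and the search pattern are assumptions, since the wording of error lines depends on what your script prints; only the default log path is taken from step 1. Reading the log may require sudo.

```shell
# Hypothetical helper: print up to 20 of the most recent log lines that look like failures.
# Argument 1 (optional): path to the log; defaults to the path given in step 1.
show_init_log_errors() {
    log_file="${1:-/var/log/yandex/dataproc-init-actions.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -i: ignore case, -n: prefix each match with its line number,
    # -E: extended regex; the pattern itself is only a guess at typical error text
    grep -i -n -E 'error|fail|not found' "$log_file" | tail -n 20
}
```

Call it without arguments on a cluster host, or pass the path to a log file you have copied elsewhere.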

If the initialization script returns an error on an existing cluster (for example, when adding a subcluster) and recreating the cluster would disrupt your workflow, you can fix the error manually:

  1. Connect to the problematic host and perform the steps required to resolve the issue.

  2. Run the script that marks the initialization script execution results as successful:

    sudo /opt/yandex/complete_init_action.py
    
  3. Check the initialization script results in the /home/dataproc-agent/dataproc-init-acts/states.json file on the master host.
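
The state file is JSON, so pretty-printing it makes the results easier to read. The sketch below assumes python3 may be available on the host and falls back to plain output if it is not; the function name show_init_states is hypothetical, and nothing is assumed about the structure of the file.

```shell
# Hypothetical helper: display the init-action state file in readable form.
# Argument 1 (optional): path to the file; defaults to the path given in step 3.
show_init_states() {
    states_file="${1:-/home/dataproc-agent/dataproc-init-acts/states.json}"
    if command -v python3 >/dev/null 2>&1; then
        # json.tool re-indents the JSON and reports malformed input
        python3 -m json.tool "$states_file"
    else
        # Fallback: print the file unchanged
        cat "$states_file"
    fi
}
```

Run it on the master host; depending on file permissions, you may need sudo.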

Syntax errors

To check a script for syntax errors, download the script file manually and run it:

  1. Connect to the cluster host.

  2. Download the script file from storage using the link specified when creating the cluster, for example:

    wget <HTTP_link_to_script_file>
    
  3. Run the script.

If any error occurs during the script run, you will see an error message in the console.
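
For scripts written in Bash, parse errors can also be caught without executing anything, so no installation step runs a second time. A minimal sketch, assuming the downloaded file is a Bash script; check_shell_syntax is a hypothetical helper name. Note that bash -n only parses the file: it does not detect missing commands or steps that fail at run time.

```shell
# Hypothetical helper: parse a Bash script without running it.
# Prints a short verdict and returns non-zero if the parser rejects the file.
check_shell_syntax() {
    script="$1"
    # -n: read commands and check syntax, but do not execute them
    if bash -n "$script"; then
        echo "syntax OK: $script"
    else
        echo "syntax error: $script" >&2
        return 1
    fi
}
```

For example, check_shell_syntax ./init.sh, where init.sh stands for whatever name the downloaded file has.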

For instance, an error may occur because of incompatible file formats. Because the script runtime environment is Linux (Ubuntu), scripts created on Windows may fail with one of the following errors:

  • ^M: bad interpreter

  • FileNotFoundError: [Errno 2] No such file or directory: '<executable_file_name>'

These errors occur because Windows ends lines with CR/LF, whereas Linux uses LF only. To fix the error, convert the line endings with one of these commands:

Bash:

sed -i -e 's/\r$//' <script_file_name>

PowerShell:

$file = "<script_file_name>"; $text = [IO.File]::ReadAllText($file) -replace "`r`n", "`n"; [IO.File]::WriteAllText($file, $text)
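
To confirm whether a file actually contains Windows line endings, before or after the conversion, you can search it for carriage return characters. A minimal sketch for a Linux host with GNU grep and GNU sed; the helper names has_crlf and strip_crlf are hypothetical, and strip_crlf simply wraps the sed command given for Bash.

```shell
# Hypothetical helper: succeeds (exit status 0) if the file contains a carriage return.
has_crlf() {
    # Command substitution keeps the \r; only trailing newlines are stripped
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Hypothetical helper: remove a trailing carriage return from every line, in place.
strip_crlf() {
    sed -i -e 's/\r$//' "$1"
}
```

A typical sequence is: has_crlf <script_file_name> && strip_crlf <script_file_name>, followed by uploading the converted file again.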
