Yandex project
© 2025 Yandex.Cloud LLC
Yandex Data Processing

General questions about Yandex Data Processing

Written by Yandex Cloud
Updated on April 24, 2025
  • What clusters can I move to a different availability zone?

  • What should I do if data on storage subcluster hosts is distributed unevenly?

  • Where can I view Yandex Data Processing cluster logs?

  • How do I get the logs of my actions in the services?

  • Why is the cluster slow even though the computing resources are not used fully?

  • I get the ^M: bad interpreter error when running the initialization script. How do I fix this?

  • When I run a PySpark job, I get an error related to com/amazonaws/auth/AWSCredentialsProvider. How do I fix this?

  • When using dynamic partition overwrites, I get an error related to PathOutputCommitProtocol. How do I fix it?

  • Why does the NAT should be enabled on the subnet error occur and how do I fix it?

  • Why does the Using fileUris is forbidden on lightweight cluster error occur and how do I fix it?

  • Why does the Create Yandex Data Processing cluster Error: 0 Address space exhausted error occur and how do I fix it?

  • Why is my cluster's status Unknown?

  • What is the minimum computing power required for a subcluster with a master host?

  • How do I upgrade the image version in Yandex Data Processing?

  • How do I run jobs?

  • What security group limits are there?

  • Can I get superuser permissions on hosts?

  • How can I fix the no permission error when connecting a service account to the cluster?

Which clusters can be moved to a different availability zone?

You can move lightweight clusters and HDFS clusters.

What should I do if data on storage subcluster hosts is distributed unevenly?

Connect to the cluster master host and run this command to rebalance the data:

sudo -u hdfs hdfs balancer

You can configure the balancer parameters. For example, to change the maximum amount of data to transfer, add the following argument: -D dfs.balancer.max-size-to-move=<data-size-in-bytes>.
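As an illustration, the sketch below assembles the balancer command with the size limit argument. The `balancer_command` helper and the 10 GiB value are assumptions made for this example, not recommendations from the service.

```python
import shlex


def balancer_command(max_size_bytes: int) -> str:
    """Build the HDFS balancer command line with a custom max-size-to-move."""
    args = [
        "sudo", "-u", "hdfs", "hdfs", "balancer",
        "-D", f"dfs.balancer.max-size-to-move={max_size_bytes}",
    ]
    # shlex.join quotes arguments where needed so the result is shell-safe.
    return shlex.join(args)


# Hypothetical limit: 10 GiB expressed in bytes.
print(balancer_command(10 * 1024**3))
```

Run the printed command on the cluster master host.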

Where can I view Yandex Data Processing cluster logs?

You can find cluster logs in the cluster's log group. To track the events of a cluster and its individual hosts, specify the relevant log group in the cluster settings when creating or updating the cluster. If no log group is selected for the cluster, logs are sent to and stored in the default log group of the cluster's folder. For more information, see Working with logs.

Can I get the logs of what I do when I work with Yandex Cloud?

Yes, you can request information about operations with your resources from Yandex Cloud. For more information, see Data requests.

Why is the cluster slow even though the computing resources are not used fully?

Your storage may not have enough maximum IOPS and bandwidth to process the current number of requests. In this case, throttling occurs, which degrades the performance of the entire cluster.

The maximum IOPS and bandwidth values increase by a fixed value when the storage size increases by a certain step. The step and increment values depend on the disk type:

| Disk type | Step, GB | Max IOPS increase (read/write) | Max bandwidth increase (read/write), MB/s |
|---|---|---|---|
| network-hdd | 256 | 300/300 | 30/30 |
| network-ssd | 32 | 1,000/1,000 | 15/15 |
| network-ssd-nonreplicated, network-ssd-io-m3 | 93 | 28,000/5,600 | 110/82 |

To increase the maximum IOPS and bandwidth values and make throttling less likely, consider switching to a different cluster with larger host storage or a faster disk type. You can transfer data to a new cluster, for example, using Hive Metastore.
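To get a feel for how the limits grow with storage size, here is a rough sketch based on the step and increment values listed above. It assumes that only complete steps count and that no per-disk upper caps apply. Neither assumption is confirmed here, so treat the results as estimates. The `estimate_limits` helper is hypothetical.

```python
# Step size (GB) and per-step increments (read, write), copied from the table.
_NONREPLICATED = {"step_gb": 93, "iops": (28000, 5600), "mbps": (110, 82)}
DISK_STEPS = {
    "network-hdd": {"step_gb": 256, "iops": (300, 300), "mbps": (30, 30)},
    "network-ssd": {"step_gb": 32, "iops": (1000, 1000), "mbps": (15, 15)},
    "network-ssd-nonreplicated": _NONREPLICATED,
    "network-ssd-io-m3": _NONREPLICATED,
}


def estimate_limits(disk_type: str, size_gb: int) -> dict:
    """Estimate max IOPS and bandwidth for a disk of the given size.

    Assumption: limits scale with the number of complete steps,
    and any per-disk caps are ignored.
    """
    spec = DISK_STEPS[disk_type]
    steps = size_gb // spec["step_gb"]
    return {
        "read_iops": steps * spec["iops"][0],
        "write_iops": steps * spec["iops"][1],
        "read_mbps": steps * spec["mbps"][0],
        "write_mbps": steps * spec["mbps"][1],
    }


# Compare a 128 GB and a 256 GB network-ssd disk.
for size in (128, 256):
    print(size, estimate_limits("network-ssd", size))
```

Under these assumptions, doubling the storage size doubles the estimated limits, which is why larger host storage makes throttling less likely.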

I get the "^M: bad interpreter" error when running the initialization script. How do I fix this?

Initialization scripts run in Linux (Ubuntu). Scripts created in Windows use CR/LF line endings instead of the LF that Linux expects, so they may fail with the ^M: bad interpreter error. To fix the error, save the script file with Linux (LF) line endings. For more information, see Syntax errors.
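If your editor cannot save files with LF line endings, one way to convert an existing script is sketched below. The helper names and the `init.sh` file name are hypothetical.

```python
from pathlib import Path


def to_lf(data: bytes) -> bytes:
    """Replace Windows CR/LF line endings with Linux LF line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with LF line endings."""
    script = Path(path)
    script.write_bytes(to_lf(script.read_bytes()))


# Usage (hypothetical file name):
# convert_file("init.sh")
```

The conversion works on bytes rather than text so that it does not depend on the script's encoding.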

When I run a PySpark job, I get an error related to "com/amazonaws/auth/AWSCredentialsProvider". How do I fix this?

If a Yandex Data Processing cluster is connected to a Metastore cluster, you may get the following error when running PySpark jobs:

previously initiated loading for a different type with name "com/amazonaws/auth/AWSCredentialsProvider";

To fix this, add the spark:spark.sql.hive.metastore.sharedPrefixes property with the com.amazonaws,ru.yandex.cloud value to the Yandex Data Processing cluster.

When using dynamic partition overwrites, I get an error related to "PathOutputCommitProtocol". How do I fix it?

When data processing uses dynamic partition overwrites, you may get this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o264.parquet.
: java.io.IOException: PathOutputCommitProtocol does not support dynamicPartitionOverwrite

To fix it, add the following properties to the Yandex Data Processing cluster:

  • spark:spark.sql.sources.partitionOverwriteMode : dynamic
  • spark:spark.sql.parquet.output.committer.class : org.apache.parquet.hadoop.ParquetOutputCommitter
  • spark:spark.sql.sources.commitProtocolClass : org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol

You can also add properties when creating a job.
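For jobs launched with spark-submit, the same settings can be passed as `--conf` arguments. The sketch below assumes that job-level keys are written without the `spark:` component prefix used in cluster properties. The `job_conf_args` helper is hypothetical.

```python
# Cluster-level properties from the list above (with the component prefix).
CLUSTER_PROPERTIES = {
    "spark:spark.sql.sources.partitionOverwriteMode": "dynamic",
    "spark:spark.sql.parquet.output.committer.class":
        "org.apache.parquet.hadoop.ParquetOutputCommitter",
    "spark:spark.sql.sources.commitProtocolClass":
        "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
}

PREFIX = "spark:"


def job_conf_args(cluster_properties: dict) -> list:
    """Turn cluster properties into spark-submit `--conf key=value` arguments.

    Assumption: job-level keys drop the `spark:` component prefix.
    """
    args = []
    for key, value in cluster_properties.items():
        spark_key = key[len(PREFIX):] if key.startswith(PREFIX) else key
        args += ["--conf", f"{spark_key}={value}"]
    return args


print(" ".join(job_conf_args(CLUSTER_PROPERTIES)))
```

Append the printed arguments to your spark-submit command before the application file.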

Why does the "NAT should be enabled on the subnet" error occur and how do I fix it?

This error occurs when trying to create a Yandex Data Processing cluster in a subnet with no NAT gateway configured. To fix it, configure a network for Yandex Data Processing.

Why does the "Using fileUris is forbidden on lightweight cluster" error occur and how do I fix it?

This error occurs because the lightweight cluster configuration does not include HDFS. To fix the error, create a cluster with HDFS support.

We also recommend using Yandex Object Storage buckets to work with jobs. You can upload the scripts your jobs run to a bucket, where they are stored as objects you can get links to. You can then use these Object Storage links instead of file:/ links in your jobs.

Why does the "Create Yandex Data Processing cluster Error: 0 Address space exhausted" error occur and how do I fix it?

The error means that your Yandex Data Processing cluster's subnet has run out of IPs that can be allocated to cluster hosts. To check how many IPs are available, view the list of addresses used in the subnet and its mask.

To fix the error, do one of the following:

  • Delete the unnecessary resources taking up the subnet's IPs.
  • Create a subnet with a CIDR that suits your cluster's configuration. Next, create a Yandex Data Processing cluster in the new subnet.

For more information about subnet sizes, see the Yandex Virtual Private Cloud documentation.
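To estimate how many addresses a subnet can still provide, you can compare the size of its CIDR block with the number of addresses already in use. The sketch below uses Python's standard `ipaddress` module. The `remaining_addresses` helper and its `reserved` parameter are hypothetical, and the number of addresses the platform reserves in a subnet is not stated here, so pass it explicitly if you know it.

```python
import ipaddress


def remaining_addresses(cidr: str, used: int, reserved: int = 0) -> int:
    """Estimate how many addresses are still free in a subnet.

    cidr     -- subnet range, e.g. "10.1.0.0/24"
    used     -- number of addresses already taken by resources
    reserved -- addresses unavailable to hosts (assumption: 0 by default)
    """
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - reserved - used, 0)


# Hypothetical example: a /28 subnet with 10 addresses already in use.
print(remaining_addresses("10.1.0.0/28", used=10))
```

If the estimate is lower than the number of hosts in the planned cluster, choose a larger CIDR block for the new subnet.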

Why is my cluster's status "Unknown"?

If your cluster's status changed from Alive to Unknown:

  1. Make sure you have set up a network for Yandex Data Processing. For a cluster to run, you need to create and configure the following resources:

    • Network
    • Subnet
    • NAT gateway
    • Route table
    • Security group
    • Service account for the cluster
    • Bucket to store job dependencies and results
  2. Review the logs that describe the cluster status over the specified period:

    yc logging read \
       --group-id=<log_group_ID> \
       --resource-ids=<cluster_ID> \
       --filter=log_type=yandex-dataproc-agent \
       --since 'YYYY-MM-DDThh:mm:ssZ' \
       --until 'YYYY-MM-DDThh:mm:ssZ'
    

    In the --since and --until parameters, specify the period boundaries. Time format: YYYY-MM-DDThh:mm:ssZ, e.g., 2020-08-10T12:00:00Z. Use the UTC time zone.

    For more information, see Working with logs.
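The --since and --until values can be generated rather than typed by hand. The sketch below formats timezone-aware datetimes in the UTC format described above. The helper names are hypothetical, and naive datetimes should be avoided because Python would interpret them as local time.

```python
from datetime import datetime, timedelta, timezone

LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_log_timestamp(moment: datetime) -> str:
    """Convert a timezone-aware datetime to UTC in YYYY-MM-DDThh:mm:ssZ form."""
    return moment.astimezone(timezone.utc).strftime(LOG_TIME_FORMAT)


def last_hours_window(hours: int, now: datetime = None) -> tuple:
    """Return (since, until) strings covering the last `hours` hours."""
    until = now or datetime.now(timezone.utc)
    since = until - timedelta(hours=hours)
    return to_log_timestamp(since), to_log_timestamp(until)


since, until = last_hours_window(2)
print(f"--since '{since}' --until '{until}'")
```

Paste the printed parameters into the yc logging read command shown above.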

What is the minimum computing power required for a subcluster with a master host?

It depends on the driver deploy mode:

  • In deployMode=cluster mode, when the driver is deployed on one of the cluster's compute hosts, 4-8 CPU cores and 16 GB RAM are sufficient for the subcluster with the master host.
  • In deployMode=client mode, when the driver is deployed on the cluster's master host, the computing power depends on the job logic and the number of concurrent jobs.

For more information about driver deploy modes and computing resource consumption, see Resource allocation.

In Yandex Cloud, computing power depends on the host class. To see how host classes map to computing resources, see Host classes.

How do I upgrade the image version in Yandex Data Processing?

The service has no built-in mechanism for image version upgrades. To upgrade your image version, create a new cluster.

To make sure the version you use is always up-to-date, automate the creation and removal of temporary Yandex Data Processing clusters using Yandex Managed Service for Apache Airflow™. To run jobs automatically, you can use Yandex DataSphere in addition to Managed Service for Apache Airflow™.

How do I run jobs?

There are several ways to do it:

  • Create jobs in Yandex Data Processing. Once created, they will run automatically.
  • Run Apache Hive jobs using the Yandex Cloud CLI or Hive CLI.
  • Run Spark or PySpark applications using Spark Shell, spark-submit, or the Yandex Cloud CLI.
  • Use spark-submit to run jobs from remote hosts that are not part of the Yandex Data Processing cluster.
  • Set up integration with Yandex Managed Service for Apache Airflow™ or Yandex DataSphere. This will automate running the jobs.

What security group limits are there?

You can create no more than five security groups per network. Each group may have a maximum of 50 rules. Learn more about limits in Yandex Virtual Private Cloud.

Can I get superuser permissions on hosts?

Yes. To switch to superuser, enter the following command after connecting to the host:

  sudo su

However, you do not have to switch to superuser: just use sudo.

How can I fix the no permission error when connecting a service account to the cluster?

Error message:

ERROR: rpc error: code = PermissionDenied desc = you do not have permission to access the requested service account or service account does not exist

This error occurs when you link a service account to a cluster while creating or modifying the cluster and your Yandex Cloud account does not have permission to use that service account.

Solution
Assign the iam.serviceAccounts.user role or higher to your Yandex Cloud account.
