Yandex Data Processing release notes
Updated at October 23, 2025
Q3 2025
In image 2.2.9 (beta), Apache Spark™ is updated to version 3.5.6.
Q2 2025
- Added the OS Login option for cluster creation. When enabled, it provides OS Login access to all hosts created in the cluster.
- Added support for environment variables:
  - HADOOP_HEAPSIZE_MIN and HADOOP_HEAPSIZE_MAX for the hadoop service: hadoop.env:HADOOP_HEAPSIZE_MIN and hadoop.env:HADOOP_HEAPSIZE_MAX.
  - HADOOP_HEAPSIZE for hive (available only for 2.0 images): hive.env:HADOOP_HEAPSIZE.
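The new keys share one structure: a prefix naming the target (hadoop.env or hive.env), a colon, and the environment variable name. The Python sketch below groups such keys by service. The property dictionary, the heap values (assumed to be in MB), the example.other key, and the env_vars_by_service helper are all hypothetical and are not part of any Yandex Data Processing API.

```python
# Hypothetical cluster properties using the "<prefix>.env:<VARIABLE>" keys from this release.
# Heap values are illustrative assumptions (MB), not recommended settings.
CLUSTER_PROPERTIES = {
    "hadoop.env:HADOOP_HEAPSIZE_MIN": "1024",
    "hadoop.env:HADOOP_HEAPSIZE_MAX": "4096",
    "hive.env:HADOOP_HEAPSIZE": "2048",
    "example.other:some.property": "value",  # hypothetical non-env key, skipped below
}


def env_vars_by_service(properties: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group '<service>.env:<VARIABLE>' keys into per-service environment variable maps."""
    result: dict[str, dict[str, str]] = {}
    for key, value in properties.items():
        prefix, sep, variable = key.partition(":")
        if not sep or not prefix.endswith(".env"):
            continue  # not an environment variable property
        service = prefix[: -len(".env")]
        result.setdefault(service, {})[variable] = value
    return result


if __name__ == "__main__":
    print(env_vars_by_service(CLUSTER_PROPERTIES))
```

Splitting on the first colon only keeps variable names intact even if a value-side convention ever contains further colons.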
Q1 2025
In 2.2.X images, Java version updated to 11.
Q4 2024
- Added environment selection (PRODUCTION/PRESTABLE) during cluster creation and modification.
- In 2.2.X images, Python version updated to 3.1.
Q3 2024
- Apache Hive™ Metastore cluster functionality has been integrated into Yandex MetaData Hub. For more information about Apache Hive™ Metastore clusters, see the Yandex MetaData Hub documentation.
- In 2.1.X and 2.2.X images, Conda now uses Mamba as its default solver.
Q2 2024
The stable 2.1 image version line is now available. It enables cluster creation with newer runtime versions: Spark 3.3.2
Q2 2023
Added support for creating Apache Hive™ Metastore clusters. This feature is currently in Preview.
Q3 2022
- Added support for new configuration settings in the DataprocCreateClusterOperator Airflow operator.
- Added cpu-optimized host classes with 2 GB of RAM per vCPU. These configurations are available only for Intel Ice Lake processors.
- Published a guide on using initialization scripts to configure GeeseFS.
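The cpu-optimized sizing rule reduces to simple arithmetic: RAM = 2 GB × vCPU count. The Python sketch below assumes that ratio applies uniformly; the helper name and the vCPU counts are hypothetical and do not correspond to actual host class names.

```python
# cpu-optimized host classes: 2 GB of RAM per vCPU (ratio from this release note).
RAM_GB_PER_VCPU = 2


def cpu_optimized_ram_gb(vcpus: int) -> int:
    """Return the RAM in GB for a cpu-optimized host with the given vCPU count."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return vcpus * RAM_GB_PER_VCPU


if __name__ == "__main__":
    # Hypothetical vCPU counts; check the documentation for the host classes actually offered.
    for vcpus in (2, 4, 8, 16):
        print(f"{vcpus} vCPU -> {cpu_optimized_ram_gb(vcpus)} GB RAM")
```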
Q2 2022
- Image version 2.1 is now available.
- Added support for public internet access across all subcluster types.
- Lightweight Spark support is now available starting with image version 2.0.39. You can now create a cluster without data storage subclusters because YARN and SPARK services are no longer dependent on HDFS.
- Added support for initialization scripts in the CLI.
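The lightweight Spark condition can be sketched in Python as a version check plus a service check. The helper names are hypothetical. The sketch assumes that image versions compare component by component and that a cluster including HDFS still needs a data storage subcluster.

```python
# Hypothetical helpers; not part of any Yandex Data Processing API.
MIN_LIGHTWEIGHT_VERSION = (2, 0, 39)  # lightweight Spark is available starting with 2.0.39


def parse_image_version(version: str) -> tuple[int, ...]:
    """Split a dotted image version such as "2.0.39" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def needs_data_subcluster(image_version: str, services: set[str]) -> bool:
    """Return True if, under the stated assumptions, a data storage subcluster is required."""
    if "HDFS" in services:
        # Assumption: HDFS itself still runs on data storage subclusters.
        return True
    # Before 2.0.39, YARN and SPARK depended on HDFS, so storage was still required.
    return parse_image_version(image_version) < MIN_LIGHTWEIGHT_VERSION


if __name__ == "__main__":
    print(needs_data_subcluster("2.0.39", {"YARN", "SPARK"}))
    print(needs_data_subcluster("2.0.38", {"YARN", "SPARK"}))
```

Comparing integer tuples rather than raw strings avoids treating "2.0.100" as older than "2.0.39".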
Q1 2022
- You can now create clusters using non-replicated network drives up to 8 TB in size. Non-replicated drives have a simpler architecture than network SSD storage, resulting in significantly higher performance.
- Added support for job cancellation.
- Added the build number to the Yandex Data Processing image version.
- Spark and PySpark jobs now accept the packages, repositories, and exclude_packages parameters. You can use these parameters to download additional dependencies and packages from third-party repositories or to exclude specific packages from dependency resolution.
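The three job parameters can be illustrated with a Python sketch that renders them as spark-submit options. This assumes they correspond to spark-submit's --packages, --repositories, and --exclude-packages flags, each of which takes a comma-separated list; the to_spark_submit_args helper, the repository URL, and the package coordinates are hypothetical examples.

```python
from typing import Optional, Sequence


def to_spark_submit_args(
    packages: Optional[Sequence[str]] = None,
    repositories: Optional[Sequence[str]] = None,
    exclude_packages: Optional[Sequence[str]] = None,
) -> list[str]:
    """Render job dependency parameters as spark-submit flags (assumed mapping)."""
    args: list[str] = []
    if packages:
        # Maven coordinates: groupId:artifactId:version
        args += ["--packages", ",".join(packages)]
    if repositories:
        # Extra remote repositories searched for the coordinates above
        args += ["--repositories", ",".join(repositories)]
    if exclude_packages:
        # groupId:artifactId pairs excluded while resolving --packages
        args += ["--exclude-packages", ",".join(exclude_packages)]
    return args


if __name__ == "__main__":
    print(to_spark_submit_args(
        packages=["org.apache.spark:spark-avro_2.12:3.3.2"],
        repositories=["https://repo.example.com/maven2"],  # hypothetical repository
        exclude_packages=["org.slf4j:slf4j-api"],
    ))
```

Empty or missing parameters produce no flags, so the sketch never emits a flag with an empty value.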