Yandex Data Processing release notes
Updated at April 10, 2025
December 2024
When creating or editing a cluster, you can now select the environment: PRODUCTION or PRESTABLE.
September 2024
Metastore clusters are now part of Yandex MetaData Hub. For information on Metastore clusters, see the Yandex MetaData Hub documentation.
April 2024
A stable line of 2.1 images is available. With it, you can create a cluster with the more recent Spark 3.3.2.
Q2 2023
Creating Metastore clusters is now available. This feature is at the Preview stage.
Q3 2022
- Added support for new settings in the DataprocCreateClusterOperator Airflow operator.
- Added cpu-optimized host classes with a 2:1 GB RAM to vCPU ratio. The new configurations are only available for Intel Ice Lake.
- Published a guide for using initialization scripts to set up GeeseFS.
Q2 2022
- Image version 2.1 is available.
- Added the ability to enable public internet access for subclusters of all types.
- Lightweight Spark is available starting with image version 2.0.39. You can now create a cluster without data storage subclusters because YARN and SPARK services are no longer dependent on HDFS.
- Added support for initialization scripts in the CLI.
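As an illustration of the 2.0.39 threshold for lightweight Spark, the sketch below checks whether a given image version string meets that minimum. The function names are hypothetical and not part of any Yandex Data Processing API. It assumes image versions are dot-separated integers and that any version at or above 2.0.39, including the 2.1 line, qualifies.

```python
from typing import Tuple

# Minimum image version for lightweight Spark, taken from the release note above.
LIGHTWEIGHT_SPARK_MIN_VERSION = (2, 0, 39)


def parse_image_version(version: str) -> Tuple[int, ...]:
    """Split a dot-separated version string such as '2.0.39' into integers."""
    return tuple(int(part) for part in version.split("."))


def supports_lightweight_spark(image_version: str) -> bool:
    """Return True if the version is at or above 2.0.39.

    Assumption: versions compare component by component, so a full
    three-part version should be passed. A bare '2.0' compares as lower
    than '2.0.39' and is treated as unsupported.
    """
    return parse_image_version(image_version) >= LIGHTWEIGHT_SPARK_MIN_VERSION


if __name__ == "__main__":
    for candidate in ("2.0.38", "2.0.39", "2.1.3"):
        print(candidate, supports_lightweight_spark(candidate))
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "2.0.4" sorts after "2.0.39" lexicographically.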
Q1 2022
- You can now create clusters on non-replicated network drives up to 8 TB. Non-replicated drives are much simpler than standard network SSD storage, which makes them perform several times faster.
- Added the ability to cancel a job.
- Added the build number to the Yandex Data Processing image version.
- Added the ability to provide the packages, repositories, and exclude_packages parameters for Spark and PySpark jobs. By using these parameters, you can download additional dependencies and packages from external repositories.
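To show what the packages, repositories, and exclude_packages parameters could carry, here is a minimal sketch that turns three such lists into the standard Apache Spark properties spark.jars.packages, spark.jars.repositories, and spark.jars.excludes. Whether Yandex Data Processing maps the job parameters to these properties internally is an assumption. The helper name and the sample coordinates are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Optional


def dependency_properties(
    packages: Optional[Iterable[str]] = None,
    repositories: Optional[Iterable[str]] = None,
    exclude_packages: Optional[Iterable[str]] = None,
) -> Dict[str, str]:
    """Build Spark dependency properties from three optional lists.

    packages:          Maven coordinates in groupId:artifactId:version form.
    repositories:      extra repository URLs to search for the packages.
    exclude_packages:  groupId:artifactId pairs to leave out when resolving.
    """
    props: Dict[str, str] = {}
    if packages:
        props["spark.jars.packages"] = ",".join(packages)
    if repositories:
        props["spark.jars.repositories"] = ",".join(repositories)
    if exclude_packages:
        props["spark.jars.excludes"] = ",".join(exclude_packages)
    return props


if __name__ == "__main__":
    # Sample values for illustration only.
    props = dependency_properties(
        packages=["org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.2"],
        repositories=["https://repo1.maven.org/maven2"],
        exclude_packages=["org.slf4j:slf4j-api"],
    )
    for key, value in props.items():
        print(f"{key}={value}")
```

Spark expects each of these properties as a single comma-separated string, which is why the lists are joined before being returned.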