Yandex Data Processing release notes
Updated at November 18, 2024
September 2024
Metastore clusters are now part of Yandex MetaData Hub. For information on Metastore clusters, see the Yandex MetaData Hub documentation.
April 2024
A stable line of 2.1 images is now available. With it, you can create a cluster with a more recent Spark version, 3.3.2.
Q2 2023
Creating Metastore clusters is now available. This feature is at the Preview stage.
Q3 2022
- Added support for new settings in the DataprocCreateClusterOperator Airflow operator.
- Added cpu-optimized host classes with a 2:1 GB RAM to vCPU ratio. The new configurations are only available for Intel Ice Lake.
- Published a guide for using initialization scripts to set up GeeseFS.
Q2 2022
- Image version 2.1 is available.
- Added the ability to enable public internet access for subclusters of all types.
- Lightweight Spark is available starting with image version 2.0.39. You can now create a cluster without data storage subclusters because the YARN and Spark services no longer depend on HDFS.
- Added support for initialization scripts in the CLI.
Q1 2022
- You can now create clusters on non-replicated network drives up to 8 TB. Non-replicated drives are much simpler than standard network SSD storage, which makes them perform several times faster.
- Added the ability to cancel a job.
- Added the build number to the Yandex Data Processing image version.
- Added the ability to provide the packages, repositories, and exclude_packages parameters for Spark and PySpark jobs. By using these parameters, you can download additional dependencies and packages from external repositories.
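
To illustrate what the packages, repositories, and exclude_packages parameters express, here is a minimal Python sketch that assembles the equivalent standard spark-submit flags (--packages, --repositories, --exclude-packages). It is an assumption that the job parameters correspond to these flags; the release notes do not state how the service passes them to Spark. The build_spark_submit_args helper, the job.py path, and the sample coordinates are hypothetical and used for illustration only.

```python
from typing import List, Sequence


def build_spark_submit_args(
    app: str,
    packages: Sequence[str] = (),
    repositories: Sequence[str] = (),
    exclude_packages: Sequence[str] = (),
) -> List[str]:
    """Assemble a spark-submit command line with optional dependency flags.

    Hypothetical helper: it only mirrors the three job parameters as the
    standard spark-submit options and does not call any Yandex Cloud API.
    """
    args = ["spark-submit"]
    if packages:
        # Maven coordinates in groupId:artifactId:version form, comma-separated.
        args += ["--packages", ",".join(packages)]
    if repositories:
        # Additional remote repositories searched for the coordinates above.
        args += ["--repositories", ",".join(repositories)]
    if exclude_packages:
        # groupId:artifactId pairs to skip while resolving dependencies.
        args += ["--exclude-packages", ",".join(exclude_packages)]
    args.append(app)
    return args


if __name__ == "__main__":
    # Illustrative values only; substitute your own job file and coordinates.
    print(
        build_spark_submit_args(
            "job.py",
            packages=["org.apache.spark:spark-avro_2.12:3.3.2"],
            repositories=["https://repo1.maven.org/maven2"],
            exclude_packages=["org.slf4j:slf4j-api"],
        )
    )
```

Each flag takes a single comma-separated value, so the helper joins every list before appending it and omits a flag entirely when its list is empty.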