Yandex Data Processing release notes
Updated at October 14, 2024
Labels next to each change show which interfaces support it: the management console, CLI, API, Terraform, or SQL.
September 2024
Metastore clusters are now part of Yandex MetaData Hub. For information on Metastore clusters, see the Yandex MetaData Hub documentation.
April 2024
A stable line of 2.1 images is now available. It lets you create clusters with a more recent Spark version, 3.3.2.
Q2 2023
Creating Metastore clusters is now available. This feature is at the Preview stage.
Q3 2022
- Added support for new settings in the DataprocCreateClusterOperator Airflow operator.
- Added cpu-optimized host classes with a 2:1 GB RAM to vCPU ratio. The new configurations are only available for Intel Ice Lake.
- Published a guide for using initialization scripts to set up GeeseFS.
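To make the 2:1 ratio concrete: a cpu-optimized host has 2 GB of RAM for every vCPU. The sketch below only does that arithmetic. The helper name is hypothetical, and the vCPU counts are illustrative examples, not the actual list of available configurations.

```python
# Hypothetical helper: RAM for a cpu-optimized host at the 2:1 GB-to-vCPU ratio
# described in the release note. The vCPU counts used below are examples only;
# check the host class list for the configurations that actually exist.
GB_PER_VCPU = 2


def cpu_optimized_ram_gb(vcpus: int) -> int:
    """Return the RAM in GB for a host with `vcpus` vCPUs at 2 GB per vCPU."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return vcpus * GB_PER_VCPU


for vcpus in (8, 16, 32):
    print(f"{vcpus} vCPU -> {cpu_optimized_ram_gb(vcpus)} GB RAM")
```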
Q2 2022
- Image version 2.1 is available.
- Added the ability to enable public internet access for subclusters of all types.
Management console
CLI
API
- Lightweight Spark is available starting with image version 2.0.39. You can now create a cluster without data storage subclusters because the YARN and SPARK services no longer depend on HDFS.
- Added support for initialization scripts in the CLI.
CLI
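The image version requirement for lightweight Spark can be checked mechanically. Below is a minimal sketch in Python, assuming image versions are plain dotted numbers such as 2.0.39. The function names are hypothetical and are not part of any Yandex Data Processing API.

```python
# Minimal sketch: does an image version allow a cluster without data storage
# subclusters? Per the release note, this works starting with image 2.0.39.
# Assumes versions are dotted integers such as "2.0.39" or "2.1.0".
MIN_LIGHTWEIGHT_VERSION = (2, 0, 39)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '2.0.39' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_lightweight_spark(image_version: str) -> bool:
    """True if the image version is 2.0.39 or later (tuples compare element-wise)."""
    return parse_version(image_version) >= MIN_LIGHTWEIGHT_VERSION


print(supports_lightweight_spark("2.0.39"))
print(supports_lightweight_spark("2.0.38"))
```

Comparing tuples of integers rather than raw strings avoids the trap where "2.0.100" would sort before "2.0.39" as text.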
Q1 2022
- You can now create clusters on non-replicated network drives of up to 8 TB. Non-replicated drives do not provide the redundancy of standard network SSD storage, which makes them several times faster.
- Added the ability to cancel a job.
Management console
CLI
- Image versions in Yandex Data Processing now include the build number.
- Added the ability to provide the packages, repositories, and exclude_packages parameters for Spark and PySpark jobs. You can use these parameters to download additional dependencies and packages from external repositories.
Management console
CLI
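The three job parameters share their names with spark-submit's dependency options. Below is a hypothetical Python sketch of how such a job spec could be rendered as command-line options. The assumption, which this note does not state, is that the parameters correspond to spark-submit's --packages, --repositories, and --exclude-packages flags, each of which takes a comma-separated list. The class, function, Maven coordinates, and repository URL are all illustrative.

```python
# Hypothetical sketch: render a job's dependency parameters as spark-submit
# options. ASSUMPTION: packages / repositories / exclude_packages map to
# spark-submit's --packages / --repositories / --exclude-packages flags.
from dataclasses import dataclass, field


@dataclass
class SparkJobDeps:
    # Maven coordinates to download, e.g. "groupId:artifactId:version"
    packages: list[str] = field(default_factory=list)
    # Extra remote repositories to search for those coordinates
    repositories: list[str] = field(default_factory=list)
    # "groupId:artifactId" entries to skip while resolving dependencies
    exclude_packages: list[str] = field(default_factory=list)


def to_spark_submit_args(deps: SparkJobDeps) -> list[str]:
    """Build flag/value pairs, joining each list with commas."""
    args: list[str] = []
    for flag, values in (
        ("--packages", deps.packages),
        ("--repositories", deps.repositories),
        ("--exclude-packages", deps.exclude_packages),
    ):
        if values:  # omit flags whose list is empty
            args += [flag, ",".join(values)]
    return args


deps = SparkJobDeps(
    packages=["org.example:example-lib:1.0.0"],  # hypothetical coordinates
    repositories=["https://repo.example.com/maven"],  # hypothetical URL
)
print(to_spark_submit_args(deps))
```

Empty lists are skipped so a job that sets only packages does not produce blank --repositories or --exclude-packages values.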