Yandex project
© 2025 Yandex.Cloud LLC

Delta Lake in Yandex Data Processing

Written by
Yandex Cloud
Updated on January 23, 2025

Delta Lake is open-source software that expands Apache Spark™ functionality:

  • Adds an optimized storage layer for table data with ACID transaction support.
  • Enables scalable processing of metadata.
  • Allows updating data in analytical tables stored as Parquet files in HDFS or S3-compatible storage.
  • Supports both batch queries and data streaming operations.

You can set up Delta Lake in Yandex Data Processing clusters:

  • In single-cluster mode for Yandex Data Processing 2.0 and 2.1
  • In multi-cluster mode for Yandex Data Processing 2.1 and higher

Single-cluster mode lets different clusters and Apache Spark™ jobs use the same tables, but concurrent writes from several sources may cause table data loss. To avoid this, you need to additionally configure how data is written.

In multi-cluster mode, access to Delta Lake tables from different clusters and Apache Spark™ jobs is managed by an auxiliary database. In Yandex Cloud, this role is performed by Yandex Managed Service for YDB.
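As a rough illustration of what a basic setup involves, the sketch below assembles the Spark properties that the Delta Lake documentation describes for enabling Delta Lake in a Spark session. The helper name `delta_spark_properties` and the bucket path are hypothetical, and the sketch covers only the basic case: the extra settings that multi-cluster mode needs for Yandex Managed Service for YDB are not shown here.

```python
from typing import Dict, List


def delta_spark_properties(jar_paths: List[str]) -> Dict[str, str]:
    """Build Spark properties that enable Delta Lake (hypothetical helper).

    jar_paths: locations of the Delta Lake JAR files for your cluster version.
    """
    if not jar_paths:
        raise ValueError("at least one Delta Lake JAR path is required")
    return {
        # Make the Delta Lake libraries available to the driver and executors.
        "spark.jars": ",".join(jar_paths),
        # Standard Delta Lake settings from the Delta Lake documentation.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


# Assumption: the JAR files were uploaded to a bucket named "my-bucket".
props = delta_spark_properties([
    "s3a://my-bucket/jars/delta-core_2.12-2.3.0.jar",
    "s3a://my-bucket/jars/delta-storage-2.3.0.jar",
])
for key, value in props.items():
    print(f"{key}={value}")
```

The resulting key-value pairs could be passed as cluster component properties or as job-level Spark properties, depending on how you prefer to manage configuration.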

Note

Delta Lake is not part of Yandex Data Processing. It is not covered by Yandex Cloud support, and its usage is not governed by the Yandex Data Processing Terms of Use.

For more information about Delta Lake, see the Delta Lake documentation.

Delta Lake and Yandex Data Processing version compatibility

Delta Lake and Yandex Data Processing versions are only compatible if the Delta Lake version is compatible with the Apache Spark™ version used in the cluster. The table below lists compatible versions and links to library files that you will need to set up Delta Lake in your cluster.

Yandex Data Processing version | Apache Spark™ version | Delta Lake version | JAR files
2.0.x                          | 3.0.3                 | 0.8.0              | delta-core_2.12-0.8.0.jar
2.1.0 and 2.1.3                | 3.2.1                 | 2.0.2              | delta-core_2.12-2.0.2.jar, delta-storage-2.0.2.jar
2.1.4 and higher               | 3.3.2                 | 2.3.0              | delta-core_2.12-2.3.0.jar, delta-storage-2.3.0.jar
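The compatibility table can also be expressed as a small lookup, which may be handy in deployment scripts. The sketch below is illustrative only: the names `DeltaCompat` and `delta_compat` are hypothetical, and it assumes that "2.1.4 and higher" covers any later image version and that versions not listed in the table (such as 2.1.1) should be rejected.

```python
from typing import NamedTuple, Tuple


class DeltaCompat(NamedTuple):
    spark_version: str
    delta_version: str
    jars: Tuple[str, ...]


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split an image version such as "2.1.4" into integer components."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def delta_compat(image_version: str) -> DeltaCompat:
    """Return the Spark and Delta Lake versions listed for an image version."""
    v = parse_version(image_version)
    if v[:2] == (2, 0):
        return DeltaCompat("3.0.3", "0.8.0", ("delta-core_2.12-0.8.0.jar",))
    if v in ((2, 1, 0), (2, 1, 3)):
        return DeltaCompat(
            "3.2.1", "2.0.2",
            ("delta-core_2.12-2.0.2.jar", "delta-storage-2.0.2.jar"),
        )
    # Assumption: "2.1.4 and higher" includes every later version.
    if v >= (2, 1, 4):
        return DeltaCompat(
            "3.3.2", "2.3.0",
            ("delta-core_2.12-2.3.0.jar", "delta-storage-2.3.0.jar"),
        )
    raise ValueError(f"{image_version} is not listed in the compatibility table")


compat = delta_compat("2.1.4")
print(compat.spark_version, compat.delta_version, ", ".join(compat.jars))
```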

Note

Yandex Data Processing 2.1.x clusters are at the Preview stage and provided upon request. Contact support or your account manager.

Delta Lake 2.x key advantages

Here are the key advantages of Delta Lake 2.x as compared to 0.8.0:

  • Support for multi-cluster mode provides automated orchestration of changes to data in a single table from different Apache Spark™ jobs and Yandex Data Processing clusters.
  • The idempotent data write feature allows maintaining exactly-once processing of data streams.
  • The Change Data Feed feature allows tracking changes to data in Delta Lake tables.
  • The Z-Ordering feature implements multidimensional clustering of Delta Lake tables. It speeds up queries that filter on the columns used for clustering.
  • Support for dynamic partition overwrites.
  • Query performance optimization by merging small files into larger ones.
  • Support for rolling a table back to an earlier state.
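To make several of these features more concrete, here is a minimal sketch that builds the Delta Lake SQL statements and DataFrame writer options behind them, following the open-source Delta Lake documentation. The helper names, table name, and column names are hypothetical. The helpers only produce strings and dictionaries: in a real job you would pass the statements to `spark.sql(...)` and the options to `DataFrameWriter.options(...)`. Identifiers are interpolated as-is, so validate them if they come from user input.

```python
from typing import Dict, List


def enable_change_data_feed(table: str) -> str:
    """Statement that turns on Change Data Feed for an existing table."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"


def compact_small_files(table: str) -> str:
    """Statement that merges small files into larger ones."""
    return f"OPTIMIZE {table}"


def zorder_by(table: str, columns: List[str]) -> str:
    """Statement that compacts a table and clusters it by the given columns."""
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


def restore_to_version(table: str, version: int) -> str:
    """Statement that rolls a table back to an earlier version."""
    return f"RESTORE TABLE {table} TO VERSION AS OF {version}"


def idempotent_write_options(app_id: str, batch_id: int) -> Dict[str, str]:
    """Writer options that let Delta Lake skip an already committed batch."""
    return {"txnAppId": app_id, "txnVersion": str(batch_id)}


def dynamic_overwrite_options() -> Dict[str, str]:
    """Writer option that overwrites only the partitions present in the data."""
    return {"partitionOverwriteMode": "dynamic"}


# Hypothetical table and column names, for illustration only.
statements = [
    enable_change_data_feed("events"),
    compact_small_files("events"),
    zorder_by("events", ["event_date", "user_id"]),
    restore_to_version("events", 3),
]
for statement in statements:
    print(statement)
print(idempotent_write_options("orders-stream", 42))
```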
