Yandex project
© 2025 Yandex.Cloud LLC

Apache Iceberg™ in Yandex Data Processing

Written by
Yandex Cloud
Updated at December 26, 2024

Apache Iceberg™ is an open table format for storing and processing large datasets. It extends the feature set of the Apache Spark™ platform:

  • Supports high-performance Apache Iceberg™ tables that you can use the same way as regular SQL tables.

  • Provides a schema evolution mechanism that avoids side effects when you update a table schema.

  • Provides automatic hidden partitioning, which prevents errors caused by manual partitioning.

  • Supports queries against historical data through the time travel mechanism. You can use it to run reproducible queries against table snapshots or to compare changes between snapshots.

    Note

    This mechanism requires Apache Spark™ 3.3.x or higher.

  • Allows rolling tables back to previous versions (version rollback) so that you can respond quickly to issues.

  • Provides advanced filtering that relies on column-level and partition-level statistics as well as table metadata. This speeds up query processing even for very large tables, because data files unrelated to the query are not read.

  • Enables the serializable isolation level, the strictest level of transaction isolation. All table changes are atomic, and readers see only committed changes.

  • Supports concurrent writes based on an optimistic strategy: a writer retries an operation if its changes conflict with those of another writer.
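
The time travel and version rollback features listed above are typically used through Spark SQL. The following is a minimal sketch, not a definitive recipe: it only builds the query strings, using the `VERSION AS OF` and `TIMESTAMP AS OF` clauses and the `rollback_to_snapshot` procedure described in the Apache Iceberg™ documentation. The catalog name `demo`, the table `db.events`, and the snapshot ID are hypothetical placeholders, and the `CALL` statement assumes the Iceberg SQL extensions are enabled in the Spark session.

```python
from datetime import datetime


def snapshot_query(table: str, snapshot_id: int) -> str:
    """Build a query that reads the table as of a given snapshot ID."""
    return f"SELECT * FROM {table} VERSION AS OF {snapshot_id}"


def timestamp_query(table: str, moment: datetime) -> str:
    """Build a query that reads the table as it was at a given point in time."""
    stamp = moment.strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{stamp}'"


def rollback_call(catalog: str, table: str, snapshot_id: int) -> str:
    """Build a call to the rollback_to_snapshot procedure of an Iceberg catalog."""
    return f"CALL {catalog}.system.rollback_to_snapshot('{table}', {snapshot_id})"


if __name__ == "__main__":
    # Hypothetical names and IDs, used for illustration only.
    print(snapshot_query("demo.db.events", 1234567890))
    print(timestamp_query("demo.db.events", datetime(2024, 12, 26, 9, 30)))
    print(rollback_call("demo", "db.events", 1234567890))
```

In a Spark job, each resulting string could be passed to `spark.sql(...)`. Snapshot IDs for a table can be looked up in its `snapshots` metadata table.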

You can configure Apache Iceberg™ in Yandex Data Processing clusters of version 2.0 or higher.
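
This article does not list the exact settings, so the following is only a sketch of what such a configuration might look like, based on the Spark properties described in the Apache Iceberg™ documentation. The catalog name, the warehouse location, and the JAR path are hypothetical, and the choice of a Hadoop-type catalog is an assumption; check the properties against your cluster before relying on them.

```python
from typing import Dict, List


def iceberg_spark_properties(catalog: str, warehouse: str, jar_path: str) -> Dict[str, str]:
    """Return Spark properties that register an Iceberg catalog.

    Assumptions: a Hadoop-type catalog and a locally available runtime JAR.
    """
    return {
        # Iceberg runtime library matching the cluster's Spark version.
        "spark.jars": jar_path,
        # Enables Iceberg SQL extensions such as CALL procedures.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers a named catalog backed by Iceberg.
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


def as_submit_args(properties: Dict[str, str]) -> List[str]:
    """Convert properties to a list of spark-submit '--conf key=value' arguments."""
    args: List[str] = []
    for key, value in properties.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical values, used for illustration only.
    props = iceberg_spark_properties(
        catalog="demo",
        warehouse="s3a://example-bucket/warehouse",
        jar_path="/tmp/iceberg-spark-runtime-3.3_2.12-1.5.2.jar",
    )
    print(" ".join(as_submit_args(props)))
```

The same key-value pairs could also be set at the cluster or job level instead of on the command line; the dictionary form keeps the sketch independent of how the properties are delivered.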

Note

Apache Iceberg™ is not part of Yandex Data Processing. It is not covered by Yandex Cloud support and its usage is not governed by the Yandex Data Processing Terms of Use.

For more information about Apache Iceberg™, see the official documentation.

Compatibility between Apache Iceberg™ versions and Yandex Data Processing images

Apache Iceberg™ versions and Yandex Data Processing images are only compatible if the Apache Iceberg™ version is compatible with the Apache Spark™ version used in the cluster. The table below lists compatible versions and links to the library files you will need to configure Apache Iceberg™ in your cluster.

| Yandex Data Processing image | Apache Spark™ version | Apache Iceberg™ version | JAR file                                 |
|------------------------------|-----------------------|-------------------------|------------------------------------------|
| 2.0.x                        | 3.0.3                 | 1.0.0                   | iceberg-spark-runtime-3.0_2.12-1.0.0.jar |
| 2.1.x (2.1.0–2.1.3)          | 3.2.1                 | 1.4.3                   | iceberg-spark-runtime-3.2_2.12-1.4.3.jar |
| 2.1.x (2.1.4 and higher)     | 3.3.2                 | 1.5.2                   | iceberg-spark-runtime-3.3_2.12-1.5.2.jar |
| 2.2.x                        | 3.5.0                 | 1.5.2                   | iceberg-spark-runtime-3.5_2.12-1.5.2.jar |

Note

Access to image 2.2 is provided on request. Contact support or your account manager.
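
The compatibility rules above can be expressed as a small lookup. This is an illustrative sketch: the function and class names are hypothetical, the version data is copied from this article, and the JAR name is assembled from the naming pattern of the listed files (Spark minor version, Scala 2.12, Iceberg version). The time travel check reflects the Apache Spark™ 3.3.x requirement noted earlier.

```python
from typing import NamedTuple, Tuple


class IcebergBuild(NamedTuple):
    spark_version: str
    iceberg_version: str

    @property
    def jar_name(self) -> str:
        """Runtime JAR name, following the pattern of the files listed above."""
        spark_minor = ".".join(self.spark_version.split(".")[:2])
        return f"iceberg-spark-runtime-{spark_minor}_2.12-{self.iceberg_version}.jar"

    @property
    def supports_time_travel(self) -> bool:
        """Time travel requires Apache Spark 3.3.x or higher."""
        major, minor = (int(part) for part in self.spark_version.split(".")[:2])
        return (major, minor) >= (3, 3)


def parse_version(image_version: str) -> Tuple[int, int, int]:
    """Parse an image version such as '2.1.4' into a tuple of integers."""
    major, minor, patch = (int(part) for part in image_version.split("."))
    return major, minor, patch


def iceberg_build_for_image(image_version: str) -> IcebergBuild:
    """Return the Spark and Iceberg versions compatible with an image version."""
    version = parse_version(image_version)
    if version[:2] == (2, 0):
        return IcebergBuild("3.0.3", "1.0.0")
    if (2, 1, 0) <= version <= (2, 1, 3):
        return IcebergBuild("3.2.1", "1.4.3")
    if version[:2] == (2, 1):  # 2.1.4 and higher
        return IcebergBuild("3.3.2", "1.5.2")
    if version[:2] == (2, 2):
        return IcebergBuild("3.5.0", "1.5.2")
    raise ValueError(f"No compatible Apache Iceberg build listed for image {image_version}")


if __name__ == "__main__":
    build = iceberg_build_for_image("2.1.4")
    print(build.jar_name, build.supports_time_travel)
```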
