© 2025 Direct Cursus Technology L.L.C.


Setting up Delta Lake in single-cluster mode

Written by
Yandex Cloud
Updated on September 25, 2025
  • Set up your infrastructure
  • Set up the component properties to work with Delta Lake
  • Delta Lake use case

Yandex Data Processing 2.0 or higher supports using Delta Lake in single-cluster mode.

For more information about Delta Lake, see the Delta Lake in Yandex Data Processing section of the Delta Lake documentation.

Note

Delta Lake is not part of Yandex Data Processing. It is not covered by Yandex Cloud support, and its usage is not governed by the Yandex Data Processing Terms of Use.

Warning

If several Spark jobs running in single-cluster mode update the same table data concurrently, data may be lost.

Set up your Spark jobs to avoid concurrent data modifications, or use multi-cluster mode. For more information, see this Delta Lake article.

Set up your infrastructure

  1. If you do not have a Yandex Data Processing cluster, create one.

  2. If you attached a Yandex Object Storage bucket to your cluster:

    1. Create a folder named warehouse in the bucket.
    2. Set spark.sql.warehouse.dir to s3a://<bucket_name>/warehouse/.
  3. Create an Apache Hive™ Metastore cluster and connect it to your Yandex Data Processing cluster.
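
The bucket layout from step 2 can be sketched as a tiny helper that builds the spark.sql.warehouse.dir value (the function name and the "my-bucket" placeholder are assumptions for illustration):

```python
# Hedged sketch: build the spark.sql.warehouse.dir value for a bucket
# attached to the cluster ("my-bucket" is a placeholder name).
def warehouse_dir(bucket_name: str) -> str:
    # Delta Lake table data will live under the warehouse/ folder.
    return f"s3a://{bucket_name}/warehouse/"

print(warehouse_dir("my-bucket"))  # s3a://my-bucket/warehouse/
```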

Set up the component properties to work with Delta Lake

  1. Set the following properties at the cluster or individual job level:

    • Set spark.sql.extensions to io.delta.sql.DeltaSparkSessionExtension.
    • Set spark.sql.catalog.spark_catalog to org.apache.spark.sql.delta.catalog.DeltaCatalog.
  2. Add the Delta Lake libraries to the dependencies of your cluster or individual job (the required library versions depend on the Yandex Data Processing version):

    Yandex Data Processing 2.0.x

    Use one of these methods:

    • Download the delta-core_2.12-0.8.0.jar library file, save it to your Object Storage bucket, and provide the file URL in the spark.jars property:

      spark.jars=s3a://<bucket_name>/<file_path>

      Make sure the cluster service account has read access to the bucket.

    • Set up cluster access to the Maven repository and set the spark.jars.packages property to io.delta:delta-core_2.12:0.8.0.

      You can set up Maven access in two ways:

      • In your cluster's security group, allow network access to the Maven Central repository.
      • Configure an alternative Maven repository and allow traffic to it in the cluster security group.
    • Download the delta-core_2.12-0.8.0.jar library file, copy it to all cluster nodes manually or using initialization scripts, and provide the full file path in the spark.driver.extraClassPath and spark.executor.extraClassPath properties.

    Yandex Data Processing 2.1.0 or 2.1.3

    Use one of these methods:

    • Download the delta-core_2.12-2.0.2.jar and delta-storage-2.0.2.jar library files, save them to your Object Storage bucket, and provide the comma-separated file URLs in the spark.jars property:

      spark.jars=s3a://<bucket_name>/<path_to_core_file>,s3a://<bucket_name>/<path_to_storage_file>

      Make sure the cluster service account has read access to the bucket.

    • Set up cluster access to the Maven repository and set the spark.jars.packages property to io.delta:delta-core_2.12:2.0.2,io.delta:delta-storage:2.0.2.

      You can set up Maven access in two ways:

      • In your cluster's security group, allow network access to the Maven Central repository.
      • Configure an alternative Maven repository and allow traffic to it in the cluster security group.
    • Download the delta-core_2.12-2.0.2.jar and delta-storage-2.0.2.jar library files, copy them to all cluster nodes manually or using initialization scripts, and provide the full file paths in the spark.driver.extraClassPath and spark.executor.extraClassPath properties.

    Yandex Data Processing 2.1.4 and higher

    Use one of these methods:

    • Download the delta-core_2.12-2.3.0.jar and delta-storage-2.3.0.jar library files, save them to your Object Storage bucket, and provide the comma-separated file URLs in the spark.jars property:

      spark.jars=s3a://<bucket_name>/<path_to_core_file>,s3a://<bucket_name>/<path_to_storage_file>

      Make sure the cluster service account has read access to the bucket.

    • Set up cluster access to the Maven repository and set the spark.jars.packages property to io.delta:delta-core_2.12:2.3.0,io.delta:delta-storage:2.3.0.

      You can set up Maven access in two ways:

      • In your cluster's security group, allow network access to the Maven Central repository.
      • Configure an alternative Maven repository and allow traffic to it in the cluster security group.
    • Download the delta-core_2.12-2.3.0.jar and delta-storage-2.3.0.jar library files, copy them to all cluster nodes manually or using initialization scripts, and provide the full file paths in the spark.driver.extraClassPath and spark.executor.extraClassPath properties.
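
The version-to-library mapping above can be sketched as a small helper that picks the spark.jars.packages value for a given Yandex Data Processing image version (the function name and the version-string parsing are assumptions; the returned values simply reproduce the lists above):

```python
# Hedged sketch: map a Yandex Data Processing image version to the
# matching spark.jars.packages value described above.
def delta_packages(image_version: str) -> str:
    parts = [int(x) for x in image_version.split(".")]
    major, minor, patch = (parts + [0, 0, 0])[:3]
    if (major, minor) == (2, 0):
        # 2.0.x: single delta-core library
        return "io.delta:delta-core_2.12:0.8.0"
    if (major, minor) == (2, 1) and patch < 4:
        # 2.1.0 to 2.1.3: delta-core plus delta-storage
        return "io.delta:delta-core_2.12:2.0.2,io.delta:delta-storage:2.0.2"
    # 2.1.4 and higher
    return "io.delta:delta-core_2.12:2.3.0,io.delta:delta-storage:2.3.0"
```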

You can now use Delta Lake in your Yandex Data Processing cluster.

If you set the above Spark properties at the cluster level, you can use Spark Thrift Server to work with Delta Lake tables.

Delta Lake use case

This use case was tested on a version 2.0 Yandex Data Processing cluster with access to the Maven Central repository.

  1. Use SSH to connect to the Yandex Data Processing cluster's master host.

  2. Run a Spark session in the cluster by providing the required parameters:

    spark-sql \
        --conf spark.jars.packages=io.delta:delta-core_2.12:0.8.0 \
        --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
        --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
    
  3. In your active session, create a database and switch to it:

    CREATE DATABASE testdelta;
    USE testdelta;
    
  4. Create a test table and populate it with data:

    CREATE TABLE tab1(a INTEGER NOT NULL, b VARCHAR(100)) USING DELTA;
    INSERT INTO tab1 VALUES (1,'One'), (2,'Two'), (3,'Three');
    
  5. Update the b column values by appending to each the corresponding a column value cast to a string:

    UPDATE tab1 SET b=b || ' ** ' || CAST(a AS VARCHAR(10));
    
  6. Check the result:

    SELECT * FROM tab1;
    
    3	Three ** 3
    2	Two ** 2
    1	One ** 1
    
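The UPDATE in step 5 uses the SQL || operator to concatenate each row's b value with ' ** ' and the stringified a. The same transformation rendered in plain Python over the sample rows, with no Spark involved (the variable names are illustrative only):

```python
# Hedged illustration of the UPDATE above, applied to the sample
# rows inserted in step 4.
rows = [(1, "One"), (2, "Two"), (3, "Three")]

# b = b || ' ** ' || CAST(a AS VARCHAR(10))
updated = [(a, f"{b} ** {a}") for a, b in rows]

for a, b in updated:
    print(a, b)
```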
