© 2025 Direct Cursus Technology L.L.C.
Yandex Data Processing

In this article:

  • Optimizing data writes to S3-compatible storage
  • Boosting OPTIMIZE operator performance
  • Syntax for converting partitioned tables
  • Forcing table change history cleanup

Tips for setting up and using Delta Lake

Written by
Yandex Cloud
Updated on January 23, 2025

Optimizing data writes to S3-compatible storage

If a job also writes data in formats other than Delta Lake tables, configure S3A committers to optimize writes to S3-compatible storage.

If all data within a job is stored in Delta Lake tables, there is no need to configure S3A committers. Delta Lake uses its own algorithm to control data writes to S3-compatible storage. Its functionality is equivalent to that of S3A committers.
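
As a rough illustration of the first case, the properties below are the ones the Apache Spark cloud-integration guide lists for routing Parquet output through an S3A committer. The choice of the `directory` committer and the `to_submit_args` helper are assumptions made for this sketch, not settings taken from this article; check them against the Spark and Hadoop versions on your cluster.

```python
# Spark properties that route job output through an S3A committer.
# Assumption: the "directory" staging committer is used; other committers exist.
S3A_COMMITTER_CONF = {
    "spark.hadoop.fs.s3a.committer.name": "directory",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
}


def to_submit_args(conf):
    """Render a property dict as spark-submit --conf arguments (hypothetical helper)."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted into a spark-submit command line.
    print(" ".join(to_submit_args(S3A_COMMITTER_CONF)))
```

The same key-value pairs could instead be set as cluster or job properties; the helper only shows one way to pass them.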

Boosting OPTIMIZE operator performance

The OPTIMIZE operator in Delta Lake 2.0.2 speeds up table reads by merging multiple small files into larger ones. The merge runs as several concurrent jobs. The maximum number of concurrent jobs is set by the spark.databricks.delta.optimize.maxThreads property, which defaults to 10.

To speed up the optimization procedure when handling large tables, increase the property value. You can use much larger values, e.g., 100 or 1000, if the cluster resources allow running this many concurrent operations.
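
A minimal sketch of raising the limit for one session before compacting a table. It assumes the property can be set per session with a SQL SET statement and that `run_sql` is a callable such as `spark.sql` on a SparkSession with Delta Lake enabled; the function name and the table name `sales.events` are hypothetical.

```python
def optimize_with_threads(run_sql, table_name, max_threads=100):
    """Raise the OPTIMIZE concurrency limit for the session, then compact the table.

    run_sql is any callable that executes one SQL statement, e.g. spark.sql.
    Returns the statements in the order they were executed.
    """
    if max_threads < 1:
        raise ValueError("max_threads must be a positive integer")
    statements = [
        f"SET spark.databricks.delta.optimize.maxThreads = {int(max_threads)}",
        f"OPTIMIZE {table_name}",
    ]
    for statement in statements:
        run_sql(statement)
    return statements


if __name__ == "__main__":
    # Dry run: print the statements instead of sending them to Spark.
    optimize_with_threads(print, "sales.events", max_threads=100)
```

In a real job you would pass `spark.sql` instead of `print`. Whether 100 threads actually helps depends on the executor resources available to the cluster.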

Syntax for converting partitioned tables

The CONVERT TO DELTA operator converts standard Spark SQL tables to Delta Lake format. To convert a partitioned table, specify the partitioning columns and their types in the statement:

CONVERT TO DELTA table_name PARTITIONED BY (part_col_1 INT, part_col_2 INT);
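
To show how the statement might be assembled for a concrete table, here is a small sketch that builds it from a mapping of partition columns to types. The helper name, the table `logs.requests`, and its columns are hypothetical.

```python
def convert_to_delta(table_name, partition_columns=None):
    """Build a CONVERT TO DELTA statement.

    partition_columns maps column names to Spark SQL types, in partitioning order.
    Omit it for a table that is not partitioned.
    """
    statement = f"CONVERT TO DELTA {table_name}"
    if partition_columns:
        columns = ", ".join(
            f"{name} {sql_type}" for name, sql_type in partition_columns.items()
        )
        statement += f" PARTITIONED BY ({columns})"
    return statement + ";"


if __name__ == "__main__":
    print(convert_to_delta("logs.requests", {"year": "INT", "month": "INT"}))
    # prints: CONVERT TO DELTA logs.requests PARTITIONED BY (year INT, month INT);
```

The column order in the mapping should match the order in which the table is partitioned.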

Forcing table change history cleanup

By default, Delta Lake stores the history of table changes for 30 days. This period is set at the table level in the delta.logRetentionDuration parameter; you can edit it using this command:

ALTER TABLE <table_schema_and_name> SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval <interval>');

To learn more about managing the table parameters, see the Delta Lake documentation.
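
As an illustration of filling in the placeholders, the sketch below builds the statement for a 7-day retention period. The helper name and the table `sales.events` are hypothetical, and the interval value is only an example.

```python
def set_log_retention(table_name, interval):
    """Build the ALTER TABLE statement that sets delta.logRetentionDuration.

    interval is the interval body, such as "7 days", without the word "interval".
    """
    if "'" in interval or '"' in interval:
        raise ValueError("interval must not contain quotes")
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES "
        f"('delta.logRetentionDuration' = 'interval {interval}')"
    )


if __name__ == "__main__":
    # Keep one week of change history for a hypothetical table.
    print(set_log_retention("sales.events", "7 days"))
```

The resulting string can be passed to whatever executes SQL in your job, for example `spark.sql`.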

To force the table change history cleanup:

  1. Compact the table data to optimize access:

    OPTIMIZE <table_name>;
    
  2. Allow deleting the entire history of changes:

    SET spark.databricks.delta.retentionDurationCheck.enabled = false;
    
  3. Clear the change history. This removes every data file not referenced by the current table version:

    VACUUM <table_name> RETAIN 0 HOURS;
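
The three steps above can be sketched as one function that issues the statements in order. It assumes `run_sql` is a callable such as `spark.sql` on a SparkSession with Delta Lake enabled; the function name and the table name `sales.events` are hypothetical.

```python
def force_history_cleanup(run_sql, table_name):
    """Run the cleanup steps in order: OPTIMIZE, disable the retention check, VACUUM.

    run_sql is any callable that executes one SQL statement, e.g. spark.sql.
    Returns the statements in the order they were executed.
    """
    statements = [
        f"OPTIMIZE {table_name}",
        "SET spark.databricks.delta.retentionDurationCheck.enabled = false",
        f"VACUUM {table_name} RETAIN 0 HOURS",
    ]
    for statement in statements:
        run_sql(statement)
    return statements


if __name__ == "__main__":
    # Dry run: print the statements instead of sending them to Spark.
    force_history_cleanup(print, "sales.events")
```

Running the statements from one session matters here, because the SET statement in step 2 only affects the session that issues it.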
    
