© 2026 Direct Cursus Technology L.L.C.

Tips for setting up and using Delta Lake

Written by Yandex Cloud
Updated on June 29, 2026
  • Optimizing data writes to S3-compatible storage
  • Boosting the OPTIMIZE operator efficiency
  • Syntax for converting partitioned tables
  • Forced cleanup of table change history

Optimizing data writes to S3-compatible storage

If part of the job data uses formats other than Delta Lake tables, configure S3A committers to optimize data writes to S3-compatible storage.

If all job data resides in Delta Lake tables, you do not need to configure S3A committers. Delta Lake uses its own algorithm to manage data writes to S3-compatible storage. It is functionally equivalent to S3A committers.

Boosting the OPTIMIZE operator efficiency

The OPTIMIZE operator in Delta Lake 2.0.2 speeds up read queries by merging multiple small files into larger ones. The merge runs as multiple concurrent jobs. The spark.databricks.delta.optimize.maxThreads property sets the maximum number of such jobs; the default is 10.

To speed up the optimization when handling large tables, increase the spark.databricks.delta.optimize.maxThreads property value. You can use much higher values, e.g., 100 or 1000, if the cluster resources allow running that many concurrent operations.
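
As a minimal sketch, the property can be raised for the current Spark SQL session before compacting a large table. This assumes the property is read from the session configuration; the table name my_db.events and the value 100 are illustrative placeholders, not recommendations.

```sql
-- Allow up to 100 concurrent merge jobs in this session
-- (100 is an illustrative value; size it to the cluster resources).
SET spark.databricks.delta.optimize.maxThreads = 100;

-- Merge small files in a hypothetical Delta table.
OPTIMIZE my_db.events;
```

Setting the value per session rather than cluster-wide keeps the higher concurrency limited to the jobs that actually run OPTIMIZE on large tables.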

Syntax for converting partitioned tables

The CONVERT TO DELTA operator converts standard Spark SQL tables to Delta Lake format. To convert a partitioned table, specify the partitioning columns in the query:

CONVERT TO DELTA table_name PARTITIONED BY (part_col_1 INT, part_col_2 INT);
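
For instance, assuming a Parquet table stored under a hypothetical path and partitioned by the integer columns year and month, the conversion might look like the sketch below. The bucket, path, and column names are placeholders.

```sql
-- Convert a path-based Parquet table. The partition columns and their
-- types must match the existing directory layout (year=.../month=...).
CONVERT TO DELTA parquet.`s3a://my-bucket/warehouse/events`
  PARTITIONED BY (year INT, month INT);

-- Optionally check the result: the format column should read "delta".
DESCRIBE DETAIL delta.`s3a://my-bucket/warehouse/events`;
```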

Forced cleanup of table change history

By default, Delta Lake stores the history of table changes for 30 days. This period is set at the table level in the delta.logRetentionDuration parameter. You can edit it using this command:

ALTER TABLE <table_schema_and_name> SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval <interval>');

For more on managing table properties, see the Delta Lake documentation on table properties.
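
A short sketch of changing the retention period, assuming a hypothetical table my_db.events and an illustrative interval of 7 days:

```sql
-- Keep the change history of a hypothetical table for 7 days.
ALTER TABLE my_db.events
  SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days');

-- Inspect the table properties to confirm the new value.
SHOW TBLPROPERTIES my_db.events;
```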

To forcibly clean up the table change history:

  1. Rearrange the table data to optimize access performance:

    OPTIMIZE <table_name>;
    
  2. Allow deleting the entire history of changes:

    SET spark.databricks.delta.retentionDurationCheck.enabled = false;
    
  3. Clear the change history:

    VACUUM <table_name> RETAIN 0 HOURS;
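
The steps above can be sketched as one session script. The table name my_db.events is hypothetical, and the DRY RUN preview and the final SET statement are optional additions, not part of the original procedure. Note that after VACUUM with zero retention, earlier table versions can no longer be read, so the script should only run when no other jobs are reading from or writing to the table.

```sql
-- 1. Rearrange the data files of a hypothetical table.
OPTIMIZE my_db.events;

-- 2. Disable the check that rejects very short retention periods.
SET spark.databricks.delta.retentionDurationCheck.enabled = false;

-- Optional: list the files that would be deleted without removing them.
VACUUM my_db.events RETAIN 0 HOURS DRY RUN;

-- 3. Delete all files no longer referenced by the current table version.
VACUUM my_db.events RETAIN 0 HOURS;

-- Optional: restore the check for the rest of the session.
SET spark.databricks.delta.retentionDurationCheck.enabled = true;
```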
    
