Regular incremental copy

Written by
Yandex Cloud
Updated on July 9, 2024

Regular incremental copy runs at set time intervals. With this method, only the data modified in the source since the previous run is copied to the target. This keeps the target up to date with the minimum possible latency and load on the data source; however, it does not capture delete operations in the source. It is more efficient than copying entire tables but less efficient than transfers of the Snapshot and increment type.

Note

You can configure incremental copy for PostgreSQL, ClickHouse®, and Airbyte® sources.

To track modified data, the source table must contain a key column whose value changes each time data is added or updated. We recommend building an index on the key column so that each run does not result in a full table scan. If data is only appended to the table, an id column will do as the key column. If the existing data is likely to change, add an updated_at column with a date/time data type as the key column and write the current time to it on every update. The transfer remembers the maximum key column value and, on the next run, reads only the data added or updated since the previous run.
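As an illustration, the updated_at pattern can be sketched as follows (a minimal sketch using SQLite in place of a real source; the orders table and its columns are hypothetical, not part of the service):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
# An index on the key column avoids a full table scan on every run.
conn.execute("CREATE INDEX idx_orders_updated_at ON orders (updated_at)")

conn.executemany(
    "INSERT INTO orders (amount, updated_at) VALUES (?, ?)",
    [(10.0, "2024-07-01T00:00:00"), (20.0, "2024-07-02T00:00:00")],
)

def incremental_read(conn, last_key):
    """Read only the rows added or updated since the previous run."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_key,),
    ).fetchall()
    # Remember the maximum key column value for the next run.
    new_key = max((r[2] for r in rows), default=last_key)
    return rows, new_key

# The first run copies everything after the initial watermark.
rows, watermark = incremental_read(conn, "")
print(len(rows))  # 2

# A later update bumps updated_at, so only that row is picked up next time.
conn.execute("UPDATE orders SET amount = 15.0, updated_at = '2024-07-03T00:00:00' WHERE id = 1")
rows, watermark = incremental_read(conn, watermark)
print(len(rows))  # 1
```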

A key column can be any table column with a numeric, character, or date/time data type. The UUID data type is not suitable for key columns.

To make sure the results of regular incremental copy are correct, the source table must meet the following conditions:

  • The key column values must be strictly increasing and cannot be null.
  • The key column values must change when adding or updating data.
  • The data cannot be deleted. If this is required, save the data to be deleted to another table and deliver it separately.
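Since deletes are not captured, one way to meet the last condition is to move rows to a side table instead of deleting them outright. A hypothetical sketch, again using SQLite; the deleted_orders table is an illustration, not a service feature:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.execute("CREATE TABLE deleted_orders (id INTEGER, amount REAL, deleted_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 10.0, '2024-07-01T00:00:00')")

def soft_delete(conn, order_id, now):
    # Copy the row into a side table that can be delivered by a separate transfer...
    conn.execute(
        "INSERT INTO deleted_orders SELECT id, amount, ? FROM orders WHERE id = ?",
        (now, order_id),
    )
    # ...and only then remove it from the source table.
    conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))

soft_delete(conn, 1, "2024-07-05T00:00:00")
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])          # 0
print(conn.execute("SELECT COUNT(*) FROM deleted_orders").fetchone()[0])  # 1
```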

Transaction completion timeout

A transaction race may occur while incremental copy is in progress. This is a situation where transactions complete in a different order from the one they were started in, so earlier changes may become visible after later ones. In that case, the transfer calculates the key column value based on the more recent changes and misses the older ones.

This issue is addressed with a transaction completion timeout: a delay between calculating the key column value and reading the actual data. When the timeout is set, the transfer waits the specified number of seconds for all transactions that might change data in the output range to complete, and only then reads new data from the table.

Make sure the transaction completion timeout is:

  • Longer than an average data insert or update transaction. In typical applications, such transactions complete within a few seconds.
  • Much shorter than the interval between transfer runs, as it directly impacts the transfer duration.

The recommended and default value is 15 seconds.
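The steps above can be sketched as follows (a minimal illustration in Python, not the service's implementation; get_max_key and read_rows are hypothetical stand-ins for source queries, and a plain sleep approximates waiting on in-flight transactions):

```python
import time

def read_increment(get_max_key, read_rows, last_key, txn_timeout=15.0):
    """Incremental read with a transaction completion timeout (sketch)."""
    # 1. Fix the upper bound of the key range first.
    upper = get_max_key()
    # 2. Wait for transactions that may still commit rows with keys
    #    in (last_key, upper] to complete.
    time.sleep(txn_timeout)
    # 3. Only then read the data in that range.
    return read_rows(last_key, upper), upper

# Toy source: keys 1..9 exist; the previous run stopped at key 2.
def get_max_key():
    return 5

def read_rows(lo, hi):
    return [k for k in range(1, 10) if lo < k <= hi]

rows, upper = read_increment(get_max_key, read_rows, last_key=2, txn_timeout=0.0)
print(rows)  # [3, 4, 5]
```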

ClickHouse® is a registered trademark of ClickHouse, Inc.

© 2025 Direct Cursus Technology L.L.C.