Data delivery guarantees
There are three data delivery strategies:
- At-most-once: The producer sends each message only once. If the consumer fails to receive it, the message is irrevocably lost. Throughput is higher, but delivery is not guaranteed.
- At-least-once: The producer keeps resending a message until the consumer confirms its receipt. This strategy guarantees delivery but may produce duplicate messages on the consumer side.
- Exactly-once: The producer keeps resending a message until the consumer confirms its receipt, and the consumer handles received messages so that duplicates are discarded. This strategy guarantees delivery with no duplicate messages, but it requires more computing resources and is harder to implement.
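The difference between the first two strategies can be illustrated with a small simulation. This is a minimal sketch rather than Data Transfer code: `LossyChannel`, `send_at_most_once`, and `send_at_least_once` are hypothetical names, and the channel drops messages on a scripted schedule so that the outcome is deterministic.

```python
class LossyChannel:
    """Hypothetical channel that loses delivery attempts according to a script."""

    def __init__(self, drop_script):
        # drop_script[i] is True if the i-th delivery attempt is lost.
        self.drop_script = list(drop_script)
        self.attempts = 0
        self.received = []

    def deliver(self, message):
        """Return True (receipt confirmed) if the consumer got the message."""
        dropped = (self.attempts < len(self.drop_script)
                   and self.drop_script[self.attempts])
        self.attempts += 1
        if dropped:
            return False
        self.received.append(message)
        return True


def send_at_most_once(channel, message):
    # One attempt, no retry: a dropped message is lost for good.
    channel.deliver(message)


def send_at_least_once(channel, message, max_attempts=10):
    # Retry until the consumer confirms receipt or attempts run out.
    for _ in range(max_attempts):
        if channel.deliver(message):
            return True
    return False


first = LossyChannel([True])            # the only attempt is lost
send_at_most_once(first, "m1")
at_most_once_result = first.received

second = LossyChannel([True, True])     # the first two attempts are lost
send_at_least_once(second, "m1")
at_least_once_result = second.received
```

Under these assumptions, the at-most-once sender leaves the consumer with nothing, while the at-least-once sender gets the message through on its third attempt.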
Data Transfer supports the At-least-once delivery strategy for all producer-consumer pairs. The consumer writes every message received from the producer to a database and sends the producer a write confirmation. If the producer does not receive the confirmation for any reason, it resends the message. This may result in duplicate data in the consumer database.
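The duplicate scenario described above, where the write succeeds but its confirmation is lost, can be sketched as follows. The `ConsumerDatabase` class and the `deliver` function are hypothetical and only model the behavior; they are not part of any Data Transfer API.

```python
class ConsumerDatabase:
    """Hypothetical consumer that appends every received row to a table."""

    def __init__(self, lost_acks):
        self.rows = []
        # Number of write confirmations that will not reach the producer.
        self.lost_acks = lost_acks

    def write(self, row):
        self.rows.append(row)      # the write itself always succeeds
        if self.lost_acks > 0:
            self.lost_acks -= 1
            return False           # confirmation lost on the way back
        return True                # confirmation reaches the producer


def deliver(db, row, max_attempts=5):
    """Resend the row until a write confirmation arrives."""
    for attempt in range(1, max_attempts + 1):
        if db.write(row):
            return attempt
    raise RuntimeError("no write confirmation received")


db = ConsumerDatabase(lost_acks=1)
attempts = deliver(db, {"id": 1, "name": "alice"})
duplicates = len(db.rows) - 1
```

Here the producer needs two attempts, and because the first write had already succeeded, the table ends up holding the same row twice.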
In this case, the Exactly-once strategy is implemented for DBMS-level data if the following two requirements are met:
- The table being delivered has a primary key.
- The consumer database deduplicates data by primary key:
| Target | Deduplication by primary key |
| --- | --- |
| Apache Kafka® topic: Your own or as part of the Managed Service for Apache Kafka® service | |
| ClickHouse® database: Your own or as part of the Managed Service for ClickHouse® service | |
| Your own Elasticsearch database | |
| Greenplum® database: Your own or as part of the Managed Service for Greenplum® service | |
| MongoDB database: Your own or as part of the Managed Service for MongoDB service | |
| MySQL® database: Your own or as part of the Managed Service for MySQL® service | |
| PostgreSQL database: Your own or as part of the Managed Service for PostgreSQL service | |
| OpenSearch database: Your own or as part of the Managed Service for OpenSearch service | |
| Managed Service for YDB database: As part of the Managed Service for YDB service | |
| Yandex Object Storage bucket | |
| Yandex Data Streams data stream | |
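Why a primary key turns At-least-once delivery into an effectively Exactly-once result can be shown with a minimal sketch. The `KeyedTable` class below is hypothetical; it assumes the target stores at most one row per primary key value, so a resent row overwrites its earlier copy instead of being appended.

```python
class KeyedTable:
    """Hypothetical target table that keeps one row per primary key value."""

    def __init__(self, primary_key):
        self.primary_key = primary_key
        self._rows = {}

    def upsert(self, row):
        # Writing the same key again replaces the stored row, so a resent
        # message has no visible effect on the table contents.
        self._rows[row[self.primary_key]] = row

    def rows(self):
        return list(self._rows.values())


table = KeyedTable(primary_key="id")
incoming = [
    {"id": 1, "name": "alice"},
    {"id": 1, "name": "alice"},   # duplicate caused by a resend
    {"id": 2, "name": "bob"},
]
for row in incoming:
    table.upsert(row)

stored_ids = [row["id"] for row in table.rows()]
```

Three messages arrive, but only two rows are stored. Without a primary key, the target has nothing to match the resent row against, which is why both requirements must hold.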
Tip
To deduplicate data in the background in a ClickHouse® consumer database, you can use the ReplacingMergeTree engine.
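As a rough illustration of what this engine does, the toy model below imitates a merge in plain Python. It is a sketch under stated assumptions, not ClickHouse® code: `merge_parts` is a hypothetical function, data parts are modeled as lists of dictionaries, and rows are treated as duplicates when they share the same sorting key. Because ReplacingMergeTree removes duplicates only when parts are merged, a query that runs before the merge can still see both copies.

```python
def merge_parts(parts, sorting_key, version=None):
    """Toy merge: keep one row per sorting-key value.

    Without a version column the last inserted row wins; with one, the row
    with the highest version wins (ties go to the last inserted row).
    """
    survivors = {}
    for part in parts:                      # parts are in insertion order
        for row in part:
            key = tuple(row[col] for col in sorting_key)
            current = survivors.get(key)
            if (current is None or version is None
                    or row[version] >= current[version]):
                survivors[key] = row
    return sorted(survivors.values(),
                  key=lambda r: tuple(r[col] for col in sorting_key))


parts = [
    [{"id": 1, "name": "alice"}],                             # first insert
    [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],   # resend plus new row
]

before_merge = [row for part in parts for row in part]
after_merge = merge_parts(parts, sorting_key=["id"])
```

Before the merge the table holds three rows, including the duplicate. After it, one row per `id` remains.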
ClickHouse® is a registered trademark of ClickHouse, Inc.