Replication and fault tolerance
Yandex Managed Service for Valkey™ uses native Valkey™ replication and provides high availability of cluster data using rdsync.
Replication
Yandex Managed Service for Valkey™ clusters use asynchronous replication, i.e., the result of a write request is committed to the master host, which then forwards the data to the cluster replicas. The replication process does not affect the master availability in any way, but it can make replicas temporarily unavailable (for up to a few seconds for large databases) when loading new data into memory.
Since replication is asynchronous, the data on replicas may be out of date: while a replica is processing updates from the master, it continues to return its existing data in response to requests, because the replica-serve-stale-data setting is set to yes.
Due to limited resources, b1, b2, and b3 class hosts are not replicated.
For more information about how replication works in Valkey™, read the relevant documentation.
Fault tolerance
To ensure fault tolerance, the service uses the rdsync agent.
Host status is stored in a distributed configuration store (DCS), e.g., ZooKeeper, etcd, or Consul. If the connection to the DCS is lost, the agent switches the host to protected mode.
Thanks to the rdsync agent in a Yandex Managed Service for Valkey™ cluster:
- Configurations that consist of an even number of hosts (for non-sharded clusters) or one or two shards (for sharded clusters) are fault-tolerant.
- The client request for the name of the host available for writes is processed in alignment with the rdsync agent and provides up-to-date information to clients, because the statuses of all hosts are known.
- The risk of losing data can be reduced by using the WAIT command with N/2 available replicas, where N is the number of cluster hosts.
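The WAIT recommendation above can be sketched as follows. This is only an illustration: the helper names are hypothetical, `client` is assumed to be any object with a redis-py style `execute_command(*args)` method, and rounding N/2 down for an odd number of hosts is an assumption the text does not spell out.

```python
def wait_replica_count(n_hosts: int) -> int:
    """Number of replicas to pass to WAIT for a cluster of n_hosts hosts.

    Assumption: N/2 is rounded down when N is odd.
    """
    if n_hosts < 1:
        raise ValueError("a cluster has at least one host")
    return n_hosts // 2


def set_and_wait(client, key, value, n_hosts, timeout_ms=100):
    """Write a key, then block until enough replicas acknowledge the write.

    Returns True if at least N/2 replicas confirmed within timeout_ms.
    """
    client.execute_command("SET", key, value)
    needed = wait_replica_count(n_hosts)
    # WAIT returns the number of replicas that acknowledged the preceding writes.
    acked = client.execute_command("WAIT", needed, timeout_ms)
    return acked >= needed
```

A False result does not mean the write was rolled back: it only means the requested number of replicas did not confirm it in time, so the application has to decide whether to retry or report the risk.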
Sharded clusters with the local-ssd disk type and only one host per shard are not considered fault-tolerant. You cannot create such a cluster.
Assigning a different host as a master if the primary master fails
If the master host fails, the host with the least lag behind the master becomes the new master.
You can influence master selection in a Valkey™ cluster by configuring priorities for cluster hosts. The host with the highest priority becomes the new master. If the replica host with the highest priority requires a full data resync, the priority value is ignored and the host with the least lag behind the master becomes the new master.
You can set host priority:
- When creating a cluster or a host in a cluster.
- When changing the host settings.
Minimum value (lowest priority): 0. A host with this priority value can become a master only if no other hosts are suitable for the role. Default priority value: 100. You can specify a value higher than 100.
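The selection rules described in this section can be modeled with a short sketch. It is a toy model, not the actual rdsync algorithm: the `Replica` type, its fields, the lag unit, and the tie-breaking by lag among hosts with equal priority are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    priority: int = 100              # default priority value from the text
    lag: float = 0.0                 # lag behind the failed master (hypothetical unit: seconds)
    needs_full_resync: bool = False


def pick_new_master(replicas):
    """Pick a new master from the surviving replicas (illustrative only)."""
    if not replicas:
        raise ValueError("no replicas to promote")
    # Priority-0 hosts are considered only when nothing else is available.
    candidates = [r for r in replicas if r.priority > 0] or list(replicas)
    # Highest priority wins; equal priorities are resolved by the least lag.
    best = min(candidates, key=lambda r: (-r.priority, r.lag))
    if best.needs_full_resync:
        # Priority is ignored: fall back to the least-lagging host.
        best = min(candidates, key=lambda r: r.lag)
    return best
```

For example, with replicas `a` (priority 100, lag 5) and `b` (priority 200, lag 10), the sketch picks `b`; if `b` requires a full resync, it picks `a` instead.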
A master host can be changed either automatically, as a result of a failure, or manually. Manual master switching is available both for sharded and unsharded clusters.
Persistence
Yandex Managed Service for Valkey™ clusters use data persistence presets. You can disable persistence if needed to improve server throughput, as the DBMS will stop writing updates to disk.
Warning
Disable persistence only if data integrity is not important for your application, e.g., when using Yandex Managed Service for Valkey™ as a cache. With persistence disabled, the most recent data written to Valkey™ is stored only in RAM and may be lost if the server crashes.
Persistence settings
By default, cluster persistence is enabled and uses the following Valkey™ settings:
- save ""
  Regular RDB file saving is disabled. AOF mode is used instead.
- appendonly yes
  AOF (Append Only File) mode is enabled. In this mode, Valkey™ logs every write operation without changing already written data.
- no-appendfsync-on-rewrite yes
  When the AOF fsync policy is set to everysec, the background save (BGSAVE) or AOF log rewrite (BGREWRITEAOF) process performs many disk I/O operations. In some Linux configurations, Valkey™ may block on the fsync() call for too long.
  The setting prevents calling fsync() in the main process while BGSAVE or BGREWRITEAOF is running.
  When you run BGREWRITEAOF, fsync() is in progress. Valkey™ writes the shortest sequence of commands needed to rebuild the current dataset in memory. You can manage data size using the aof-rewrite-incremental-fsync setting.
- auto-aof-rewrite-percentage 100
  The AOF file is automatically rewritten once it has grown by 100% compared to its size after the previous rewrite. The rewrite is also subject to the auto-aof-rewrite-min-size setting.
- auto-aof-rewrite-min-size 64mb
  Minimum AOF file size for automatic rewriting: 64 MB. Smaller files are not rewritten automatically, even if the growth percentage is exceeded.
- aof-load-truncated yes
  Allows loading a truncated AOF file if the system crashes. The log notifies the user on loading the truncated file.
- aof-rewrite-incremental-fsync yes
  Enables AOF file synchronization every time 32 MB of data is generated.
- aof-use-rdb-preamble yes
  Enables using the RDB file as a prefix to the AOF file when rewriting or restoring it.
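The way auto-aof-rewrite-percentage and auto-aof-rewrite-min-size interact can be illustrated with a small sketch. The function name and parameters are hypothetical, and the logic is a simplified model of the automatic rewrite trigger rather than the server's actual code; in particular, how a file of exactly the minimum size is treated is an assumption.

```python
MB = 1024 * 1024


def should_rewrite_aof(current_size, base_size, percentage=100, min_size=64 * MB):
    """Decide whether an automatic AOF rewrite would be triggered.

    current_size: current AOF size in bytes.
    base_size: AOF size in bytes after the previous rewrite.
    """
    if percentage == 0:
        return False  # a percentage of zero disables automatic rewriting
    if current_size < min_size:
        return False  # small files are never rewritten automatically
    base = base_size or 1  # avoid division by zero for an empty base
    growth = current_size * 100 / base - 100  # growth over the base, in percent
    return growth >= percentage
```

With the default preset, a file that was 64 MB after the last rewrite triggers a new rewrite at 128 MB, while a file that grew from 10 MB to 60 MB does not, because it is still below the minimum size.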
For more information on Valkey™ persistence mechanisms, refer to the DBMS documentation.
Disabling persistence
With persistence disabled, the following Valkey™ settings take effect:
- save ""
  Regular RDB file saving is disabled.
- appendonly no
  AOF (Append Only File) mode is disabled.
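As a rough illustration, the two presets can be told apart by inspecting the effective settings. The helper below is hypothetical and assumes the settings are already available as a mapping of setting names to string values; how you obtain them in your environment is outside the scope of this sketch.

```python
def is_persistence_enabled(settings):
    """Return True if the given settings imply that data is written to disk."""
    aof_enabled = settings.get("appendonly", "no") == "yes"
    # An empty "save" value means regular RDB snapshots are disabled.
    rdb_enabled = settings.get("save", "").strip() != ""
    return aof_enabled or rdb_enabled
```

For the default preset (`save ""`, `appendonly yes`) the function returns True; for the disabled preset (`save ""`, `appendonly no`) it returns False.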