High availability of a Yandex Managed Service for Valkey™ cluster

Updated at June 11, 2026

Number and placement of cluster hosts
- For a non-sharded cluster
- For a sharded cluster
Replication and master failover settings
Connecting to a database
Storage settings
Persistence settings and the WAIT command
Virtual machine type
Other settings
Maintaining a cluster and modifying its parameters

High availability is the ability of a system to quickly recover functionality in the event of failure, ensuring continuous service operation for clients.

The high availability of a Yandex Managed Service for Valkey™ cluster depends on the SLA-related parameters and some other settings.

Number and placement of cluster hosts

For a non-sharded cluster

A single-host cluster does not provide high availability. If the master fails, your cluster becomes unavailable for reading and writing until the master is recovered.

If your cluster has at least two hosts, it remains available if one of them fails. A cluster is resilient to a failure of one availability zone if no zone contains more than half of all its hosts.

For a sharded cluster

If each shard of a cluster has at least two hosts, the cluster remains available if one of the hosts fails. A cluster is resilient to a failure of one availability zone if no zone contains more than half of the hosts belonging to a single shard.
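
The placement rule above can be checked mechanically. The following is a minimal sketch, not part of the service API: the function names and the zone labels (`zone-a`, `zone-b`, `zone-c`) are hypothetical, and the input is assumed to be a plain list of the zone each host lives in.

```python
from collections import Counter


def survives_zone_failure(host_zones):
    """Return True if no single zone holds more than half of the hosts.

    host_zones: list with one zone label per host, e.g. ["zone-a", "zone-b"].
    """
    if not host_zones:
        return False
    largest_zone = max(Counter(host_zones).values())
    # "No more than half" means largest_zone <= len(host_zones) / 2.
    return largest_zone * 2 <= len(host_zones)


def sharded_cluster_survives_zone_failure(shards):
    """Apply the same rule to each shard separately.

    shards: dict mapping a shard name to the list of zones of its hosts.
    """
    return bool(shards) and all(
        survives_zone_failure(zones) for zones in shards.values()
    )


if __name__ == "__main__":
    print(survives_zone_failure(["zone-a", "zone-b", "zone-c"]))
    print(sharded_cluster_survives_zone_failure({
        "shard1": ["zone-a", "zone-b"],
        "shard2": ["zone-a", "zone-a"],  # both hosts in one zone
    }))
```

Note that for a sharded cluster the check is per shard: a cluster whose hosts are spread evenly overall can still fail the rule if one shard keeps most of its hosts in a single zone.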

Replication and master failover settings

High availability is achieved through replication and master failover, which work as follows:

- rdsync, a host status management agent by Yandex, is integrated into the Yandex Managed Service for Valkey™ architecture. This agent automatically selects a new master and switches over to it in the event of a master failure. To ensure the optimal performance of rdsync, the number of hosts in the cluster must be even.
- You can influence new master selection in a Yandex Managed Service for Valkey™ cluster by configuring priorities for cluster hosts.
- You can manually select a new master and switch over to it.
- If you use public access for the master host, you must also enable it for the replicas; otherwise, the cluster will become unavailable following master failover.
- Yandex Managed Service for Valkey™ clusters use asynchronous replication, i.e., the result of a write request is committed to the master host, which then forwards the data to the cluster replicas.

Connecting to a database

Cluster availability depends on the connection method and settings:

- Only use recommended clients for connection.
- Configure security groups.
- Set the values of the Timeout, Maxmemory policy, Maxmemory percent, Client output buffer limit normal, and Client output buffer limit pubsub Valkey™ settings so that there are no write operation failures or mass connection interruptions under normal operating conditions.

Storage settings

If the database storage is 100% full, the cluster will switch to read-only mode. To keep your cluster writable:

- Enable automatic storage expansion.
- Create an alert to monitor storage utilization.
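
To illustrate the kind of check a storage alert performs, here is a minimal sketch. The function names and the 80% warning threshold are assumptions chosen for illustration; the service's own alerting is configured in its monitoring tools, not through this code.

```python
def storage_utilization(used_bytes, total_bytes):
    """Return the used fraction of storage as a float between 0 and 1."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def storage_status(used_bytes, total_bytes, warn_at=0.8):
    """Classify storage state.

    warn_at is a hypothetical alert threshold (80% by default).
    At 100% the cluster switches to read-only mode, as described above.
    """
    ratio = storage_utilization(used_bytes, total_bytes)
    if ratio >= 1.0:
        return "read-only"
    if ratio >= warn_at:
        return "alert"
    return "ok"


if __name__ == "__main__":
    gib = 1024 ** 3
    print(storage_status(50 * gib, 100 * gib))
    print(storage_status(90 * gib, 100 * gib))
```

Setting the threshold well below 100% leaves time to expand the storage before writes are blocked.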

Persistence settings and the WAIT command

To increase fault tolerance:

- Enable persistence on replicas.
- Use the WAIT command with N/2 available replicas, where N is the number of cluster hosts.
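
The WAIT recommendation can be sketched as follows. This is an illustration under stated assumptions: `client` is any object exposing `set(key, value)` and `wait(num_replicas, timeout)` the way the redis-py client does, the helper names are hypothetical, and rounding N/2 down for an odd number of hosts is an assumption the text does not specify.

```python
def replicas_to_wait_for(n_hosts):
    """N/2 replicas, where N is the number of cluster hosts.

    Rounding down for odd N is an assumption, not taken from the text.
    """
    if n_hosts < 1:
        raise ValueError("n_hosts must be at least 1")
    return n_hosts // 2


def set_and_wait(client, key, value, n_hosts, timeout_ms=100):
    """Write a key, then block until enough replicas acknowledge it.

    Returns True if at least N/2 replicas confirmed the write
    before the timeout expired, False otherwise.
    """
    client.set(key, value)
    needed = replicas_to_wait_for(n_hosts)
    # WAIT returns the number of replicas that acknowledged the write.
    acknowledged = client.wait(needed, timeout_ms)
    return acknowledged >= needed
```

Because replication is asynchronous, a `False` result does not mean the write failed on the master; it only means the required number of replicas did not confirm it in time, so the caller decides whether to retry or report the risk.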

Virtual machine type

Cluster availability depends on the type of VMs you use to deploy your hosts. A highly available cluster should use a VM type with a 100% vCPU guarantee. The burstable VM type with a 50% vCPU guarantee does not ensure high availability and should only be used for test environments.

Other settings

The following cluster parameters and settings may also affect its availability:

Maintaining a cluster and modifying its parameters

The following operations may lead to interrupted connections, temporary performance degradation, or temporary cluster unavailability:

- Starting maintenance operations (the start time is set by selecting a maintenance window) may cause interrupted connections and temporary write unavailability of the cluster.
- Updating the Valkey™ version, changing the host class, changing the disk type, or increasing the storage size leads to interrupted connections, temporary performance degradation, or temporary write unavailability of the cluster.
- Updating the Databases Valkey™ setting leads to interrupted connections and temporary write unavailability of the cluster.
- Automatic and manual database backups may cause temporary performance degradation of the cluster.

Run these operations when the cluster load is minimal.
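
Since the operations above can interrupt connections, client code may benefit from retrying briefly failed calls. The following is a generic sketch, not a feature of the service or of any particular client library: `call_with_retry` and its parameters are hypothetical, and the exception types to retry on should be replaced with the ones your client library actually raises.

```python
import time


def call_with_retry(operation, attempts=5, base_delay=0.1,
                    retry_on=(ConnectionError, TimeoutError),
                    sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on failure.

    operation: zero-argument callable, e.g. lambda: client.get("key").
    retry_on: exception classes treated as transient (an assumption;
              pass your client library's own connection errors here).
    sleep: injectable for testing; defaults to time.sleep.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** attempt)
```

Retries only mask short interruptions; for write operations, make sure the retried command is safe to repeat.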
