Sharding in Managed Service for Sharded PostgreSQL
Sharding is a horizontal scaling strategy where data from multiple PostgreSQL clusters is combined into a single Sharded PostgreSQL cluster. The PostgreSQL clusters reside in the same folder and cloud network as the Sharded PostgreSQL cluster. In the Sharded PostgreSQL cluster, data is distributed across shards using a sharding key. Number of shards per cluster is unlimited.
The Sharded PostgreSQL cluster contains:
- Hosts that manage sharding in the Sharded PostgreSQL cluster.
- Hosts of PostgreSQL clusters that act as data hosts (master and replicas).
Managed Service for Sharded PostgreSQL supports the following data sharding strategies:
For more information about data sharding, see the SPQR documentation
Advantages of sharding
Sharding is often used in the following cases:
- When you expect very frequent PostgreSQL database queries and rapid data growth.
- When your application requires more resources but increasing the computing power of the PostgreSQL cluster hosts (disks, RAM, and CPUs) is no longer an option.
With sharding, you can:
-
Overcome technical limitations.
If you need to work with large datasets, your data storage infrastructure may reach the maximum capacity of commercially available hardware, e.g., the disk subsystem will indicate low IOPS. If the application is running at hardware performance limits, it may be a good idea to distribute data across shards. In this case, the read operations will be performed concurrently.
-
Improve fault tolerance.
Sharding allows you to isolate individual host or replica failures. If you do not use sharding, then, when one host fails, you lose access to the entire dataset it contains. Conversely, if one shard out of five fails, 80% of the table data will still be available.
-
Improve query performance.
Query performance may degrade due to resource contention. This usually happens as the number of read operations or CPU time per query increases. Shards handle queries to the same table in parallel, thus avoiding resource (CPU and disk subsystem) contention and reducing query processing time.
Sharding management in Managed Service for Sharded PostgreSQL
In Managed Service for Sharded PostgreSQL, sharding is managed by the hosts with the ROUTER (routing user queriesCOORDINATOR (storing shard configuration
You can use two types of sharding in Managed Service for Sharded PostgreSQL:
-
Standard: Cost-effective sharding for clusters that do not have any special requirements for sharding management hosts. The cluster will be expanded to include the
INFRAhosts having both theROUTERandCOORDINATORroles. -
Advanced: Flexible type of shading.
ROUTERhosts and optionalCOORDINATORhosts will be added to the cluster.
COORDINATOR or INFRA hosts are used to configure sharding.
All queries in a Managed Service for Sharded PostgreSQL cluster must be directed to ROUTER or INFRA hosts. You cannot fully remove these hosts from the cluster. At least one ROUTER or INFRA host must remain.