Sharding keys
Sharding is a horizontal scaling method for distributed databases that splits the database into multiple parts called shards. In Managed Service for Sharded PostgreSQL, each shard is a separate Managed Service for PostgreSQL cluster that stores a portion of the system data.
You can find more information about sharding in Managed Service for Sharded PostgreSQL here.
The sharding key is one or more table columns the system uses to determine which shard will store a row. A key that includes more than one column is called a composite key.
Key columns may have the following data types:
intandbigintvarcharUUIDCITYandMURMUR(for integers only)
Sharding strategies
Row distribution across shards follows the strategy you choose:
-
Range strategy assigns rows to shards based on value ranges. A row goes to a shard if its key value falls within that shard’s range. Rows with close key values be placed on the same shard.
For example, one shard will store rows with key values from
1to1000, and another, from1001to2000, etc.The drawback of this strategy is that data may be distributed unevenly.
Learn more about the range strategy in Sharded PostgreSQL here
. -
Hash strategy uses a hash function applied to the key value to select a shard. The resulting hash value determines the shard the row will go to. Compared to the range strategy, this approach distributes data more evenly across shards. However, range queries become more complex because rows with similar values may reside on different shards.
Learn more about the hash strategy in Sharded PostgreSQL here
.
Composite sharding keys
A composite key uses multiple columns, and their order matters during sharding.
Composite keys are used for sharding data whose uniqueness depends on a combination of multiple columns. For example, such data may include:
- Time series
- Geographic data stored across several related columns
When using a composite key, the system determines the target shard for a row as follows:
-
For the range strategy, it lists all combinations of column values.
Example of configuring sharding rules for the range strategy:
CREATE DISTRIBUTION ds1 COLUMN TYPES integer, integer; CREATE KEY RANGE FROM 100,100 ROUTE TO sh4 FOR DISTRIBUTION ds1; CREATE KEY RANGE FROM 100,0 ROUTE TO sh3 FOR DISTRIBUTION ds1; CREATE KEY RANGE FROM 0,100 ROUTE TO sh2 FOR DISTRIBUTION ds1; CREATE KEY RANGE FROM 0,0 ROUTE TO sh1 FOR DISTRIBUTION ds1; -
For the hash strategy, it applies the hash function to each key column individually, then adds together the resulting numbers, and applies the hash function to the total.
Example of configuring sharding rules for the hash strategy:
CREATE RELATION tr(MURMUR [id1 INT HASH, id2 VARCHAR HASH]); CREATE KEY RANGE FROM 3221225472 ROUTE TO sh4; CREATE KEY RANGE FROM 2147483648 ROUTE TO sh3; CREATE KEY RANGE FROM 1073741824 ROUTE TO sh2; CREATE KEY RANGE FROM 0 ROUTE TO sh1;Using the hash strategy makes it easier to configure sharding rules, particularly for composite keys consisting of three or more columns.
Learn more about composite sharding keys in Sharded PostgreSQL here
Best practices for using composite keys
- List the column most frequently used in queries first. If using the range strategy, the first column should be the one that is used in queries most frequently and has as few unique values as possible.
- Make sure the
WHEREconditions cover all composite key columns. Queries missing any column will be blocked if query_routing.default_route_behaviour is set toBLOCK.