Migrating data to Managed Service for Sharded PostgreSQL
Can I shard by a composite key?
Yes. Do it by creating a composite key:
CREATE DISTRIBUTION <sharding_rule_name> COLUMN TYPES integer, varchar;
ALTER DISTRIBUTION <sharding_rule_name> ATTACH RELATION orders DISTRIBUTION KEY user_id, order_date;
Learn more about composite sharding keys in this SPQR guide
Can I create a default shard for unbound keys?
Yes. Do it by running this command:
ALTER DISTRIBUTION <sharding_rule_name> ADD DEFAULT SHARD <shard_name>;
Learn more about default shards in this SPQR guide
How do I add a new shard and rebalance the data?
- Create a new shard.
- Use the
SYNC REFERENCE TABLEScommand to copy reference tables. - Redistribute the key ranges. Use these commands:
SPLIT KEY RANGEto split the key range.REDISTRIBUTE KEY RANGEto automatically migrate data.
How does data rebalancing work when adding a new shard?
When adding a new shard, you need to start manual data migration using the REDISTRIBUTE KEY RANGE command. In which case Sharded PostgreSQL moves small data ranges to minimize the cluster being unavailable for writes.
How do I load big data volumes?
There are two options for loading big data volumes:
-
COPYwith the/* __spqr__allow_multishard: true */virtual parameter.For example, for loading from a CSV file:
COPY <table_name> FROM 'data.csv' WITH DELIMITER ',' /* __spqr__allow_multishard: true */;Warning
The use of
COPYcan cause a high load on the router. If you are usingCOPYon a regular basis, we recommend allocating a separate router for the task. -
Batch insert of several table rows using the
/* __spqr__engine_v2: true */virtual parameter. In this scenario, the router analyzes each row, determines the target shard based on the sharding key, and converts the query into separateINSERTcommands for each shard.For example, the following command:
INSERT INTO users (id, name) VALUES (1, 'Alice'), -- send to shard sh1 (100, 'Bob'), -- send to shard sh2 (2, 'Charlie') -- send to shard sh1 /* __spqr__engine_v2: true */; -- NOTICE: send query to shard(s) : sh1,sh2will be converted to:
INSERT INTO users (id, name) VALUES (1, 'Alice'), (2, 'Charlie');for shardsh1.INSERT INTO users (id, name) VALUES (100, 'Bob');for shardsh2.
You can set virtual parameters with comments in SQL or via SET.
Learn more about inserting big data volumes in this SPQR guide
How can I speed up migration of big data volumes?
- Increase chunk size.
- Make sure each shard has an index based on the sharding key.
- Avoid running parallel writes during migration.
What happens if there is a failure during data migration?
Sharded PostgreSQL ensures range-level atomicity. If there is a failure, data can temporarily exist on both shards. The operation will be either canceled or resumed following recovery.
How do I rename a table while keeping it available?
Run the ALTER TABLE sequence within a single transaction. To enable this feature, use the /* __spqr__multishard_ddl: true */ virtual parameter.
ALTER TABLE ... /* __spqr__multishard_ddl: true */;
The operation is non-transactional. Be careful when renaming tables.
You can set virtual parameters with comments in SQL or via SET.