Using Yezzey in Yandex MPP Analytics for PostgreSQL
The Yezzey extension allows you to use hybrid storage in a Yandex MPP Analytics for PostgreSQL cluster for storing append-optimized tables.
You can export rarely used tables of this type to cold storage and continue working with them as with ordinary tables. You can upload tables back to the cluster storage if you need to.
For more information about Yezzey and storage types, see Hybrid storage.
Installing Yezzey in a Greenplum® cluster
-
Enable the hybrid storage option if you did not when creating the cluster.
Warning
You cannot disable this option after you save your cluster settings.
-
Connect to the database as the owner or a user with the
CREATEpermission and send this request:CREATE EXTENSION yezzey; -
Make sure the extension has been installed:
SELECT extname FROM pg_extension;
Getting a table's location info
To learn where a table or its segment files are placed (in cluster storage or cold storage):
-
Connect to the database as the owner or a user with the
SELECTpermission for the database. -
Send one of the following requests to find out where the table is:
-
Stating the table name:
SELECT * FROM yezzey_offload_relation_status('<table_name>'); -
Stating the schema the table is in and the table name:
SELECT * FROM yezzey_offload_relation_status('<schema_name>', '<table_name>');
You will get information about each segment of the Greenplum® cluster the table's data resides in.
The query result contains the following fields:
Field Description offload_reloidOID. segindexShard ID. -1indicates the master.local_bytesSize of data in the cluster storage. If 0, the table is offloaded.external_bytesSize of data offloaded to cold storage. If all column values are zero, the table is in the cluster storage. If the column contains non-zero values, the table is in the cold storage. external_bloat_bytesSize of data offloaded to cold storage that is no longer in use but not yet deleted. -
-
Send one of the following requests to learn where the table's segment files are:
-
Stating the table name:
SELECT * FROM yezzey_offload_relation_status_per_filesegment('<table_name>'); -
Stating the schema the table is in and the table name:
SELECT * FROM yezzey_offload_relation_status_per_filesegment('<schema_name>', '<table_name>');
You will get information about each of the table's segment files. Review the
external_bytescolumn:- If all column values are null, the files are in cluster storage.
- If the column contains values other than null, the files are in cold storage.
-
Figuring out how a table is placed in cold storage
If the table is in cold storage, its segment files are stored in a service bucket.
To get info on how the table's segment files are placed in the bucket:
-
Connect to the database as the owner or a user with the
SELECTpermission for the database. -
Send one of the following requests:
-
Stating the table name:
SELECT * FROM yezzey_relation_describe_external_storage_structure('<table_name>'); -
Stating the schema the table is in and the table name:
SELECT * FROM yezzey_relation_describe_external_storage_structure('<schema_name>', '<table_name>');
The following information will be displayed about each of the table's segment files:
-
offload_reloid: Object ID (object identifier, OID) for the table. -
segindex: Segment number. -
segfileindex: Segment file number. -
external_storage_filepath: Path to the segment file in the Object Storage bucket.The path to the segment file is generated automatically. It depends on the table structure and the number of segments in the Greenplum® cluster.
-
local_bytesandlocal_commited_bytes: Number of segment file bytes in the cluster storage. The values must be null. -
external_bytes: Segment file size.
-
Offloading a table from cluster storage to cold storage
If the table is in cluster storage, you can offload it to cold storage. To figure out where the table is, get its location info.
Note
You cannot edit the table while offloading it.
After offloading is completed and the exclusive lock is removed, you will once again be able to edit the table.
To offload a table:
-
Connect to the database as the owner or a user with the
SELECTpermission for the database. -
Invoke one of the following functions:
-
Stating the table name:
SELECT yezzey_define_offload_policy('<table_name>'); -
Stating the schema the table is in and the table name:
SELECT yezzey_define_offload_policy('<schema_name>', '<table_name>');
-
The offloading time depends on the table size and the number of segment files. During the procedure, you will be getting messages that will allow you to track its progress.
-
Messages like this one show the size of the segment files offloaded earlier:
NOTICE: yezzey: relation virtual size calculated: 0 (segX sliceY ... pid=...)0means that no segment files were offloaded earlier (this is the first export). -
Messages like this one indicate that the segment files were successfully offloaded:
INFO: yezzey: relation segment reached external storage (blkno=...), up to logical eof ... (segX sliceY ... pid=...)
Loading a table from a cold storage to the cluster storage
If the table is in cold storage, you can load it to the cluster storage. To figure out where the table is, get its location info.
Note
You cannot edit the table while loading it.
After loading is completed and the exclusive lock is removed, you will once again be able to edit the table.
To load a table:
-
Connect to the database as the owner or a user with the
SELECTpermission for the database. -
Invoke one of the following functions:
-
Stating the table name:
SELECT yezzey_load_relation('<table_name>'); -
Stating the schema the table is in and the table name:
SELECT yezzey_load_relation('<schema_name>', '<table_name>');
-
The loading time depends on the table size and the number of segment files. After the procedure is completed, you will get a message in the following format:
INFO: loaded relation ... to local storage
Usage example
For an example of using the extension to work with hybrid storage, see Exporting Greenplum® data to a cold storage in Yandex Object Storage.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries.