Hybrid storage in Managed Service for Greenplum®
Note
This feature is at the Preview stage and is free of charge.
Managed Service for Greenplum® clusters support hybrid storage: some data can be kept in cluster storage and some in cold storage. By default, data is stored in cluster storage on disks of the selected type. Data that is rarely used but must be retained for a long time can be moved from cluster storage to cold storage, that is, to a Yandex Object Storage bucket. This reduces storage costs.
You can enable hybrid storage in Greenplum® clusters when creating or updating a cluster.
Warning
Once hybrid storage is enabled in a cluster, you cannot disable it.
Hybrid storage scope of use
You can use hybrid storage only for append-optimized tables. Data is migrated between cluster storage and cold storage one whole table at a time; you cannot move part of a table. Tables are processed using the Yezzey extension by Yandex Cloud.
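As a rough illustration of the append-optimized restriction, the sketch below filters a list of tables by storage type. It assumes the Greenplum 6 catalog convention, in which `pg_class.relstorage` is `a` for row-oriented and `c` for column-oriented append-optimized tables; other versions may expose this differently. The function name and the sample rows are hypothetical and not part of the service.

```python
# Assumption: Greenplum 6 records the table storage type in pg_class.relstorage,
# where 'a' = append-optimized (row-oriented) and 'c' = append-optimized (column-oriented).
APPEND_OPTIMIZED_CODES = {"a", "c"}


def eligible_for_hybrid_storage(tables):
    """Return the names of tables whose storage code marks them as append-optimized.

    `tables` is an iterable of (table_name, relstorage_code) pairs, for example
    rows fetched with:
        SELECT relname, relstorage FROM pg_class WHERE relkind = 'r';
    """
    return [name for name, code in tables if code in APPEND_OPTIMIZED_CODES]


if __name__ == "__main__":
    # Hypothetical sample rows: one heap table ('h') and two append-optimized tables.
    sample = [("events_archive", "c"), ("users", "h"), ("logs_2020", "a")]
    print(eligible_for_hybrid_storage(sample))
```

Heap tables are filtered out here because only append-optimized tables can be moved to cold storage.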
The data of append-optimized tables is stored as a set of segment files, compressed and encrypted, in an Object Storage service bucket. The number of segment files depends on the number of segments in the cluster and on the table structure.
I/O management when using hybrid storage
SQL queries against append-optimized tables generate many requests to segment files in storage. SQL query execution time depends on how efficiently I/O requests to segment files are scheduled. Without I/O request scheduling, storage performance degrades, and both RAM consumption and SQL query execution time increase. A scheduler protects the cluster from performance degradation when it executes SQL queries that process massive amounts of data.
When data is stored in cluster storage, I/O request scheduling is performed by the operating system on the cluster hosts.
When data is stored in cold storage, the operating system cannot schedule I/O requests to the Object Storage bucket. Therefore, to avoid performance degradation, Managed Service for Greenplum® clusters use YProxy by Yandex Cloud to schedule such requests. Even if a table resides in cold storage, YProxy minimizes the impact on SQL query execution time.
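The general idea of I/O request scheduling can be sketched as a cap on the number of requests in flight at any moment. The example below is a minimal illustration of that idea only. It is not how YProxy is implemented, and `run_with_io_limit`, `fake_fetch`, and the segment file names are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_io_limit(requests, fetch, max_in_flight=4):
    """Call fetch(request) for every request, running at most max_in_flight at once.

    Results are returned in the same order as `requests`.
    """
    # A bounded worker pool: requests beyond the limit wait in the pool's queue
    # instead of all hitting the storage backend at the same time.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(fetch, requests))


if __name__ == "__main__":
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_fetch(segment_file):
        # Stand-in for reading one segment file from a remote bucket.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)
        with lock:
            state["active"] -= 1
        return segment_file.upper()

    files = [f"segment_{i}" for i in range(12)]  # hypothetical file names
    results = run_with_io_limit(files, fake_fetch, max_in_flight=3)
    print(len(results), "requests completed; peak concurrency:", state["peak"])
```

Capping concurrency bounds the memory held by pending reads, which is the kind of degradation described above when requests are not scheduled.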
Learn more about hybrid storage architecture from this Habr article
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.