Yandex project
© 2025 Yandex.Cloud LLC
Yandex Managed Service for Greenplum®

In this article:

  • Preparation stage
  • Data redistribution stage
  • Stage duration and duration control
  • Settings affecting data redistribution process

Expanding a Greenplum® cluster

Written by
Yandex Cloud
Updated at May 5, 2025

You can expand a cluster by adding segment hosts to it.

The expansion procedure consists of a preparation stage and a data redistribution stage. The data redistribution stage can be completed either right after the preparation stage or later on in the background.

Each of these stages may take a long time. You cannot influence the duration of the preparation stage, but you can influence that of the data redistribution stage, thus controlling the overall duration of the cluster expansion procedure.

Preparation stage

At this stage, the following processes take place:

  1. New segment hosts are added to the cluster.

  2. The gpexpand utility prepares for table redistribution:

    1. Creates the gpexpand service data schema in the postgres database.

    2. Generates a table redistribution queue.

      All tables in all cluster databases will be redistributed, but each table gets its own priority, which determines its position in the queue.

      You can change a table's priority as long as its data redistribution has not started yet and the cluster is not blocked from load.

    3. Prepares partitioned tables for data redistribution.
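As a rough illustration of the queue rules above, here is a minimal Python sketch. The `QueuedTable` class, the `set_rank` and `processing_order` helpers, the `NOT STARTED` status name, and the convention that a lower rank is processed earlier are all assumptions made for this example. They model the behavior described here and are not the service's API.

```python
from dataclasses import dataclass


@dataclass
class QueuedTable:
    name: str                     # fully qualified table name
    rank: int                     # assumption: lower rank is processed earlier
    status: str = "NOT STARTED"   # assumption: status label for untouched tables


def set_rank(table: QueuedTable, new_rank: int, cluster_blocked: bool) -> None:
    """Change a table's priority only when the rules in the text allow it."""
    if cluster_blocked:
        raise RuntimeError("cluster is blocked from load, priorities cannot be changed")
    if table.status != "NOT STARTED":
        raise ValueError(f"{table.name}: redistribution has already started")
    table.rank = new_rank


def processing_order(queue: list[QueuedTable]) -> list[str]:
    """Return the names of the tables still waiting, in processing order."""
    pending = [t for t in queue if t.status == "NOT STARTED"]
    return [t.name for t in sorted(pending, key=lambda t: t.rank)]


queue = [
    QueuedTable("public.orders", rank=2),
    QueuedTable("public.events", rank=2),
    QueuedTable("public.users", rank=2, status="IN PROGRESS"),
]
set_rank(queue[1], 1, cluster_blocked=False)
print(processing_order(queue))  # ['public.events', 'public.orders']
```

The table that is already being processed is left out of the order and cannot be re-ranked, which mirrors the restriction that priorities can only be changed before redistribution of a table starts.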

The approximate duration of this stage is several hours; there is no way to influence it. For more information on how long the stages take, see below.

Warning

Technically, new segment hosts will be added to the cluster already at this stage, but the expansion will be considered complete only after the data redistribution stage is over.

Data redistribution stage

At this stage, the following processes take place:

  1. The cluster's table data is redistributed using the gpexpand utility for even distribution across all segment hosts.

  2. The gpexpand service data schema is deleted.

The approximate duration of this stage is several days. You can influence it using settings. For more information on stage durations, see below.

Stage duration and duration control

The approximate durations of the stages:

  • Several hours for the preparation stage.
  • Several days for the data redistribution stage.

The actual duration of each stage depends not only on the size of the cluster databases and the total number of tables but also on the level and nature of the cluster load.

This is because the gpexpand utility, which runs at every stage of cluster expansion, acquires exclusive locks on individual tables. User requests may also acquire locks while they run. Whichever process gets a lock first makes the other wait for it to be released, which may considerably slow down both gpexpand and user request processing. Both processes can also generate increased load on the cluster.

You cannot shorten the preparation stage, but you can influence the duration of the data redistribution stage. To do this, before you run the procedure, configure the settings that control the cluster's behavior at this stage. By combining settings, you can find the right balance between the speed of data redistribution and the speed of processing user requests.

Because the data redistribution stage can take a long time, Managed Service for Greenplum® provides tools for monitoring the data redistribution process. Use them while the cluster expansion is in progress to track it more accurately and estimate its completion time.
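One simple way to turn progress figures into a completion estimate is linear extrapolation. The sketch below assumes you can obtain the amount of data already redistributed, the amount left, and the elapsed time from the monitoring tools. The function name and its parameters are hypothetical, and the result is only a rough guide because the actual rate depends on the cluster load.

```python
def estimate_remaining_seconds(done: float, left: float, elapsed_seconds: float) -> float:
    """Extrapolate the remaining time from the average rate observed so far.

    `done` and `left` must use the same unit (bytes, GiB, or table count).
    """
    if done <= 0 or elapsed_seconds <= 0:
        raise ValueError("some observed progress is needed to estimate a rate")
    # remaining time = left / (done / elapsed_seconds)
    return left * elapsed_seconds / done


# Example: 200 GiB redistributed in 10 hours, 600 GiB left.
remaining = estimate_remaining_seconds(done=200, left=600, elapsed_seconds=10 * 3600)
print(remaining / 3600)  # 30.0 (hours)
```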

Settings affecting data redistribution process

The following settings are available:

  • Block cluster from load

    If this setting is enabled (true), you cannot connect to the cluster and it does not receive new user requests. As a result, cluster expansion runs faster because gpexpand does not have to wait for the release of locks that incoming user requests would otherwise acquire.

    Warning

    If you block the cluster from load and disable background data redistribution, you will lose access to the cluster until its expansion is complete.

    The expansion process can be time-consuming.

  • Background data redistribution

    This setting affects the data redistribution strategy:

    • If the setting is disabled (false), data redistribution will start as soon as the cluster expansion preparation stage is over.

      The cluster will remain in the Updating status until all cluster expansion stages are completed.

      The data redistribution process will be run once and will continue either until all the cluster's tables are redistributed or until the timeout expires.

      If data redistribution terminates upon reaching the timeout with some tables remaining unprocessed, you will need to redistribute those tables manually.

    • If the setting is enabled (true), data redistribution will be delayed.

      The cluster will remain in the Updating status only during preparation for cluster expansion.

      The data redistribution process will be run on a schedule during routine maintenance operations until all tables are processed.

      When background data redistribution is enabled, routine maintenance operations are performed according to the following algorithm:

      1. Vacuuming of user tables (VACUUM).

      2. Data redistribution (REDISTRIBUTE):

        1. If all tables were processed before the timeout expired, the data redistribution process will be removed from the routine maintenance schedule and will not be started again.
        2. If only some of the tables were processed before the timeout expired, the process will be restarted during the next routine maintenance, and table processing will continue.
      3. Collecting statistics (ANALYZE).

  • Redistribution timeout

    Timeout (in seconds) after which the data redistribution process will be interrupted.

    Reaching the timeout does not stop the process immediately: data redistribution will be completed for the tables currently being processed (IN PROGRESS status). You can query table statuses as long as the cluster is not blocked from load.

    The minimum value is 0. In this case, the timeout is calculated automatically depending on the cluster configuration and data size.

    The maximum value depends on whether background data redistribution is enabled:

    • If enabled, the maximum value is 28800 (eight hours).
    • If disabled, the maximum value is not limited.
  • Number of redistribution streams

    Number of threads that will be started during the data redistribution process.

    Using more threads will speed up data redistribution but it will also increase the cluster load.

    The minimum and default value is 0. In this case, the number of threads is calculated automatically depending on the cluster configuration and data size. The maximum value is 25.
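The limits listed above, and the way background redistribution is spread over several maintenance runs, can be sketched in Python as follows. This is an illustrative model only. The `ExpansionSettings` field names, the `validate`, `run_once`, and `maintenance_runs_needed` helpers, and the per-table durations are assumptions and do not correspond to real CLI or API parameters. The model also processes one table at a time and needs an explicit positive timeout, whereas the service can use several threads and can calculate the timeout automatically.

```python
from dataclasses import dataclass

MAX_BACKGROUND_TIMEOUT = 28800  # seconds (eight hours), limit stated in the text
MAX_THREADS = 25                # limit stated in the text


@dataclass
class ExpansionSettings:
    """Hypothetical container; field names are not the real CLI/API names."""
    block_from_load: bool = False
    background: bool = False
    timeout: int = 0   # seconds; 0 means "calculate automatically"
    threads: int = 0   # 0 means "calculate automatically"


def validate(s: ExpansionSettings) -> list[str]:
    """Raise ValueError for out-of-range values; return a list of warnings."""
    if s.timeout < 0:
        raise ValueError("timeout must be 0 or greater")
    if s.background and s.timeout > MAX_BACKGROUND_TIMEOUT:
        raise ValueError("timeout is capped at 28800 s for background redistribution")
    if not 0 <= s.threads <= MAX_THREADS:
        raise ValueError("threads must be between 0 and 25")
    warnings = []
    if s.block_from_load and not s.background:
        warnings.append("no access to the cluster until the expansion is complete")
    return warnings


def run_once(pending: list[tuple[str, int]], timeout: int) -> list[tuple[str, int]]:
    """One redistribution run over (table, seconds_needed) pairs.

    Returns the tables that were not processed before the timeout.
    """
    if timeout <= 0:
        raise ValueError("this sketch needs an explicit positive timeout")
    elapsed = 0
    remaining = list(pending)
    while remaining and elapsed < timeout:
        _table, seconds_needed = remaining.pop(0)
        # A table that has started is always finished, even past the timeout.
        elapsed += seconds_needed
    return remaining


def maintenance_runs_needed(tables: list[tuple[str, int]], timeout: int) -> int:
    """Count maintenance runs until REDISTRIBUTE leaves the schedule."""
    runs = 0
    pending = list(tables)
    while pending:
        pending = run_once(pending, timeout)  # VACUUM, then this step, then ANALYZE
        runs += 1
    return runs


tables = [("a", 3000), ("b", 3000), ("c", 3000), ("d", 3000)]
print(maintenance_runs_needed(tables, timeout=5000))  # 2
```

In the example, the first run starts table `b` before the 5000-second timeout and finishes it after the timeout has passed. The remaining two tables are picked up during the next maintenance run.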

Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.
